Separation of scales with fast rotation and weak

  • Slides: 46
Download presentation
Separation of scales with fast rotation and weak stratification and some ideas for exascale

Separation of scales with fast rotation and weak stratification and some ideas for exascale computing Beth Wingate - Los Alamos National Laboratory Supported by DOE ASCR & BER

Outline o Mesoscale dynamics in the ocean – deformation radius - Ocean grid spacing

Outline o Mesoscale dynamics in the ocean – deformation radius - Ocean grid spacing relative to the deformation radius for eddy resolving versus climate scales - Charney and the separation of time scales - Time separation in today’s hydrostatic ocean codes o A new result for strong rotation and weak stratification - New nonhydrostatic slow equations - Some numerical results - Some observations o How are these ideas related to next generation ocean models?

We’d like to resolve the mesoscale features of the ocean o What is mesoscale

We’d like to resolve the mesoscale features of the ocean o What is mesoscale dynamics? Climate simulations • low resolution: 1 deg (100 km) • long duration: 100 s of years • fully coupled to atmosphere, etc. Eddy-resolving sim. • high resolution: 0. 1 deg (10 km) • short duration: 50 -100 years • ocean only Rossby Radius of deformation Potential temperature 0. 8º x 0. 8º grid Potential temperature 0. 1º x 0. 1º grid

The baroclinic instability occurs near the deformation radius Kinetic energy Surface temperature POP POP

The baroclinic instability occurs near the deformation radius Kinetic energy Surface temperature POP POP resolution: 0. 8 (low) 0. 4 low resolution-0. 8º standard POP 0. 2 (high) Eddy Kinetic energy high resolution-0. 1º standard POP POP Potential temperature - vertical cross section POP resolution: 0. 8 (low) 0. 4 0. 2 (high) Depth of 6 C isotherm depth high res. low resolution-0. 8º standard POP S high resolution-0. 1º standard POP N S med. res. N low res.

The Rossby Deformation Radius o The length scale where the earth’s tendency to deform

The Rossby Deformation Radius o The length scale where the earth’s tendency to deform the ocean is balanced by gravity’s tendency to flatten it. The length scale where a current ‘deforms’ due to rotation. o Shallow water o 3 D N is the buoyancy frequency, f is the rotation frequency, H a vertical scale height

Deformation radius versus grid spacing Smith, Maltrud, Bryan, Hecht, 2000

Deformation radius versus grid spacing Smith, Maltrud, Bryan, Hecht, 2000

Separation of Time Scales and Charney o By using measurements of typical length and

Separation of Time Scales and Charney o By using measurements of typical length and time scales Charney, based on a paper he published about stability the year before, derived the Quasi-Geostrophic equations which, “Filtered out the fast waves” (Charney, Geof Publ, 1948) o He used geostrophic balance and hydrostatic balance. o Extensively used to understand the large scale features of rotating and stratified flow including explaining baroclinic instability. It also is the source of understanding Rossby waves (planetary scale waves). o Important conceptual model

Motivation: IPCC class ocean models – a matter of convergence under grid refinement Boussinesq

Motivation: IPCC class ocean models – a matter of convergence under grid refinement Boussinesq equations and the hydrostatic primitive equations But if you do not use the hydrostatic approximation you add back in the waves due to dw/dt. State of the art IPCC class OGCMs • In Models the flow is Hydrostatic – valid when horizontal grid spacing is much larger than vertical • Important to resolve Deformation radius – gets smaller with latitude • The higher the resolution, the harder it is to meet both these criteria.

Hydrostatic equations on parallel architectures 1. Start with U(t) (slow+fast) 2. Subtract 2 D

Hydrostatic equations on parallel architectures 1. Start with U(t) (slow+fast) 2. Subtract 2 D fast component a. Use vertical integration 3. Advance 2 D fast dynamics to t+dt_slow a) Subcycle by performing many time steps at dt_fast, or b) Use a semi-implicit method 4. Advance 3 D slow dynamics to t+dt_slow a) Use explicit time stepping 5. Combine fast and slow dynamics to obtain U(t+dt_slow)

Dynamics at high latitudes o Nearly solid body rotation o Stratified flow, but weaker

Dynamics at high latitudes o Nearly solid body rotation o Stratified flow, but weaker than stratification elsewhere. Timmermans et al 2010 Image courtesy NOAA, 2007

Low Rossby limiting dynamics in weak stratification Beth Wingate, Pedro Embid, Miranda Holmes-Cerfon, Mark

Low Rossby limiting dynamics in weak stratification Beth Wingate, Pedro Embid, Miranda Holmes-Cerfon, Mark Taylor Triply periodic rotating and stratified Boussinesq equations Embid and Majda, 1996, 1998 Majda and Embid, 1998 Schochet, 1994 Klainerman and Majda 1981

Consider the linear waves of this system Buoyancy Frequency: Rotation Frequency: Rossby Deformation Radius:

Consider the linear waves of this system Buoyancy Frequency: Rotation Frequency: Rossby Deformation Radius: Slow:

Nondimensional Boussinesq Equations

Nondimensional Boussinesq Equations

Conserved Quantities: Total Energy and Potential Enstrophy In the absence of viscosity and diffusivity:

Conserved Quantities: Total Energy and Potential Enstrophy In the absence of viscosity and diffusivity: where

Nonlocal form in a Hilbert Space Embid and Majda, 1996, 1997 Schochet, 1994 Klainerman

Nonlocal form in a Hilbert Space Embid and Majda, 1996, 1997 Schochet, 1994 Klainerman and Majda 1981 2 Hilbert Space X of vector fields u in L that are 2 divergence free and equipped with the L norm.

Method of Multiple Scales To avoid secularity the second order term must be smaller

Method of Multiple Scales To avoid secularity the second order term must be smaller than the leading order term.

Abstract form with slow and fast time scales To lowest order

Abstract form with slow and fast time scales To lowest order

Solve for the fast leading order solutions The order 1 solution is a function

Solve for the fast leading order solutions The order 1 solution is a function of the leading order solution: Solve with Duhammel’s formula and ensure the u 1 solution is not large by using the concept of fast wave averaging. Where solves:

By direct computation here is the fast wave averaged equation: Compute the evolution equation

By direct computation here is the fast wave averaged equation: Compute the evolution equation for the Fourier amplitudes

The eigenvalues and eigenfunctions of the leading order equations Look at solutions of the

The eigenvalues and eigenfunctions of the leading order equations Look at solutions of the linear problem: 0 gyroscopic/inertial waves Notice that there is a dispersive mode with zero frequency.

The slow dynamics evolves independently of the fast Three wave resonances means we must

The slow dynamics evolves independently of the fast Three wave resonances means we must choose k’ and k’’ such that You can show that the interaction coefficients are zero for the fast-fast interaction. Which means that the slow dynamics evolves independently of the fast.

The new slow equations challenge our ideas of fast and slow dynamics in the

The new slow equations challenge our ideas of fast and slow dynamics in the ocean. They are nonhydrostatic. 2 -D 3 -D 2 -D

Taylor-Proudman Theory Taylor Proudman theorem: In rapidly rotating flow, the flow two-dimensionalizes.

Taylor-Proudman Theory Taylor Proudman theorem: In rapidly rotating flow, the flow two-dimensionalizes.

Conservation of the ratio of slow to fast energy: Using the same arguments as

Conservation of the ratio of slow to fast energy: Using the same arguments as Embid and Majda, 1997, there is conservation in time of the slow to fast energy ratio. This means the total energy conservation is composed of both slow and fast dynamics. But the leading order potential enstrophy is composed only of the slow dynamics.

High resolution numerical simulations White noise forcing. Smith and Waleffe, JFM, 2002 Hyperviscosity: Random

High resolution numerical simulations White noise forcing. Smith and Waleffe, JFM, 2002 Hyperviscosity: Random Forcing:

Ro =. 2 White noise forcing wave number 3 Ro = 1 “Low Rossby

Ro =. 2 White noise forcing wave number 3 Ro = 1 “Low Rossby limiting dynamics for stably stratified flows”, B. Wingate, P. Embid, M. Holmes-Cerfon, M. Taylor, To Appear Journal of Fluid Mechanics, 2010

Ratio of Slow to Fast Potential Enstrophy

Ratio of Slow to Fast Potential Enstrophy

Dependence on Rotation Rate

Dependence on Rotation Rate

Observations in the Arctic Basin Mary Louise Timmermans, Yale University Beaufort Gyre, Site B

Observations in the Arctic Basin Mary Louise Timmermans, Yale University Beaufort Gyre, Site B Funded by the Beaufort Gyre Exploration Program - National Science Foundation

Typical Oceanic Conditions at site B

Typical Oceanic Conditions at site B

Mooring data: - deep mesoscale vortices depth-time section of potential temperature with isopycnals overlain

Mooring data: - deep mesoscale vortices depth-time section of potential temperature with isopycnals overlain Theme Area? depth-time section of speed 31

What I’m thinking about for nonhydrostatic ocean and exascale computing What is fast in

What I’m thinking about for nonhydrostatic ocean and exascale computing What is fast in most of the ocean (hydrostatic) is slow in areas of weak stratification. Meaning the ocean itself wanders in and out of the parameter space, touching regions where there are separations of time scales but not necessarily QG. If we are to solve the nonhydrostatic equations for the SLOW mesoscale but step over the fast waves, how do we do it?

DOE POV: Power is limiting factor 200 MW usual scaling goal 2005 2010 2015

DOE POV: Power is limiting factor 200 MW usual scaling goal 2005 2010 2015 2020 A city of 80, 000 people is estimated to use 45 MW of power per year. Exascale design is for 1 exaflop to cost 20 MW instead of 200 MW.

DOE POV: Off chip data movement costs more than FLOPS Intranode SMP 10000 Intranode

DOE POV: Off chip data movement costs more than FLOPS Intranode SMP 10000 Intranode MPI Flop 100 now 2018 10 em ec t sy st ro ss C te rc in lo ca l O ff- ch ip / D R on n AM p on m 5 m 1 m m on -c hi p -c hi er eg is t R P FL O P 1 D Pico. Joules 1000

Scientist POV: My life as a Co-Designer Example of the programming model for using

Scientist POV: My life as a Co-Designer Example of the programming model for using CPU + GPU CPU GPU Chipset DRAM PCIexpress bus HOST DEVICE Data parallel pieces of an algorithm are executed on the device as KERNELS that are executed as many THREADS grouped together in BLOCKS grouped together in GRIDS. Each concept has hardware meaning.

Scientist POV: My life as a Co-Designer o The key to getting performance through

Scientist POV: My life as a Co-Designer o The key to getting performance through the use of coprocessors (such as the GPU) is the idea of Data-Parallel Computing o Identical operations executed on many data elements in parallel (simplified flow control allows increased ratio of compute logic to control logic) a = 2 * pi + x b=y*z DATA INDEPENDENT a = 2 * pi + x b=y*a DATA DEPENDENT

Scientist POV: My life as a Co-Designer: Example of the programming model for using

Scientist POV: My life as a Co-Designer: Example of the programming model for using CPU + GPU main program { …… Logic to execute ocean model dynamics …… Call to ship data in CPU memory to GPU memory Column. Physics. Kernal<<<dim 3 Grid. Size, dim 3 Block. Size>>>(…) …… Call to ship GPU memory back to CPU memory } __global__ Column. Physics. Kernal(float *x, int *b, …) { Int id = block. IDx. x*block. DIM. x+thread. ID. x A(id) = sin(x(id)) }

Scientist POV: My life as a Co-Designer My general remarks o Granularity – measure

Scientist POV: My life as a Co-Designer My general remarks o Granularity – measure of the ratio of computation in each task to communication in each task o Memory – the device OS is very light weight, but there are different kinds of memory (GLOBAL, LOCAL, CONSTANT, CACHE, TEXTURE) that can be used to optimize performance (you are the compiler). For example TEXTURE MEMORY allows hardware support for operations like filtering – meaning it is Very Fast. (Can we design special chips for on-board elliptic solves or ffts? YES!) o The use of asynchronous data use can be powerful In science, though, sometimes we need to make sure we have all the data before proceeding – then use __synchthreads() o Already exists CUBLAS, CUFFT, CUSPARSE, CURAND (NVIDIA is sending people from its new HPC methods group to LANL) o Need good programming models to optimize: 1. Parallel efficiency (latency and execution, arithmetic optimization and warps). Think about global memory bandwidth to latency. 2. Memory Optimization

Scientist POV: My life as a Co-Designer More general remarks o On a heterogeneous

Scientist POV: My life as a Co-Designer More general remarks o On a heterogeneous machine we can have hardware that perform specific operations. We need to ‘reach through’ all the way to chip designers to do this. o Considering the original problem from the math/algorithms point of view through the lens of heterogeneous computing 1. Our complex climate codes will no longer be dominated by one numerical method. We may use spectral elements for something, finite volumes for another, we may need to use monte carlo to solve elliptic problems. We may need to throw our current algorithms such as the barotropic/baroclinic splitting away entirely and throw compute cycles at the problem with high order methods. We may need to consider if there is a part of the dynamics that can be considered stochastic. 2. We already do this in the climate model with different methods in the atmosphere, ocean, ice sheets, sea ice. Why not take this further?

Hydrostatic and parallel versus Nonhydrostatic and heterogeneous Current solution method for HPE 1. Start

Hydrostatic and parallel versus Nonhydrostatic and heterogeneous Current solution method for HPE 1. Start with U(t) (slow+fast) 2. Subtract 2 D fast component a. Use vertical integration 3. Advance 2 D fast dynamics to t+dt_slow a) Subcycle by performing many time steps at dt_fast, or b) Use a semi-implicit method 4. Advance 3 D slow dynamics to t+dt_slow a) Use explicit time stepping 5. Combine fast and slow dynamics to obtain U(t+dt_slow) New solution method for NHBE 1. Start with U(t) (slow+fast) 2. Compute 3 D slow component (fast-wave (time) averaging & projection) 3. Subtract 3 D slow component 4. Advance 3 D fast dynamics to t+dt_slow a) Brute force explicit sub cycling, or b) Novel projection/Monte-Carlo (as in Zwanzeig et al), or c) Novel projection/PIC on GPU, or d) Novel semi-implicit on GPU 5. Advance 3 D slow dynamics to t +dt_slow a) Use fast-wave averaging ideas and explicit time stepping 6. Combine fast and slow dynamics to obtain U(t+dt_slow)

Summary o Discussed the mesoscale in contemporary ocean models and our approximations. o Showed

Summary o Discussed the mesoscale in contemporary ocean models and our approximations. o Showed you a new result that suggests there are slow dynamics in weak stratification and they are nonhydrostatic. o Suggests the nonhydrostatic equations are necessary to predict the mesoscale dynamics. o Importance of conceptual models in the design of next generation numerical ocean models.

Then use the method of fast wave averaging Where solves: And therefore, the solution

Then use the method of fast wave averaging Where solves: And therefore, the solution to leading order is And since the fast linear operator is skew-Hermitian, has an orthogonal decomposition into fast and slow components: Using the fact that like, , the leading order solution looks

A key observation is that the fast operator, skew-Hermitian in X: . , is

A key observation is that the fast operator, skew-Hermitian in X: . , is By the Spectral Theorem, skew-Hermitian operators have purely imaginary eigenvalues and an orthonormal basis of eigenfunctions. Physically this means the basic normal mode solutions of the equations are wave solutions. The null space of , , is orthogonal to the range of because the null space is spanned by the eigenfunctions with zero eigenvalues (slow modes) where the range is spanned by the remaining eigenfunctions with non-zero eigenvalues (fast modes).

Derive the equation for the slow dynamics Knowing the slow dynamics evolves independently of

Derive the equation for the slow dynamics Knowing the slow dynamics evolves independently of the fast we can find the equations for the slow dynamics by projecting the solution and the equations onto the null space of the fast operator. Then the fast wave averaged equation for the slow modes becomes,

The null space of the fast operator where

The null space of the fast operator where

With Conservation Laws where:

With Conservation Laws where: