Large scale simulations of astrophysical turbulence Axel Brandenburg
Axel Brandenburg (Nordita, Copenhagen), Wolfgang Dobler (Univ. Calgary), Anders Johansen (MPIA, Heidelberg), Antony Mee (Univ. Newcastle), Nils Haugen (NTNU, Trondheim), etc. (... just google for Pencil Code)
Pencil Code
• Started in Sept. 2001 with Wolfgang Dobler
• High order (6th order in space, 3rd order in time)
• Cache & memory efficient
• MPI, can run PACX-MPI (across countries!)
• Maintained/developed by many people (CVS!)
• Automatic validation (overnight or any time)
• Max resolution so far 1024^3, 256 procs
Pencil formulation
• In CRAY days: worked with full chunks f(nx,ny,nz,nvar)
  – Now, on SGI, nearly 100% cache misses
• Instead work with f(nx,nvar), i.e. one nx-pencil
• No cache misses, negligible work space, just 2N
  – Can keep all components of derivative tensors
• Communication before sub-timestep
• Then evaluate all derivatives, e.g. call curl(f,iA,B)
  – Vector potential A=f(:,:,iAx:iAz), B=B(nx,3)
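The pencil loop described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the Pencil Code routine itself: it is written in Python/NumPy rather than Fortran, uses second-order differences with one ghost cell per side instead of the sixth-order stencil, and the names `curl_pencil`, `uniform_field_demo` and `iax` are hypothetical.

```python
import numpy as np

def curl_pencil(f, iy, iz, iax, dx, dy, dz):
    """Return B = curl A on one x-pencil as an array of shape (nx, 3).

    f has shape (nx+2, ny+2, nz+2, nvar), i.e. one ghost cell per side
    (an assumption of this sketch); A is stored in f[..., iax:iax+3].
    """
    ax, ay, az = f[..., iax], f[..., iax + 1], f[..., iax + 2]
    xi = slice(1, -1)  # interior points along the pencil

    # Second-order central differences, evaluated only on this pencil.
    def ddx(a):
        return (a[2:, iy, iz] - a[:-2, iy, iz]) / (2.0 * dx)

    def ddy(a):
        return (a[xi, iy + 1, iz] - a[xi, iy - 1, iz]) / (2.0 * dy)

    def ddz(a):
        return (a[xi, iy, iz + 1] - a[xi, iy, iz - 1]) / (2.0 * dz)

    b = np.empty((f.shape[0] - 2, 3))  # the only work array: one pencil
    b[:, 0] = ddy(az) - ddz(ay)
    b[:, 1] = ddz(ax) - ddx(az)
    b[:, 2] = ddx(ay) - ddy(ax)
    return b

def uniform_field_demo(n=8, d=0.1):
    """Fill f with A = (-y/2, x/2, 0), whose curl is (0, 0, 1) everywhere."""
    coords = np.arange(n + 2) * d
    x, y, _ = np.meshgrid(coords, coords, coords, indexing="ij")
    f = np.zeros((n + 2, n + 2, n + 2, 3))
    f[..., 0] = -0.5 * y
    f[..., 1] = 0.5 * x
    # Sweep over all interior pencils, one (n, 3) result per pencil.
    return [curl_pencil(f, iy, iz, 0, d, d, d)
            for iz in range(1, n + 1) for iy in range(1, n + 1)]
```

The point of the design is that the derived quantity B never exists as a full 3D array: each call produces an (nx, 3) block that can be consumed and discarded before the next pencil is processed.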
Switch modules
• magnetic or nomagnetic (e.g. just hydro)
• hydro or nohydro (e.g. kinematic dynamo)
• density or nodensity (burgulence)
• entropy or noentropy (e.g. isothermal)
• radiation or noradiation (solar convection, discs)
• dustvelocity or nodustvelocity (planetesimals)
• Coagulation, reaction equations
• Homochirality (reaction-diffusion-advection equations)
• Other physics modules: MHD, radiation, partial ionization, chemical reactions, selfgravity
Pencil Code check-ins
High-order schemes
• Alternative to spectral or compact schemes
  – Efficiently parallelized, no transpose necessary
  – No restriction on boundary conditions
  – Curvilinear coordinates possible (except for singularities)
• 6th order central differences in space
• Non-conservative scheme
  – Allows use of logarithmic density and entropy
  – Copes well with strong stratification and temperature contrasts
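As an illustration of the 6th order central differences mentioned above, here is a minimal sketch of the standard seven-point first-derivative stencil on a periodic, uniform grid. The function names `deriv6` and `max_error` are hypothetical, and periodic wrap-around via `np.roll` stands in for the ghost zones a parallel code would use.

```python
import numpy as np

def deriv6(u, dx):
    """Sixth-order central first derivative on a periodic, uniform grid.

    Stencil weights are (+-3/4, -+3/20, +-1/60) / dx for offsets 1, 2, 3.
    """
    return (45.0 * (np.roll(u, -1) - np.roll(u, 1))
            - 9.0 * (np.roll(u, -2) - np.roll(u, 2))
            + (np.roll(u, -3) - np.roll(u, 3))) / (60.0 * dx)

def max_error(n):
    """Maximum error of deriv6 for u = sin(x) on n points over [0, 2*pi)."""
    x = 2.0 * np.pi * np.arange(n) / n
    return np.max(np.abs(deriv6(np.sin(x), x[1] - x[0]) - np.cos(x)))
```

Doubling the number of points should reduce the error by roughly 2^6 = 64, which can be checked by comparing `max_error(16)` with `max_error(32)`.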
(i) High-order spatial schemes. Main advantage: low phase errors
Wavenumber characteristics
Higher order – less viscosity
Less viscosity – also in shocks
(ii) High-order temporal schemes. Main advantage: low amplitude errors
• 2N-RK3 scheme (Williamson 1980)
• Figure curves: 1st order, 2nd order, 3rd order
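A 2N-storage Runge-Kutta step needs only two registers per variable: the solution and one scratch array. The sketch below uses the coefficient set commonly quoted for Williamson's third-order scheme; the slide itself does not list coefficients, so treat them, and the names `rk3_2n_step` and `decay_error`, as assumptions of this example.

```python
import math

# Commonly quoted coefficients of Williamson's (1980) third-order
# 2N-storage scheme (an assumption; not listed on the slide).
ALPHA = (0.0, -5.0 / 9.0, -153.0 / 128.0)
BETA = (1.0 / 3.0, 15.0 / 16.0, 8.0 / 15.0)

def rk3_2n_step(rhs, u, dt):
    """Advance u by one time step using only two registers, u and w."""
    w = 0.0
    for alpha, beta in zip(ALPHA, BETA):
        w = alpha * w + dt * rhs(u)  # update the scratch register
        u = u + beta * w             # update the solution register
    return u

def decay_error(nsteps):
    """Error at t = 1 when integrating du/dt = -u from u(0) = 1."""
    u = 1.0
    for _ in range(nsteps):
        u = rk3_2n_step(lambda v: -v, u, 1.0 / nsteps)
    return abs(u - math.exp(-1.0))
```

Halving the time step should cut the error by about a factor of 8, as expected for a third-order method; comparing `decay_error(10)` with `decay_error(20)` is a quick check.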
Shock tube test
Haugen & Brandenburg (PRE, astro-ph/0402301)
• Hyperviscous, Smagorinsky, normal
• Height of bottleneck increased
• Onset of bottleneck at same position
• Inertial range unaffected by artificial diffusion
256 processor run at 1024^3
MHD equations
• Induction equation, written for the magnetic vector potential
• Momentum and continuity equations
Vector potential
• B = curl A, advantage: div B = 0
• J = curl B = curl(curl A) = curl^2 A
• Not a disadvantage: consider Alfvén waves (B-formulation vs. A-formulation)
• 2nd derivative once is better than 1st derivative twice!
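One way to see why a single second derivative beats two first derivatives is to look at the grid-scale zigzag mode. The sketch below, which assumes the standard sixth-order stencils on a periodic grid and uses hypothetical function names, shows that a centered first derivative applied twice cannot see this mode at all, while the direct second-derivative stencil responds to it and so can act on it.

```python
import numpy as np

def deriv6(u, dx):
    """Sixth-order central first derivative (periodic grid)."""
    return (45.0 * (np.roll(u, -1) - np.roll(u, 1))
            - 9.0 * (np.roll(u, -2) - np.roll(u, 2))
            + (np.roll(u, -3) - np.roll(u, 3))) / (60.0 * dx)

def deriv6_2nd(u, dx):
    """Sixth-order central second derivative (periodic grid)."""
    return (-490.0 * u
            + 270.0 * (np.roll(u, -1) + np.roll(u, 1))
            - 27.0 * (np.roll(u, -2) + np.roll(u, 2))
            + 2.0 * (np.roll(u, -3) + np.roll(u, 3))) / (180.0 * dx**2)

def nyquist_response(n=16, dx=1.0):
    """Amplitude of both operators applied to the zigzag mode (-1)^i."""
    u = (-1.0) ** np.arange(n)
    twice = deriv6(deriv6(u, dx), dx)  # first derivative applied twice
    once = deriv6_2nd(u, dx)           # second derivative applied once
    return np.max(np.abs(twice)), np.max(np.abs(once))
```

For the antisymmetric first-derivative stencil the contributions from u[i+k] and u[i-k] cancel exactly on the zigzag mode, so the first returned amplitude is zero, whereas the second is nonzero.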
Comparison of A and B methods
Wallclock time versus processor #
• Nearly linear scaling
• 100 Mb/s shows limitations
• 1 - 10 Gb/s no limitation
Sensitivity to layout on Linux clusters
yproc x zproc    relative speed
4 x 32           1 (speed)
8 x 16           3 times slower
16 x 8           17 times slower
Setup: Gigabit uplink, 100 Mbit link only, 24 procs per hub
Why this sensitivity to layout?
All processors need to communicate with processors outside the group of 24
Use exactly 4 columns
• Only 2 x 4 = 8 processors need to communicate outside the group of 24
• Optimal use of speed ratio between 100 Mb ethernet switch and 1 Gb uplink
 0  4  8 12 16 20    0  4  8
 1  5  9 13 17 21    1  5  9
 2  6 10 14 18 22    2  6 10
 3  7 11 15 19 23    3  7 11
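The processor counts quoted for the three layouts can be reproduced with a small counting exercise. The sketch below rests on assumptions not stated on the slides: ranks are numbered with the y index running fastest, consecutive ranks share a 24-port hub, and each processor exchanges data only with its four periodic nearest neighbours in y and z. The function name `external_talkers` is hypothetical.

```python
def external_talkers(nprocy, nprocz, group_size=24, group=0):
    """Count ranks in one hub group that have a neighbour on another hub.

    Assumes rank = iy + nprocy * iz and periodic nearest-neighbour
    communication in y and z (assumptions of this sketch).
    """
    nprocs = nprocy * nprocz
    first = group * group_size
    last = min(first + group_size, nprocs)
    count = 0
    for rank in range(first, last):
        iy, iz = rank % nprocy, rank // nprocy
        neighbours = (
            (iy + 1) % nprocy + nprocy * iz,
            (iy - 1) % nprocy + nprocy * iz,
            iy + nprocy * ((iz + 1) % nprocz),
            iy + nprocy * ((iz - 1) % nprocz),
        )
        # A rank counts if any neighbour sits on a different hub.
        if any(n // group_size != group for n in neighbours):
            count += 1
    return count

if __name__ == "__main__":
    for layout in ((4, 32), (8, 16), (16, 8)):
        print(layout, external_talkers(*layout))
```

Under these assumptions the 4 x 32 layout leaves 8 of the 24 processors talking across the uplink, 8 x 16 leaves 16, and 16 x 8 leaves all 24, in line with the ordering of the measured slowdowns.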
Fragmentation over many switches
Pre-processed data for animations
Ma=10 supersonic turbulence
Animation of B vectors
Animation of energy spectra: very long run at 512^3 resolution
MRI turbulence (MRI = magnetorotational instability)
• 256^3 w/o hypervisc.: t = 600 = 20 orbits
• 512^3 w/o hypervisc.: Δt = 60 = 2 orbits
Fully convective star
Geodynamo simulation
Homochirality: competition of left/right (reaction-diffusion equation)
Conclusions
• Subgrid scale modeling can be unsafe (some problems)
  – shallower spectra, longer time scales, different saturation amplitudes (in helical dynamos)
• High order schemes
  – Low phase and amplitude errors
  – Need less viscosity
• 100 Mb link close to bandwidth limit
• Comparable to and now faster than Origin
• 2x faster with Gb switch
• 100 Mb switches with Gb uplink +/- optimal
The transfer equation & parallelization (figure sequence; labels: analytic solution, processors, ray direction)
1. Intrinsic calculation
2. Communication
3. Intrinsic calculation
Current implementation
• Plasma composed of H and He
• Only hydrogen ionization
• Only H⁻ opacity, calculated analytically
  – No need for look-up tables
• Ray directions determined by grid geometry
  – No interpolation is needed
Convection with radiation