Supercomputing: the past, present and future
Ben Moore, Institute for Theoretical Physics, University of Zurich
with Oscar Agertz, Juerg Diemand, Tobias Kaufmann, Michela Mapelli, Joachim Stadel, Romain Teyssier & Marcel Zemp
Scaling of cpus, parallel computers and algorithms: • We are now at the point where clock speeds and transistor counts are barely increasing, but the number of operations per clock cycle keeps rising
1957: IBM vacuum tube machine, 5000 flops. 2007: Intel Core 2 Duo, 10 billion flops. MareNostrum: ~100 teraflops. Seti@home is running at ~300 teraflops. Top 500 #1 (2007): IBM BlueGene, 212,992 processors, 478 teraflops. Folding@home + Seti@home: more than a petaflop (10^15 flops). The first N-body calculations with N^2 algorithms in 1960 used 30 particles. Today we can follow 10^10 particles, and the resolution (~ mean interparticle separation) is 1000 times higher. Moore's law alone with an N^2 algorithm would allow ~10^6 particles; we are orders of magnitude above this thanks to algorithmic development. Grid and tree codes are O(N log N). Dehnen's hierarchical mesh code is O(N). Stadel's multipole expansion code PKDGRAV is also O(N) and parallel.
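The gap between the N^2 and N log N scalings quoted above can be made concrete with a back-of-the-envelope operation count; a minimal sketch (the counts are illustrative, not benchmarks of any particular code):

```python
import math

# Rough per-timestep interaction counts for gravitational force evaluation.
N = 10**10                  # particle count of a modern large run
direct = N * N              # direct summation: every pair, O(N^2)
tree = N * math.log2(N)     # tree / grid codes: O(N log N)

print(f"direct sum: {direct:.1e} interactions")
print(f"tree code : {tree:.1e} interactions")
print(f"speed-up  : {direct / tree:.1e}x")
```

At N = 10^10 the tree code wins by roughly eight orders of magnitude, which is why algorithmic development has outpaced Moore's law here.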
“Supercomputing” Past: one FLOP per second in 1941
The N-body technique. Solving these equations with finite difference techniques is impractical: it is a 6-D equation. Analytic solutions are known mainly for a few equilibrium systems. Instead, solve the equations of motion of N bodies…
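The basic idea can be sketched as a direct-summation force loop; a minimal version with Plummer softening (the softening length `eps` is an illustrative choice, and real codes replace this O(N^2) loop with a tree or multipole expansion):

```python
import numpy as np

def accelerations(pos, mass, G=1.0, eps=1e-2):
    """Direct-sum O(N^2) gravitational accelerations with Plummer softening."""
    acc = np.zeros_like(pos)
    for i in range(len(mass)):
        dr = pos - pos[i]                       # vectors from particle i to all others
        r2 = np.sum(dr * dr, axis=1) + eps**2   # softened squared distances
        inv_r3 = r2 ** -1.5
        inv_r3[i] = 0.0                         # exclude the self-interaction
        acc[i] = G * np.sum((mass * inv_r3)[:, None] * dr, axis=0)
    return acc
```

For two unit masses separated by unit distance this recovers the Newtonian value G m / r^2 to within the softening correction, and the two accelerations are equal and opposite.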
The equivalence of light and gravity was a beautiful idea. Here are Holmberg's initial conditions. (Hmmm, initial conditions: that's a whole separate talk.)
(Panels: retrograde vs. prograde encounters)
This experiment was recently repeated by John Dubinski. Note how Holmberg cheated with the second ring of six particles!
Example 1: The secular and dynamical evolution of disks. A physically simple, but computationally demanding problem. N-body simulations have long been used to study galactic dynamics, but only in the last few years has the resolution been sufficient to suppress numerical heating of disks. Still, we are forced to construct "idealised models" using quasi-equilibrium spherical structures. The disk heating problem has still not been solved properly and self-consistently.
A typical disk galaxy simulation with < 10^6 star particles. Look at those smooth disks with 10^8 stars! Spiral patterns need perturbations! (John Dubinski)
So, how do those CDM substructures perturb the disk? These simulations are from John Dubinski who attempts to follow all the most massive perturbers. See also Kazantzidis et al (2007). CDM substructure provides a nice means to generate spiral patterns and thick disks
The Origin of Giant Low Surface Brightness Galaxies: Malin 1, Malin 2, UGC 6614. Mapelli et al (2007)
Moore & Parker 2007 Malin 1 has hardly had time to rotate once! How can stars form ~100 kpc from the center?
The Cartwheel Ring Galaxy
T = 100 Myr: Cartwheel (200 kpc). T = 1 Gyr: Malin 1
Surface brightness profiles
Example 2: The non-linear formation of dark matter structures. A physically simple, but computationally highly demanding problem. • First cold collapse simulations in the 1980's gave "Einasto" profiles. • CDM hierarchical collapses in the 1990's gave "Einasto" profiles. There are lots of empirical "universal" correlations but we don't understand their physical origin: density and phase space with radius relations, the beta-gamma relation… Observations of galaxies and dwarf spheroidals tell us about the mass distribution on ~100 parsec scales. Simulations have yet to reach that. Until today… CDM hierarchical collapses in 2007 are still smooth in the inner Galactic region. Do we have dark satellites at 8 kpc? Indirect detection experiments need the phase space distribution of particles flowing through this room. In fact, this is impossible to determine computationally, but we can make some bold extrapolations…
30 years ago… N=700 (White 1976). This overmerging problem was part of the motivation for White & Rees (1978) and the beginning of semi-analytic galaxy formation models.
Historically, the first N-body simulations were carried out to study the collapse of a cloud of stars (van Albada 1968). With 100 stars, these calculations found structures similar to elliptical galaxies. However, very cold collapses lead to a radial orbit instability and the system forms a prolate bar. Numerical simulations show that K.E./P.E. > 0.2 is required to suppress this instability.
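Checking a set of initial conditions against the K.E./P.E. > 0.2 threshold is straightforward; a minimal sketch (the function name is illustrative, and the double loop is fine only for small N):

```python
import numpy as np

def virial_ratio(pos, vel, mass, G=1.0):
    """Return K.E./|P.E.| for a self-gravitating particle set."""
    ke = 0.5 * np.sum(mass * np.sum(vel**2, axis=1))
    pe = 0.0
    for i in range(len(mass)):
        for j in range(i + 1, len(mass)):
            pe -= G * mass[i] * mass[j] / np.linalg.norm(pos[i] - pos[j])
    return ke / abs(pe)
```

A ratio near 0.5 is virial equilibrium; below ~0.2 the collapse is cold enough to trigger the radial orbit instability.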
The origin of universal density profiles, universal phase space density profiles, universal velocity distribution functions, and universal correlations between anisotropy and local density slope are unknown. (Snapshots at T = 0 and T = 10 crossing times)
Increasing resolution doesn't just give you more of the same: you find new physical understanding. Cluster resolved (1980-1997): 67,500 particles. Galaxy halos resolved (1998): 1,300,000 particles. • It took Joachim 3 years to write the first version of the MPI treecode PKDGRAV • It took six months to do this simulation on the 128-processor KSR • It finished in 1996 and we stared at it for a year before developing the tools to analyse it (Ghigna et al 1998).
The first structures in the universe. Diemand et al (2005): earth mass, parsec scale. Each particle has the mass of the moon: higher resolution than groups simulating the origin of the solar system.
Diemand et al (2005). This is one of the first objects to form in the Universe, at z=60. The halo is smooth, has a cuspy density profile, an earth mass (10^-6 Msun) and the size of the solar system. Such halos could be detected as high proper motion gamma-ray sources (Moore et al 2005).
Stable for a long time, but no structure is stable forever… see the talk of Krauss. Agreement with galaxy properties, abundance and radial distribution in clusters is very strong evidence for a hierarchical universe.
Confrontation of rotation curves with predictions from numerical simulations: de Blok, McGaugh, Rubin & Bosma (2002); Swaters, Madore, van den Bosch & Balcells (2002). Do bars heat the core? Not all galaxies are barred, e.g. M33, and bars are long lived. Triaxiality? CDM haloes are round; we should observe 50% of galaxies with steeper cusps.
Goerdt et al 2006 • Dynamical friction fails in constant density systems • The reason is that the dark matter and stars have the same orbital frequency: all particles that can scatter off the sinking mass do so immediately • The consequence is rapid sinking and then stalling • Even if the halo is slightly cuspy, this will result in a core and stalling
Example 3: The internal kinematics of the ISM. A physically complex problem that has remained unsolved for decades. What determines the kinematics of the ISM and regulates star formation? Supernovae? Large-scale gravity modes? Molecular cloud-molecular cloud encounters?
Dib & Burkert: "Gravity alone can't be responsible for the floor in observed ISM dispersions"; "Need energy input from supernovae". Previous unresolved ISM simulations.
AMR SPH Oscar Agertz et al (2007)
Oscar Agertz et al (2007) “mind the gap”
Agertz Mayer Lake Moore Teyssier ISM kinematics using the RAMSES AMR code +/- star formation +/- SN feedback
(Gas phases: cold, warm, hot)
Agertz et al ( 2008) “The observed dispersion velocity is generated through molecular cloud formation and subsequent gravitational encounters”
Scaling of cpus, parallel computers and algorithms: Given the exponential scaling of supercomputers, it is difficult for special purpose hardware to keep up at a reasonable cost. We also want a general purpose code that can scale to 10000's of processors.
Scaling of cpus, parallel computers and algorithms: It's a tough commercial business! (KSR…) Clusters started to appear ~2000, but you couldn't buy a system that wouldn't melt down. So we built our own… zBox 1
Parallel supercomputing
zBox 1 (Stadel & Moore 2002): 288 AMD MP 2200+ processors, 144 GB RAM. Compact, easy to cool and maintain. Very fast Dolphin/SCI interconnects: 4 Gbit/s, microsecond latency. A teraflop supercomputer for $500,000. Roughly one cubic meter, one ton, requiring 40 kilowatts of power. zBox 2 (2006), 5 Tflops: 500 quad Opteron 852s, 580 GB memory, 65 TB disk, 3D SCI low latency network. 2002: 0.5 Tflops. 2007: 6 Tflops, 200 TB disk.
• With fixed timesteps most codes scale very well. • However, this is no longer the only measure, since the scaling of a very "deep" multistepping run is always a lot worse. • How do we do multistepping now, and why does it still have problems?
Choice of Timestep • We want a criterion that commutes with the Kick operator and is Galilean invariant, so it should not depend on velocities. Criteria can be local, or non-local (based on the maximum acceleration in moderate densities), and we can take the minimum of any or all of these criteria.
Zemp et al (2007), S = 0.2, D = 0.03. Here we show the timesteps versus radius for a spherical NFW halo. (Plot annotations: the dynamical-time criterion vs. the standard criterion; the standard choice gives inaccurate timesteps and wasted CPU time.)
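A dynamical-time criterion of this kind can be sketched in a few lines; this is a simplified illustration of the idea, not the actual Zemp et al. (2007) scheme, with `eta` chosen to echo the D = 0.03 coefficient above (note it depends only on the enclosed mass profile, never on velocities, so it is Galilean invariant as required):

```python
import math

def dynamical_timestep(M_enc, r, G=1.0, eta=0.03):
    """Timestep from the local dynamical time: dt = eta / sqrt(G * rho_enc),
    where rho_enc is the mean density enclosed within radius r."""
    rho_enc = M_enc / ((4.0 / 3.0) * math.pi * r**3)
    return eta / math.sqrt(G * rho_enc)
```

In a cuspy NFW halo rho_enc rises steeply inward, so particles near the centre are automatically assigned much shorter timesteps than those in the outskirts.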
Sometimes nearby particles are important, as in this restricted three body example:
This two-body, mildly eccentric orbit can't be followed with the standard leapfrog integrator: the orbits shrink and precess.
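For reference, here is a minimal fixed-step kick-drift-kick leapfrog sketch. With a constant dt it is symplectic and conserves energy well on a Kepler orbit; the failures on the slide arise when the timestep changes adaptively during the orbit, which breaks that property:

```python
import numpy as np

def kdk_leapfrog(pos, vel, acc_fn, dt, nsteps):
    """Fixed-step kick-drift-kick leapfrog; symplectic only while dt stays fixed."""
    acc = acc_fn(pos)
    for _ in range(nsteps):
        vel = vel + 0.5 * dt * acc    # half kick
        pos = pos + dt * vel          # drift
        acc = acc_fn(pos)
        vel = vel + 0.5 * dt * acc    # half kick
    return pos, vel

# Kepler test problem: unit point mass at the origin (G = 1).
kepler = lambda p: -p / np.linalg.norm(p)**3
```

On a circular orbit of radius 1 the energy E = v^2/2 - 1/r stays pinned near -0.5 instead of drifting, which is exactly the behaviour lost when the step size varies.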
But look at this highly eccentric, unequal mass orbit of a star around a black hole using the error correction (note the axes are not the same scale). Marcel Zemp: a collisionless and collisional MPI parallel code. It can simulate, for example, central BH kinematics within a CDM halo, or the core collapse of a globular cluster.
Pkdgrav 2006
Multistepping: The real parallel computing challenge. • T ~ 1/sqrt(Gρ), even more dramatic in SPH • This implies Nactive << N • The global approach to load balancing fails • Less compute per unit of communication • Too many synchronization points between all processors. We want all algorithms of the simulation code to scale as O(Nactive log N)! Everything that doesn't introduces a fixed cost which limits the speed-up attainable from multistepping.
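The block-timestep hierarchy behind multistepping can be sketched as a rung assignment: each particle steps at dt_max / 2^n for the smallest n that respects its own timestep criterion. This is an illustrative sketch of the scheme, not PKDGRAV's actual implementation:

```python
import math

def assign_rung(dt_particle, dt_max, max_rung=30):
    """Place a particle on the block-timestep hierarchy.

    Rung n steps at dt_max / 2^n; a particle gets the largest
    power-of-two step not exceeding its own required dt.
    """
    n = max(0, math.ceil(math.log2(dt_max / dt_particle)))
    return min(n, max_rung)
```

On a deep hierarchy only the particles whose rungs are active at the current substep are integrated, so Nactive << N, and any per-step work that scales with N rather than Nactive becomes the fixed cost that caps the speed-up.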
The Future. In seven years our laptops will have a teraflop of processing power, equivalent to our first zBox 1 supercomputer! What is the performance of the ultimate laptop?
Information theory + Quantum mechanics + Thermodynamics. Seth Lloyd, Nature (2000)
Extrapolating the exponential Moore’s law into the future we find that it will take just 250 years to achieve those 40 orders of magnitude gain. …and you still won’t be able to simulate a CDM halo with the correct number of particles!