Parallelization of Monte Carlo simulations of spallation experiments


Mitja Majerle 1*, Josef Hampl 1
*) Electronic address: majerle@ujf.cas.cz
1) Nuclear Physics Institute of the Academy of Sciences of the Czech Republic, Prague

Spallation experiments at the Joint Institute for Nuclear Research in Dubna

ADTT (Accelerator Driven Transmutation Technologies, or ADS, Accelerator Driven Systems) are considered a promising option for future nuclear energy production and nuclear waste incineration. An ADS produces a large number of neutrons in the spallation process and introduces them into a sub-critical reactor assembly, where they produce energy through fission of fissionable material. The extra neutrons are used to breed fuel from 232Th and/or to transmute long-lived nuclear waste into stable isotopes, or into short-lived isotopes which decay to stable ones.

History: Experiments with simplified ADS setups were performed on accelerators at the JINR (Joint Institute for Nuclear Research, Dubna, Russia). The targets were lead, tungsten, and uranium cylinders (in some experiments surrounded with a paraffin moderator or a natural uranium blanket). The physical aim of the experiments was to study the nuclear processes in the target, to investigate the transmutation process in radioactive samples, to measure the heat production, etc. The neutronics of the experimental setups is studied with several types of small detectors placed around and inside the setup. Two examples of this work are the setups Energy plus Transmutation and Gamma-MD.

ADS are so far at an early experimental and design stage. Before the construction of the first full-scale ADS unit, a lot of research in this field has to be done. Details of the spallation reaction and its description in simulation codes, as well as the cross-sections of a wide range of nuclear reactions, are being studied worldwide.

Energy plus Transmutation

Energy plus Transmutation is an international project that studies neutron production and radioactive isotope transmutation in a system of a massive lead target and a natural uranium blanket. The lead target is a cylinder half a meter long and 8 cm in diameter. It is surrounded with a natural uranium blanket (total uranium weight 206 kg) in the form of small rods coated with aluminum. The setup is placed in a biological shielding consisting of a wooden box filled with granulated polyethylene; the inner walls of the shielding are coated with cadmium. The setup was irradiated in several experiments by protons and deuterons in the energy range 0.7-2.52 GeV.

ANITA neutron facility

ANITA (Atmospheric-like Neutrons from ThIck TArget) is a neutron facility for accelerated testing of electronics against neutron-induced single-event effects (SEE). SEE are one of the major reliability concerns in semiconductor electronic components and systems. The facility uses an accelerated proton beam (0.18 GeV) guided onto a tungsten target. The neutron beam, originating from nuclear reactions induced by the protons in the tungsten, is formed geometrically by a collimator aperture in a frontal wall that separates the neutron production cave from the user area. The features of the facility include a high neutron flux, fidelity of the neutron spectrum with regard to the natural environment, a flexible beam spot size, user flux control, a low dose rate from gamma rays, a low thermal neutron flux, a spacious user area, and on-line neutron dosimetry. The facility, installed at The Svedberg Laboratory, Uppsala, Sweden, is frequently used by leading manufacturers of semiconductor electronic components and systems. Further information about the facility and the laboratory can be found at www.tsl.uu.se.
Gamma-MD

The Gamma series of experiments is focused on the production of neutrons in the spallation process and on their moderation/transport in the neutron moderator. The first such experiment (Gamma-2) was performed on a 20 cm long lead target surrounded with a 6 cm thick paraffin moderator; 660 MeV protons were directed onto the target. An extended lead target (diameter 8 cm, length 60 cm) is used in the setup Gamma-MD; the moderator is a block of graphite (1.1 x 0.6 m^3). This setup has so far been irradiated with 2.33 GeV deuterons; more irradiations are planned.

Calculation principles

An important part of every experiment preparation and analysis is the simulation of the setup. The described systems are implemented in Monte Carlo codes; nuclear interactions and the subsequent particle transport are simulated, and several quantities are sampled, such as the number of produced secondary particles and the particle fluxes at different places around the setup. Two different codes are used, MCNPX and FLUKA, and the time to simulate one event for the described setups is usually 10^-4 to 0.1 s, depending on the complexity and on the calculation models and libraries used. The setup length is ca. 0.5 m, and particle fluxes are sampled on the cm scale (the size of the used detectors) with an energy binning of 1 MeV in the range from 0 to a few GeV. The sampled phase space is thus very small, and in most cases no variance reduction techniques can be used, so statistically reliable results start at ca. 10^7 simulated events. At 0.1 s per event this amounts to about 10^6 s, i.e. roughly 12 days on a single core, so it was essential to parallelize the simulations to get results in a reasonable time.

Parallelization of simulations

MCNPX

In MCNPX version 2.4.0, parallelization through PVM (Parallel Virtual Machine) was implemented. The PVM version of MCNPX 2.4.0 ran successfully on our two computing servers (each with two Xeon 3 GHz cores); on weekends it was also used on a Beowulf-like cluster connecting the personal computers from our offices. The MCNPX 2.5.0 release implemented MPI (Message Passing Interface), and PVM was deprecated; the MPICH2 flavor of MPI ran on our Xeon servers and on the Beowulf-like cluster. In 2007, we bought two computing servers, each with four 64-bit cores (Opteron 280, 1.8 GHz), and moved to the OpenMPI flavor of MPI (MPICH2 did not work), which was easier and more reliable to use. Since the end of 2008, we have eight computing servers with eight cores each (Xeon E5420, 2.5 GHz), i.e. 64 cores in total; OpenMPI with the latest beta version of MCNPX is running on them.

FLUKA

The FLUKA code does not come with any implementation of parallelization, but it is designed to be easily used in PBS (Portable Batch System) environments: independent runs are started with different random seeds, and the tools to combine the results of the independent runs are distributed with the source code. We started to use FLUKA in parallel on the CESNET MetaCentrum computers in a PBSPro environment; on our eight servers we installed the TORQUE PBS environment, which is a free alternative to PBSPro.
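To make this independent-runs workflow concrete, the following minimal Python sketch submits seeded runs to a TORQUE/PBSPro queue. It is an illustration under assumptions: the run_simulation executable, its command-line flags, and the seed scheme are hypothetical placeholders, not FLUKA's actual interface; a real FLUKA campaign uses the code's own run scripts, input files, and the combining utilities mentioned above.

```python
# Hypothetical sketch: submit N independent Monte Carlo runs to a PBS queue.
# Only "qsub reading a job script from stdin" is standard TORQUE/PBSPro usage;
# the simulation executable and its flags are illustrative placeholders.
import subprocess

N_JOBS = 64          # one independent run per available core
BASE_SEED = 4242     # distinct seeds keep the runs statistically independent
EVENTS_PER_JOB = 10_000_000 // N_JOBS   # ca. 10^7 events in total

for i in range(N_JOBS):
    job_script = f"""#!/bin/sh
#PBS -N mc_run_{i}
#PBS -q normal
cd $PBS_O_WORKDIR
./run_simulation --seed {BASE_SEED + i} --events {EVENTS_PER_JOB} --out run_{i}.dat
"""
    # qsub accepts the job script on standard input; queue names are
    # site-specific (cf. the MetaCentrum queues listed below).
    subprocess.run(["qsub"], input=job_script, text=True, check=True)
```

After all jobs have finished, the partial outputs (run_0.dat, ..., run_63.dat in this sketch) are merged into one result; for FLUKA, this final step is performed by the combining tools shipped with the code.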
COMPUTING RESOURCES AT THE SPECTROSCOPY DEPARTMENT (NPI)

2004:
- 2 two-core servers = 4x Xeon 3 GHz; MCNPX with PVM, later MPICH2
- Beowulf-like cluster built from office PCs; MCNPX with PVM, later MPICH2

2007:
- 2 four-core servers = 8x Opteron 280, 1.8 GHz; 64-bit MCNPX with OpenMPI; FLUKA
- Access to the METACENTRUM computing facilities

2008:
- 8 eight-core servers = 64x Intel Xeon E5420, 2.5 GHz; 64-bit MCNPX with OpenMPI; FLUKA with TORQUE PBS

BEOWULF-LIKE CLUSTER

- During weekends, office PCs were used for calculations; up to 6 Pentium 4 computers were available.
- The same OS as on the servers was installed on one computer; the other computers booted through Etherboot.
- A shared user file system was mounted through NFS (Network File System), so all programs compiled on the Xeon servers worked (same OS and architecture).
- The speed was roughly the same as that of four Xeon processors: the total CPU power was higher, but some of it was lost to load balancing and to slower I/O buses.
- Inspiration and help for the cluster came from the documentation of the "ECHELON BEOWULF CLUSTER".

METACENTRUM (http://meta.cesnet.cz)

- 8 cores for 2 hours in queue "short"
- 32 cores for 24 hours in queue "normal"
- 40 cores for 1 month in queue "long", but rarely free
- The first experiences with a PBS environment were very valuable: FLUKA was run in the PBS environment, and the TORQUE PBS at NPI was installed based on these experiences.
- The different processors available on METACENTRUM computers were tested; the processors for the 8 servers bought at NPI in 2008 were chosen according to these tests.

Comparison of MCNPX calculation performance

YEAR   COMPUTER                SPEED [kSPECINT2k]
2004   2 two-core servers      4
2004   Beowulf-like cluster    4
2007   2 four-core servers     11
2007   METACENTRUM             80
2009   8 eight-core servers    100

MPI TESTS

- OpenMPI on 8 machines with 8 cores each was tested; MCNPX is the only program we use that uses MPI.
- Results: the effectiveness of the cluster drops with the number of cores used (communication between the cores); a sketch of such a scaling measurement follows at the end of this section. The computational power of our computers increased by a factor of 25 from 2004, which is higher than the Moore's-law prediction of 5.6.

PROCESSOR TESTS

- 12 different cores available from METACENTRUM and NPI were tested.
- The tests are adapted for core (not disk/bus) performance: POVRAY rendering, movie encoding with MENCODER, a FLUKA simulation of a physical system, and an MCNPX simulation of a physical system.
- Results: from 2004 to 2009, the performance of a single core increased by a factor of ~2, not by the factor of 5.6 that Moore's law (2x every two years) predicts. The first three tests are consistent (POVRAY, MPLAYER, and FLUKA are compiled with the GCC/G77 compilers); the last test (MCNPX) gives different values, because MCNPX is compiled with the Intel compilers, which are tuned for Intel processors. MCNPX compiled with the Portland Group compilers is ~2 times slower than MCNPX compiled with the Intel compilers.
- 1 kSPECINT2k ~ Xeon 3.06 GHz
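As an illustration of the kind of scaling measurement behind the MPI tests above, here is a minimal sketch using mpi4py. The use of mpi4py and the toy tally are assumptions for illustration; the original tests ran MCNPX itself under OpenMPI.

```python
# Minimal mpi4py sketch of a fixed-workload scaling test. Launch with
# varying core counts, e.g.:
#   mpirun -np 1  python mpi_scaling.py
#   mpirun -np 64 python mpi_scaling.py
# and compare events/second to see how parallel efficiency changes.
import random
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

TOTAL_EVENTS = 10_000_000            # fixed total workload, split over ranks
events_per_rank = TOTAL_EVENTS // size
rng = random.Random(1000 + rank)     # independent random stream per rank

comm.Barrier()                       # synchronize before timing
t0 = MPI.Wtime()

# Toy "event loop": a cheap stand-in tally for a real transport calculation.
hits = sum(1 for _ in range(events_per_rank) if rng.random() < 1e-3)

# Collective reduction; in real codes such communication grows with the
# number of cores and is one source of the efficiency loss noted above.
total_hits = comm.reduce(hits, op=MPI.SUM, root=0)
elapsed = MPI.Wtime() - t0

if rank == 0:
    print(f"{size} ranks: {TOTAL_EVENTS / elapsed:.0f} events/s, tally = {total_hits}")
```

In this toy, the communication is a single reduction, so most of the loss observed on the real cluster (frequent tally exchanges, slower interconnects) will not appear; the sketch only illustrates the measurement procedure.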