ASTERICS & KM3NeT
2nd ASTERICS-OBELICS Workshop, 16-19 October 2017, Barcelona, Spain.
H2020-Astronomy ESFRI and Research Infrastructure Cluster (Grant Agreement number: 653477).
10/10/2017, ASTERICS-OBELICS Workshop 2017 / Barcelona

CORELib: A COsmic Ray Event LIBrary for Open Access (D-ANA)
Bernardino Spisso, INFN

Introduction
Cosmic rays are a common background source for experiments in astroparticle physics and neutrino astronomy. The computing power needed to simulate air showers depends heavily on the energy window of interest, the simulated processes, the minimum energy of the products and the inclination of the primaries. CORELib is a cosmic ray event library that is meant to be openly accessible in order to satisfy a broad range of needs. Although models are always changing and improving, there is a need for a reference dataset that is also suitable for developing reconstruction and classification algorithms and comparing their performance. The status of production is reviewed and the challenges in data sharing are discussed.

Computing centres and pools provide resources for KM3NeT
Tier | Computing facility | Main task | Access
Tier-0 | at detector site | online processing | direct access, direct processing
Tier-1 | CC-IN2P3 | general offline processing and central data storage | direct access, batch processing and grid access
Tier-1 | CNAF | general offline processing and central data storage | grid access
Tier-1 | ReCaS | general offline processing, interim data storage | grid access
Tier-1 | HellasGrid | reconstruction of data | grid access
Tier-1 | HOU computing cluster | simulation processing | direct access, batch processing
Tier-2 | local computing clusters | simulation and analysis | varying

KM3NeT on the GRID
VO central services
Services: authentication/authorization system (VOMS), User Interface, Logical File Catalog, job submission and management system (WMS)
Sites: RECAS-NAPOLI, HellasGrid-Okeanos, CNAF, Frascati, RECAS-NAPOLI, HellasGrid-Afroditi
• KM3NeT is starting on the GRID
• Main task: CORELib (COsmic Ray Event Library)

CORELib
• CORELib: COsmic Ray Event Library
• Background to many experiments
• Also a tuning benchmark
• Potentially useful to other communities
• Currently using CORSIKA as generator
• Status of production
• Proton-induced showers (first production delivery):
o HE models: QGSJET01 with CHARM, QGSJET01 with TAULEP, QGSJET-II with TAULEP, EPOS-LHC with TAULEP
o LE model: GHEISHA
o about 21 M events per HE model
o 7 energy bins (2×10^2 GeV to 10^3 GeV, plus bins equally spaced in logarithm from 1 TeV to 10^9 GeV)
o power-law spectrum with spectral index -2
o zenith angle from 0 to 89 degrees
• Nuclei-induced showers:
o HE models: QGSJET01 with CHARM, QGSJET01 with TAULEP, QGSJET-II with TAULEP, EPOS-LHC with TAULEP
o LE model: GHEISHA
o about 21 M events per HE model
o 7 energy bins (A×2×10^2 GeV to A×10^3 GeV, plus bins equally spaced in logarithm from A×1 TeV to A×10^9 GeV)
o power-law spectrum with spectral index -2
o zenith angle from 0 to 89 degrees

CORELib
• CORELib: COsmic Ray Event Library
Status of production
Energy range (GeV) | Number of events
200 - 1000 | 10^7
10^3 - 10^4 | 10^7
10^4 - 10^5 | 10^6
10^5 - 10^6 | 10^5
10^6 - 10^7 | 10^4
10^7 - 10^8 | 10^3
10^8 - 10^9 | 10^2
About 21 M events per HE model (~1% of the total production foreseen by KM3NeT).
Production done with and without Cherenkov radiation.
High-energy model | Low-energy model | Option TAULEP | Option CHARM
QGSJET01 | GHEISHA | X | X
QGSJETII-04 | GHEISHA | X |
EPOS LHC | GHEISHA | X |

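As a rough illustration of what the binning above implies, the following Python sketch sums the per-bin event counts and draws energies from a power law inside one bin by inverse-transform sampling. It is only a sketch under stated assumptions: the names `BINS` and `sample_energy` are hypothetical, and the sampling is not CORSIKA's own generator, just the textbook method for a spectrum dN/dE ∝ E^(-gamma) with gamma different from 1.

```python
import random

# Energy bins in GeV and generated events per bin, copied from the table above.
BINS = [
    (2e2, 1e3, 10**7),
    (1e3, 1e4, 10**7),
    (1e4, 1e5, 10**6),
    (1e5, 1e6, 10**5),
    (1e6, 1e7, 10**4),
    (1e7, 1e8, 10**3),
    (1e8, 1e9, 10**2),
]

# Total number of events per high-energy model.
total_events = sum(n for _, _, n in BINS)


def sample_energy(e_min, e_max, gamma, rng):
    """Draw one energy from dN/dE ~ E**(-gamma) on [e_min, e_max].

    Hypothetical helper (not part of CORSIKA); valid for gamma != 1.
    Inverts the cumulative distribution of the truncated power law.
    """
    u = rng.random()          # uniform number in [0, 1)
    a = 1.0 - gamma           # exponent of the integrated spectrum
    lo, hi = e_min ** a, e_max ** a
    return (lo + u * (hi - lo)) ** (1.0 / a)


rng = random.Random(1)
energies = [sample_energy(1e3, 1e4, 2.0, rng) for _ in range(5)]
print(f"events per HE model: {total_events:,}")
print("five sampled energies (GeV):", [round(e, 1) for e in energies])
```

With gamma = 2 most sampled energies fall near the lower edge of the bin, which is why the high-energy bins of this production contain so few events.
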
CORELib
CORSIKA (COsmic Ray SImulation for KAscade; Dieter Heck, Tanguy Pierog, Johannes Knapp et al.) is a program for detailed simulation of extensive air showers initiated by high-energy cosmic ray particles. Protons, light nuclei up to iron, photons, and many other particles may be treated as primaries.
CORSIKA produces two main output types:
• Control output (text files)
• Particle list (binary files)
The DAT file contains the information on the shower secondary particles.
The CER file contains the photons produced by the Cherenkov effect.
Each event represents a different simulated shower.

CORELib
CORSIKA is a program based on the Monte Carlo approach for studying the evolution and the features of particle showers in the atmosphere.
• Initially developed to run simulations for the KASCADE experiment located in Karlsruhe, Germany
• CORSIKA can now simulate a particle shower for different atmosphere parameterizations and observation levels
In CORELib we chose sea level as the observation level and the standard European atmosphere. This choice makes the library suitable for possible use by other communities.
Notice: KM3NeT could ignore Cherenkov photons and near-horizontal muons, but we chose to include them as a service to the community.

CORELib
[Figure: plots of the output showers for different primary particles at an energy of 10^4 GeV. Panels: photon, proton, He.]

Time estimation
The following results have been calculated by analysing the CORSIKA output files. The computation time is the difference between the last and the first “PRESENT TIME” UTC date in the standard output of CORSIKA; it therefore has an uncertainty of 2 seconds plus a (negligible?) systematic shift due to the load time of the program and libraries.

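The timing procedure described above can be sketched in Python as follows. The exact layout of the timestamp line is an assumption here (something like `PRESENT TIME : 16.10.2017 09:30:05 UTC`); check it against the standard output of your CORSIKA version and adapt the pattern if it differs. The function name `run_duration_seconds` is hypothetical.

```python
import re
from datetime import datetime

# ASSUMPTION: timestamp lines look like
#   "PRESENT TIME : 16.10.2017 09:30:05 UTC"
# (day.month.year, then hour:minute:second). Adjust if your output differs.
STAMP = re.compile(
    r"PRESENT TIME\s*:\s*(\d{1,2})\.\s*(\d{1,2})\.\s*(\d{4})\s+"
    r"(\d{1,2}):\s*(\d{1,2}):\s*(\d{1,2})"
)


def run_duration_seconds(log_text):
    """Return the seconds between the first and last timestamp in a run log."""
    stamps = []
    for match in STAMP.finditer(log_text):
        day, month, year, hour, minute, second = map(int, match.groups())
        stamps.append(datetime(year, month, day, hour, minute, second))
    if len(stamps) < 2:
        raise ValueError("need at least two PRESENT TIME lines")
    return (stamps[-1] - stamps[0]).total_seconds()
```

A typical use would be `run_duration_seconds(open("run.log").read())`, where `run.log` stands for the text file holding the standard output of one run.
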
Size estimation
The event size has been calculated by dividing the total size of the file by the number of events. Both curves are linear in energy (exponential in the logarithm of energy).

Ongoing production: motivations
Physics biasing: replaces the natural distribution of some process with “fake” PDFs that limit events to what is useful for your simulation.
Primary particle biasing (variance reduction): increases the number of primary particles generated in a particular phase-space region of interest; the PDF of the primary particles is modified accordingly.
Use case: increase the number of high-energy particles in the cosmic ray spectrum.

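When primaries are generated with a biased spectrum, a physical spectrum is usually recovered afterwards by weighting each event with the ratio of the target PDF to the generation PDF. The Python sketch below shows this standard importance-sampling step for power laws; the function names are hypothetical, the weights are relative (overall normalization is left out), and this is not claimed to be the procedure used by CORELib itself.

```python
def spectrum_weight(energy, gamma_generated, gamma_target):
    """Relative weight that maps a generated power law onto a target one.

    Generation PDF ~ E**(-gamma_generated), target PDF ~ E**(-gamma_target),
    so the ratio target/generated is E**(gamma_generated - gamma_target).
    """
    return energy ** (gamma_generated - gamma_target)


def weighted_fraction_above(energies, threshold, gamma_generated, gamma_target):
    """Weighted fraction of events with energy above a threshold."""
    weights = [spectrum_weight(e, gamma_generated, gamma_target) for e in energies]
    above = sum(w for e, w in zip(energies, weights) if e > threshold)
    return above / sum(weights)


# Example: events generated flat (index 0), reweighted to an E**-2 spectrum.
example_energies = [1e3, 1e4, 1e5]  # GeV, illustrative values only
print(weighted_fraction_above(example_energies, 5e3, 0.0, 2.0))
```

The benefit of the flat generation is statistical: many more simulated showers populate the high-energy bins, while the weights restore their correct relative contribution.
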
Ongoing production
Proton-induced showers:
- New production ongoing using a flat spectrum (estimated 32× output size increase!)
- HE models: QGSJET01 with CHARM, QGSJET01 with TAULEP, QGSJET-II with TAULEP, EPOS-LHC with TAULEP
- LE model: GHEISHA
- about 15 M events per HE model
- 7 energy bins (2×10^2 GeV to 10^3 GeV, plus bins equally spaced in logarithm from 1 TeV to 10^9 GeV)
- Flat spectrum (power law with spectral index 0)
- zenith angle from 0 to 89 degrees
- 3,000 high-energy events (10^7 - 10^9 GeV) vs. 1,100 in the previous productions
- Estimated computation time of about 10 days on 1064 cores for each HE model

Ongoing production
Energy range (GeV) | Number of events
200 - 1000 | 15×10^5
10^3 - 10^4 | 15×10^5
10^4 - 10^5 | 15×10^5
10^5 - 10^6 | 15×10^5
10^6 - 10^7 | 15×10^5
10^7 - 10^8 | 15×10^5
10^8 - 10^9 | 15×10^5
About 10 M events per HE model.
Events are distributed evenly among the energy bins.
High-energy model | Low-energy model | Option TAULEP | Option CHARM
QGSJET01 | GHEISHA | X | X
QGSJETII-04 | GHEISHA | X |
EPOS LHC | GHEISHA | X |

Data storage and sharing
The estimated total amount of data at the end of the ongoing production is around 10 TB.
Where and how should the data be stored and shared? Dedicated hardware? Cloud? EUDAT? (However, the standard EUDAT account is intended for documents and small-scale data, 20 GB per record.)
For now, the first productions (about 600 GB) are stored and available via SFTP on a local server hosted at the University of Salerno.

Temporary repository
CORELib can be downloaded via SFTP at corelib@193.205.188.227, pwd = Asterics2020
The CORSIKA output binary files are supplied in a compressed tar file together with a text file containing all the standard output of the corresponding run.

The first two productions are contained in the directory “Standard”.
There are two variants of the Cherenkov production sharing the same features. They differ only in the total number of files:
• The “Cherenkov” directory contains the production split into 1160 files
• The “Cherenkov-201” directory contains the production split into 201 files
All the Cherenkov production files are contained in 4 subdirectories named after the HE models.

The details of each file of the first two productions are reported in Standard.xlsx, an Excel 2016 spreadsheet.

For the Cherenkov runs, besides the Cherenkov.xlsx and Cherenkov-201.xlsx spreadsheet files, there are two SQLite files named Cherenkov.db and Cherenkov-201.db, which can be queried using SQL (e.g. with the sqlite3 program). These databases contain summary information about each run.

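Since the table layout of these databases is not described here, a reasonable first step is to inspect the schema before writing queries. The Python sketch below does this with the standard `sqlite3` module; the helper names `list_tables` and `table_columns` are hypothetical, and no table or column names of the actual CORELib databases are assumed.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables in an open SQLite connection."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]


def table_columns(conn, table):
    """Return the column names of one table.

    PRAGMA table_info yields rows of (cid, name, type, notnull, dflt_value, pk).
    """
    quoted = table.replace('"', '""')  # escape embedded double quotes
    return [row[1] for row in conn.execute(f'PRAGMA table_info("{quoted}")')]
```

For example, after `conn = sqlite3.connect("Cherenkov.db")`, calling `list_tables(conn)` and then `table_columns(conn, name)` for each returned name shows which run summary fields are available to query.
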
Depending on the user's needs, the CORSIKA output can be used directly as a binary file or can be translated into a human-readable ASCII format. There are various possible “translators” (e.g. CORANT is used for KM3NeT/ANTARES). A very general converter named “corsikaread” is supplied with CORSIKA in the src/utils/ directory.

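For users who want to read the binary file directly, the Python sketch below counts the events in a particle file image. It rests on assumptions about the file layout that should be checked against the CORSIKA user's guide for the version and options in use: Fortran sequential records framed by 4-byte little-endian length markers, each record holding sub-blocks of 273 four-byte words (312 when the thinning option is used), and event headers identified by the tag `EVTH` in the first word of a sub-block. The function name `count_events` is hypothetical.

```python
import struct


def count_events(data, words_per_subblock=273):
    """Count sub-blocks tagged b'EVTH' in the bytes of a particle file.

    ASSUMED layout: [4-byte length][payload][4-byte length] per record,
    payload made of sub-blocks of `words_per_subblock` 4-byte words.
    """
    sub_bytes = 4 * words_per_subblock
    count = 0
    pos = 0
    while pos + 4 <= len(data):
        # Leading Fortran record marker: payload length in bytes.
        (rec_len,) = struct.unpack_from("<i", data, pos)
        if rec_len <= 0 or rec_len % sub_bytes != 0:
            raise ValueError(f"unexpected record length {rec_len} at byte {pos}")
        payload = data[pos + 4 : pos + 4 + rec_len]
        if len(payload) < rec_len:
            raise ValueError("truncated record")
        # Each sub-block starts with a 4-character tag (RUNH, EVTH, EVTE, ...).
        for offset in range(0, rec_len, sub_bytes):
            if payload[offset : offset + 4] == b"EVTH":
                count += 1
        # Skip payload plus the leading and trailing markers.
        pos += rec_len + 8
    return count
```

Combined with the file size, `count_events(open(path, "rb").read())` gives the average event size as file size divided by number of events. For large files a streaming reader would be preferable to loading everything into memory.
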
Another way to handle the output files is to use the free C++ library COAST (https://web.ikp.kit.edu/rulrich/coast.html), which provides tools to convert the standard binary CORSIKA output into a ROOT file. COAST allows the particle data to be arranged in a ROOT TTree and provides some basic graphical tools such as ROOT plots and histograms.

Conclusions
• CORELib is flexible: several models are used to provide simulations
• CORELib is “plug-and-play”: common data formats, immediate usage
• CORELib is open-access: SFTP with a common user/pwd
• CORELib is extensible: we provide the full set of parameters, so other collaborations or institutions that want to add datasets can do so with or without overlap
• CORELib is in the spirit of ASTERICS: a tool needed by KM3NeT, whose features have been extended to prove useful to many people in the community (e.g., Cherenkov radiation and high inclination would not be needed by KM3NeT)

Acknowledgements
• H2020-Astronomy ESFRI and Research Infrastructure Cluster (Grant Agreement number: 653477).
Thank you for your attention.