Astronomy ESFRI & Research Infrastructure Cluster ASTERICS - 653477
3rd ASTERICS-OBELICS Workshop, 23-25 October 2018, Cambridge, UK.
H2020 - Astronomy ESFRI and Research Infrastructure Cluster (Grant Agreement number: 653477).
19/10/2018, ASTERICS-OBELICS Workshop 2018 / Cambridge

Casacore data storage
Tammo Jan Dijkema, October 2018

Casacore
• Casacore is a joint radio astronomy project for common software applications
• Currently actively maintained by the CASA group and ASTRON
• Hosted on GitHub
• A large part is the Tables system (~100K lines of code, ~20% of casacore)

Casacore Tables Data System (Please have a look at Van Diepen 2015)
• Measurement Sets are always stored in the Casacore Table Data System (CTDS).
• Each column is stored by a configurable storage manager.
• Different storage managers exist to facilitate different use cases (write speed, read speed for different access patterns, storage size, parallel access, …)

Problems with CTDS
• Many small files: every subtable stores its own set of files. This does not go well with Lustre.
• Number of rows limited to 32 bit (~4e9).
• Locking mechanism does not go well with parallel access.
• No MPI support (yet).
• Read and write speed??? Benchmarks needed.
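To give a feeling for the 32-bit row limit, the sketch below estimates how many time steps fit in one table before the row counter is exhausted. It assumes one row per station pair per time step and a single spectral window; the station counts are hypothetical and not taken from any particular telescope.

```python
# Upper bound on the number of rows when row numbers are unsigned 32-bit.
MAX_ROWS_32BIT = 2**32  # roughly 4e9


def baselines(n_stations: int, include_autocorrelations: bool = True) -> int:
    """Number of station pairs, optionally counting each station with itself."""
    cross = n_stations * (n_stations - 1) // 2
    return cross + (n_stations if include_autocorrelations else 0)


def time_steps_until_full(n_stations: int) -> int:
    """Time steps that fit in one table, assuming one row per baseline per step
    and a single spectral window (an assumption made for this sketch)."""
    return MAX_ROWS_32BIT // baselines(n_stations)


if __name__ == "__main__":
    # Hypothetical array sizes, for illustration only.
    for n in (64, 512):
        print(f"{n} stations: {baselines(n)} baselines, "
              f"{time_steps_until_full(n)} time steps before the limit")
```

With more spectral windows per time step the limit is reached proportionally sooner.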

Should we replace CTDS?
• Benchmarks HDF5 / CTDS TiledStMan:
  • Access patterns inventoried for LOFAR, VLA, (ngVLA)
  • Ran on different architectures
  • Various tile shapes used in both HDF5 and CTDS
• Preliminary conclusion:
  • Performance is similar
  • Except for the worst access pattern (many small chunks), where CTDS is much faster
• Proof-of-concept MPI-IO based storage manager being written as part of MSv3
• For performance, no obvious reason to replace CTDS by other technology
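Why tile shape and access pattern interact can be sketched with a simplified model: assume the data are stored in regular tiles and that a read always fetches whole tiles. This is not the benchmark code; the function names, array shape and tile shape are hypothetical and chosen only for illustration.

```python
def tiles_touched(start, shape, tile_shape):
    """Count the tiles of a regular tiling that overlap a rectangular slice.

    start, shape and tile_shape are sequences with one entry per axis.
    """
    count = 1
    for s, n, t in zip(start, shape, tile_shape):
        first = s // t            # index of the first tile hit on this axis
        last = (s + n - 1) // t   # index of the last tile hit on this axis
        count *= last - first + 1
    return count


def read_amplification(start, shape, tile_shape):
    """Cells fetched divided by cells requested, assuming whole tiles are read."""
    tile_cells = 1
    for t in tile_shape:
        tile_cells *= t
    requested = 1
    for n in shape:
        requested *= n
    return tiles_touched(start, shape, tile_shape) * tile_cells / requested


if __name__ == "__main__":
    # Hypothetical column of 256 channels x 4096 rows, stored in 64 x 128 tiles.
    tile = (64, 128)
    patterns = {
        "one row, all channels": ((0, 0), (256, 1)),
        "one channel, all rows": ((0, 0), (1, 4096)),
        "one aligned tile": ((0, 0), (64, 128)),
    }
    for name, (start, shape) in patterns.items():
        print(name, tiles_touched(start, shape, tile),
              read_amplification(start, shape, tile))
```

In this model a request that is small compared to the tile still pays for every tile it touches, which is one reason benchmarks have to cover several tile shapes per access pattern.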

ADIOS2
• ADIOS (The ADaptable Input/Output System) developed by Oak Ridge National Laboratory as part of U.S. Department of Energy Exascale Computing Program
• Collaboration with HDF Group, ADIOS2 supports HDF5 format
• Supports:
  • Streaming I/O (RDMA, …)
  • Parallel I/O (MPI based)
  • …
• Intended for exascale scientific computing
• Jason Wang (Oak Ridge) is developing ADIOS support in CTDS with MPI support
• First prototype published as http://dx.doi.org/10.1016/j.ascom.2016.05.003

Acknowledgement
• H2020 - Astronomy ESFRI and Research Infrastructure Cluster (Grant Agreement number: 653477).