
The HDF Group
HDF at NASA, NOAA and Met Office and people behind it
Elena Pourmal, epourmal@hdfgroup.org, The HDF Group
9/17/2020, www.hdfgroup.org

Outline
• The HDF Group company
• Mission and vision
• Products and services
• What do we do for NASA, NOAA and Met Office
• Overview of HDF5
  - Portability
  - Extensibility
  - Parallel HDF5
  - Internal compression

THE HDF GROUP COMPANY

Champaign, Illinois, USA

The HDF Group
• Not-for-profit company (since 2006), ex-NCSA at University of Illinois
• Offices in 5 states
• About 40 employees (more than 50% growth in the past 9 years)
  - Core software developers
  - Domain specialists
  - Documentation team
  - Technical support
• Mission-driven

The HDF Group Mission
To ensure long-term accessibility of HDF data through sustainable development and support of HDF technologies.

DATA CHALLENGES ADDRESSED BY HDF

Data Organization and Preservation
• Need to organize complex collections of data
• Long-term data preservation
• Efficient, scalable storage and access
[Figure: a lat | lon | temp table overlaid with experiment notes]

Success stories
• Petabytes of NASA remote sensing data in HDF4 and HDF5 file formats
• New NASA/JPSS missions chose HDF5 format for data archiving

Data Variety and Complexity
LCI Tutorial
Thanks to Mark Miller, LLNL

Data Access on Big Computers
… and small computers
… and it has to be FAST

Success story: CGNS for HPC
• CGNS – CFD standard
• The HDF Group helped to tune CGNS to solve large scale problems
• Computational mesh size: ~33 million elements, ~200 million nodes
• Efficiently handles large I/O from Exascale CFD simulations
[Figure label: BEFORE IMPROVEMENTS]
Thanks to Scot Breitenfeld, THG

Success story: Trillion Particle Simulation
• Physics plasma simulation at NERSC Cray XE6
• Simulation ran on 120,000 cores using
  - 80% of computing resources
  - 90% of available memory
  - 50% of Lustre scratch system
• Wrote 10 one-trillion particle dumps of 30-42 TB each in HDF5 files; sustained ~27 GB/sec; total 350 TB in HDF5
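The I/O figures on this slide can be sanity-checked with a little arithmetic. The sketch below assumes decimal units (1 TB = 1000 GB) and assumes the ~27 GB/sec rate held for the entire output, which the slide does not state; the variable and function names are illustrative.

```python
# Back-of-the-envelope check of the I/O figures quoted on the slide.
# Assumptions (not from the slide): decimal units, constant write rate.
TOTAL_TB = 350              # total HDF5 output
RATE_GB_PER_S = 27          # sustained write rate
DUMP_TB_RANGE = (30, 42)    # size range of one particle dump


def write_seconds(size_tb, rate_gb_per_s=RATE_GB_PER_S):
    """Seconds needed to write size_tb terabytes at the given rate."""
    return size_tb * 1000 / rate_gb_per_s


total_hours = write_seconds(TOTAL_TB) / 3600
dump_minutes = tuple(write_seconds(tb) / 60 for tb in DUMP_TB_RANGE)

print(f"total write time: {total_hours:.1f} h")
print(f"per dump: {dump_minutes[0]:.0f}-{dump_minutes[1]:.0f} min")
```

Under these assumptions the 350 TB would take roughly 3.6 hours of pure write time, or about 19 to 26 minutes per dump.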

The HDF Group philosophy
• Committed to Open Source
  - HDF software is free
  - BSD type of license
• Community involvement
  - Testing
  - Patches
  - New features (e.g., CMake support)
• Serving diverse user base
  - Remote sensing, HPC, non-destructive testing, medical records, scientific modeling, etc.

Brief History of HDF
1987: At NCSA (University of Illinois), a task force formed to create an architecture-independent format and library: AEHOO (All Encompassing Hierarchical Object Oriented format). Became HDF.
Early 1990’s: NASA adopted HDF for Earth Observing System project.
1996: DOE’s ASC (Advanced Simulation and Computing) Project began collaborating with the HDF group (NCSA) to create “Big HDF” (increase in computing power of DOE systems at LLNL, LANL and Sandia National labs required bigger, more complex data files). “Big HDF” became HDF5.
1998: HDF5 was released with support from National Labs, NASA, NCSA.
2006: The HDF Group spun off from University of Illinois as non-profit corporation.

Members of the HDF community

Revenues by source
• NASA & NOAA: 43%
• Commercial: 32%
• Other Govt & Academic: 25%

PRODUCTS AND SERVICES

The HDF Group products
• Main product: HDF Technology Suite
  - For managing high volume complex, heterogeneous data
  - Flagship: HDF5 data store
  - Flexible and efficient storage and I/O
  - Portable
  - Highly customizable
  - Misc. tools
  - Specialized software and tools (e.g., JPSS)

HDF5 Technology Platform
• HDF5 Abstract Data Model
  - Defines the “building blocks” for data organization and specification
  - Files, Groups, Links, Datasets, Attributes, Datatypes, Dataspaces
• HDF5 Software
  - Tools
  - Language Interfaces (C, Fortran, C++, Java)
  - HDF5 Library
• HDF5 Binary File Format
  - Bit-level organization of HDF5 file
  - Defined by HDF5 File Format Specification
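The building blocks of the abstract data model can be pictured with a toy model. The sketch below is plain Python, not the HDF5 API: the class names `Group` and `Dataset`, the `resolve` helper, the group name `observations`, and the `units` value are all made up for illustration. Only the three temperature values come from the sample table used in this deck.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """An array plus the metadata needed to interpret it."""
    shape: tuple                                # dataspace: the array's dimensions
    dtype: str                                  # datatype: element type
    data: list = field(default_factory=list)
    attrs: dict = field(default_factory=dict)   # attributes: small named metadata


@dataclass
class Group:
    """A container whose links map names to groups or datasets."""
    links: dict = field(default_factory=dict)
    attrs: dict = field(default_factory=dict)

    def resolve(self, path):
        """Follow a slash-separated path of link names from this group."""
        node = self
        for name in path.strip("/").split("/"):
            if name:
                node = node.links[name]
        return node


# A "file" is modeled here as nothing more than a root group.
root = Group()
root.links["observations"] = Group(attrs={"source": "example sensor"})
root.links["observations"].links["temp"] = Dataset(
    shape=(3,), dtype="float32", data=[3.1, 4.2, 3.6],
    attrs={"units": "degC"},   # hypothetical attribute value
)

temp = root.resolve("/observations/temp")
print(temp.shape, temp.dtype, temp.attrs["units"])
```

The point of the sketch is that a dataset carries its own shape, type, and attributes, which is what makes the stored data self-describing.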

HDF5 API and Applications
[Figure: layered diagram of the HDF5 software stack, top to bottom]
• Applications: Climate Model, myApp, YourBioApp, MATLAB©, Sony Pict Field3d
• Domain Data Objects: EOS library, netCDF-4 library, BioHDF library
• HDF5 API
• HDF5 Library
  - Language Interfaces
  - HDF5 Data Model Objects: Groups, Datasets, Attributes, …
  - Tunable Properties: Creation, Access, Transfer, …
  - Internals: Dataspace Selection, Datatype Conversion, Filters, Chunked Storage, Version Compatibility, … and so on…
  - Virtual File Layer: POSIX I/O, Split Files, MPI I/O, Custom, …
• Storage

HDF5 Software
• HDF5 runs on all flavors of Linux, Mac OS X, Windows, AIX, Solaris, FreeBSD, Cray, etc.
• GNU, Intel, PGI compilers
• Platforms and configurations tested for each release are listed at https://www.hdfgroup.org/HDF5/release/platforms5.html
• Talk to us if you need help porting HDF5 to an unsupported platform!

The HDF Group services
• Helpdesk and mailing lists
  - help@hdfgroup.org
  - hdf-forum@hdfgroup.org
  - Open to all users of HDF
• HDF5 Documentation: https://www.hdfgroup.org/HDF5/doc/index.html
• HDF Examples (C, Fortran, C++, Java, Python, MATLAB): https://www.hdfgroup.org/HDF5/examples/

The HDF Group services
• Standard support
  - Assistance in general areas of HDF usage
• Premium support
  - Access to our consulting and training resources
  - Limited consulting hours are included
• Enterprise support
  - Help with developing common strategies for managing HDF data within organization
  - Organization shares consulting/troubleshooting services
• Training
• Consulting, custom development and support

Sustaining mission critical software
HDF @NASA

Support for NASA
Goal: Sustain mission-critical software technology
“The HDF Group conducts evolutionary development program to improve the reliability, availability, functionality, operability, and performance of the HDF Subsystem within the EOSDIS while reducing operational and maintenance costs.”

Support for NASA
• Sustaining Engineering
• Maintenance releases of HDF software and tools
  - HDF4 and HDF5 libraries
  - Java tools
  - OPeNDAP HDF servers
• Technical support to HDF software developers
  - HDF-EOS2(5)
  - netCDF-4
• Vendors support (MATLAB, IDL)
• Community support (h5py, PyTables)
• User support
• Special projects (e.g., h4mapping)

Support for NASA
• Other projects
  - BEDI
  - Provide technical leadership and oversight for the Metadata Guidance for Earth Observation Data
  - ISO Standard extension
  - Support the adoption of ISO TC 211 and other standards and conventions in EOSDIS data and metadata systems.
  - Support the entire data life-cycle including mission planning, data system implementation, data formatting/access (i.e., HDF and netCDF), granule, inventory, and collection metadata.

HDF5 Risk-reduction Support
HDF @NOAA JPSS

Support for NOAA JPSS
• Goal: Provide HDF5 risk-reduction support for the distribution of JPSS VIIRS, OMPS and other sensor and environmental data products
• Maintain HDF5 and JPSS specific software and features developed by The HDF Group
• Provide user and technical support
• Perform special maintenance tasks
• Perform special research projects as requested

JPSS Products in HDFView

nagg – aggregation and packaging tool
9 input files – 4 granules each in GMODO-SVM07… files
Visualization with IDV

nagg – aggregation and packaging tool
1 output file – 36 granules in GMODO-SVM07… file
Visualization with IDV

Making HDF5 work on Met’s HPC systems
HDF @MET OFFICE

Support for Met Office
• Goal: Help with porting HDF5 and netCDF-4 to systems at the Met Office
  - OSs and compilers not supported by THG
  - Cross-compiling
  - Testing and troubleshooting
  - Performance tuning

Examples
EXPLORING NEW DIRECTIONS

HDF5 ODBC Driver
• Open Database Connectivity (ODBC)
  - Industry standard middleware API for accessing database management systems
  - All analytics apps have an ODBC client
• HiFive – ODBC driver for HDF5
  - Windows, [Linux, Mac OS X]
  - Client & Client/Server
  - Accessing HDF5 files from Excel & R
Thanks to Gerd Heber, THG

HDF5 for the Web
• Can I access HDF5 files remotely?
• API? My (mobile) client speaks HTTP!
• What is a file system? Who uses files anymore?
• Cloud computing w/ HDF5
Thanks to John Readey, THG

HDF5 as an interface to non-HDF5 storage

HDF5 as an interface to non-HDF5 storage
• Different File Formats plugins:

Powerful features
HDF5

HDF5 Features - Portability
• HDF5 file is a container
  - Data can be moved between different systems
  - Self-describing datatypes and arrays
• HDF5 file is a “smart” container
  - Data is organized within the file
    - Group and link objects; reference datatypes
  - Data is organized within a collection of files
    - External links
    - Virtual datasets

lat | lon | temp
----|-----|-----
12  | 23  | 3.1
15  | 24  | 4.2
17  | 21  | 3.6

HDF5 Features - Extensibility
• Data can be added to the existing arrays
  - Extensible along each dimension
• File structure can be modified by adding new groups and datasets (arrays)
• No limit on file size and number of objects stored in the file
[Figure: file sizes ranging from KBs to TBs]

Powerful I/O
• Fast
• Partial I/O
  - I/O on subsets of arrays including compressed arrays
• Customized I/O (Virtual File Drivers); writing/reading to/from
  - Memory
  - Sockets
  - Split files, family of files
• Parallel I/O
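Partial I/O on compressed arrays is practical because HDF5 compresses chunked datasets one chunk at a time, so a read only has to decompress the chunks that overlap the requested subset. The sketch below shows that bookkeeping in plain Python. The function name `chunks_touched` and the array and chunk sizes are assumptions for illustration, and no HDF5 call is involved.

```python
from itertools import product


def chunks_touched(start, count, chunk_shape):
    """Return grid indices of every chunk that overlaps a rectangular selection.

    start, count: per-dimension offset and extent of the selection, in elements
                  (each count must be at least 1).
    chunk_shape:  per-dimension chunk size, in elements.
    """
    ranges = []
    for s, c, ch in zip(start, count, chunk_shape):
        first = s // ch               # chunk holding the first selected element
        last = (s + c - 1) // ch      # chunk holding the last selected element
        ranges.append(range(first, last + 1))
    return list(product(*ranges))


# A 1000 x 1000 array stored in 100 x 100 chunks holds 10 * 10 = 100 chunks.
selection = chunks_touched(start=(150, 420), count=(100, 50),
                           chunk_shape=(100, 100))
print(selection)  # [(1, 4), (2, 4)]
```

In this example a 100 x 50 subset touches 2 of the 100 chunks, so only those two would need to be read and decompressed.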

PHDF5 Implementation Layers
PHDF5 is built on top of the standard MPI-IO API
[Figure: implementation layers, top to bottom]
• Application on a parallel computing system (Linux cluster) with compute nodes
• I/O library (HDF5)
• Parallel I/O library (MPI-I/O)
• Parallel file system (GPFS)
• Switch network/I/O servers
• Disk architecture & layout of data on disk
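A common way to use parallel HDF5, not spelled out on the slide, is for each MPI process to write its own contiguous block of one shared dataset. The sketch below only computes the per-process offset and count in plain Python. The function name `rank_slab` and the sizes are hypothetical, and no MPI or HDF5 call is made.

```python
def rank_slab(n_rows, n_ranks, rank):
    """Return (offset, count) of the contiguous row block owned by one rank.

    Rows are split as evenly as possible; the first n_rows % n_ranks ranks
    each take one extra row.
    """
    base, extra = divmod(n_rows, n_ranks)
    count = base + (1 if rank < extra else 0)
    offset = rank * base + min(rank, extra)
    return offset, count


# Ten rows shared by four processes.
slabs = [rank_slab(n_rows=10, n_ranks=4, rank=r) for r in range(4)]
print(slabs)  # [(0, 3), (3, 3), (6, 2), (8, 2)]
```

Because the blocks do not overlap and together cover every row, each process can write its part without coordinating with the others about placement.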

Concurrent Access to Data
• Writer: new data elements…
• HDF5 File: …are added to a dataset in the file…
• Reader: …which can be read by a reader… with no IPC necessary.

Internal Compression
• HDF5 comes with several compression methods and data transformations
  - GZIP, SZIP, nbit, scale+offset, checksum
• HDF5 can use custom compression and other filters
  - Loaded at application run time
  - Works with the HDF5 off-the-shelf tools (e.g., MATLAB)
[Figure: Application, HDF5 Library, Filter(s) loaded from a DLL or .so library, VFD, HDF5 File]
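The idea of a filter pipeline can be sketched with the standard library alone. This is a toy, not HDF5's implementation: `zlib` deflate stands in for the GZIP filter, CRC-32 stands in for HDF5's checksum filter, and the scale+offset step is a simplified version that assumes values with a known number of decimal places. All function names are hypothetical.

```python
import struct
import zlib


def scale_offset_encode(values, decimals):
    """Turn floats into small non-negative integers: scale, round, subtract the minimum."""
    scaled = [round(v * 10 ** decimals) for v in values]
    offset = min(scaled)
    return offset, [s - offset for s in scaled]


def write_chunk(values, decimals=1):
    """Apply the filters in order: scale+offset, then deflate, then checksum."""
    offset, ints = scale_offset_encode(values, decimals)
    raw = struct.pack(f"<q{len(ints)}H", offset, *ints)   # 16-bit unsigned ints
    packed = zlib.compress(raw, 6)
    return struct.pack("<I", zlib.crc32(packed)) + packed


def read_chunk(blob, decimals=1):
    """Undo the filters in reverse order, verifying the checksum first."""
    (stored_crc,) = struct.unpack("<I", blob[:4])
    packed = blob[4:]
    if zlib.crc32(packed) != stored_crc:
        raise ValueError("chunk failed checksum verification")
    raw = zlib.decompress(packed)
    n = (len(raw) - 8) // 2
    offset, *ints = struct.unpack(f"<q{n}H", raw)
    return [(i + offset) / 10 ** decimals for i in ints]


temps = [3.1, 4.2, 3.6] * 1000
blob = write_chunk(temps)
assert read_chunk(blob) == temps
print(len(temps) * 8, "bytes as float64 ->", len(blob), "bytes stored")
```

The design point mirrored here is that filters are applied in a fixed order on write and undone in reverse on read, so the application always sees the original values.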

netCDF-4
• Data Model
  - Variable (HDF5 dataset)
  - Dimension Scale (HDF5 dimension scales)
  - Attributes (HDF5 attribute)
  - Group (HDF5 group)
• Library
  - Implements the model in many formats (netCDF 3.*, HDF4, CDM), including HDF5
  - Takes advantage of HDF5 chunking storage, compression, data organization, and parallel access


The HDF Group
Thank You!
Questions?