The HDF Group
Introduction to HDF5
Elena Pourmal, Scot Breitenfeld
The HDF Group
https://hdfgroup.org
2/25/2021

Tutorial Goals and Structure
• Goals
    • Help you to start learning HDF5
    • Give you insight into parallel I/O
• Tutorial structure (lecture + demos)
    I. Introduction
    II. HDF5 Basics
    III. HDF5 Performance
    IV. Overview of parallel HDF5
• Questions are welcome during the presentation!
• Survey: https://www.surveymonkey.com/r/HDFIntro

Resources
• www.hdfgroup.org
• https://support.hdfgroup.org
• https://portal.hdfgroup.org
• help@hdfgroup.org

Tutorial Resources
• AWS system IP: 34.210.14.253
• Source directory contains Examples-Tutorial.tar
    tar -xvf Examples-Tutorial.tar
• Data: /mnt/users/DATA
• Presentations, sources, files: ftp://gamma.hdfgroup.org/pub/outgoing/epourmal/2018-02-28/

Why HDF5?
• Have you ever asked yourself:
    • Where is my data?
    • How will (do, did) I organize my data?
    • How will I share my data?
    • What did I write into this file a week, a month, a year ago while working on my thesis?
    • How will I deal with one-file-per-processor in the Petascale (Exascale, …scale) era?
    • Do I need to be an "HPC pro" to do my research?

Why HDF5?
• HDF5 hides all complexity so you can concentrate on science
• Organize data within a file
• Optimized I/O to a single shared file
• Domain-specific libraries and tools, big codes built on HDF5
    • Data analysis and visualization (VisIt, ParaView)
    • Weather and climate (netCDF-4)
    • CFD (CGNS)
    • Matlab, Mathematica, Python (h5py), R

Introduction to HDF5
• Introduce you to HDF5
    • HDF5 data model
    • HDF5 programming model
    • Tools and examples

WHAT IS HDF5?

What is HDF5?
• HDF5 == Hierarchical Data Format version 5
• Open file format
    • Designed for high volume or complex data
• Open source software
    • Works with data in the format
• A data model
    • Structures for data organization and specification

Rationale behind HDF5
HDF5 is designed for:
• Big Data: high volume and/or complex heterogeneous data
• Every size and type of system (portable, shareable)
• Flexible, efficient storage and I/O
• Enabling applications to evolve in their use of HDF5 and to accommodate new models
• Supporting long-term data preservation

HDF5 DATA MODEL

HDF5 Data Model
• HDF5 Objects: File, Group, Dataset, Attribute, Link, Datatype, Dataspace
• a.k.a. HDF5 Abstract Data Model
• a.k.a. HDF5 Logical Data Model

HDF5 File
An HDF5 file is a container that holds data objects.
[Figure: a file holding a lat/lon/temp table and a set of experiment notes]

HDF5 Dataset
Specifications for single data element and array dimensions:
    HDF5 Datatype: Integer, 32-bit, LE
    HDF5 Dataspace: Rank 3; Dimensions Dim_0 = 4, Dim_1 = 5, Dim_2 = 7
Multi-dimensional array of identically typed data elements
• HDF5 datasets organize and contain "raw data values."
• HDF5 datatype describes individual data elements.
• HDF5 dataspace describes the logical layout of the data elements.

HDF5 Dataspace
• Describes the logical layout of the elements in an HDF5 dataset
    • NULL
        • no elements
    • Scalar
        • single element
    • Simple array (most common)
        • multiple elements organized in a rectangular array
        • rank = number of dimensions
        • dimension sizes = number of elements in each dimension
        • maximum number of elements in each dimension
            • may be fixed or unlimited
Extreme Scale Computing Argonne

HDF5 Dataspace
Two roles:
• Dataspace contains spatial information
    • Rank and dimensions
    • Permanent part of dataset definition
    • Example: Rank = 2, Dimensions = 4 x 6
• Partial I/O: Dataspace describes the application's data buffer and the data elements participating in I/O
    • Example: Rank = 1, Dimension = 10

HDF5 Datatypes
• Describe individual data elements in an HDF5 dataset
• Wide range of datatypes supported
    • Integer
    • Float
    • Enum
    • Array
    • User-defined (e.g., 13-bit integer)
    • Variable length types (e.g., strings)
    • Compound (similar to C structs)
    • Many more …

HDF5 Dataset
• Datatype: 32-bit Integer
• Dataspace: Rank = 2, Dimensions = 5 x 3

HDF5 Dataset with Compound Datatype
• Compound Datatype: int16, char, int32, 2 x 3 x 2 array of float32
• Dataspace: Rank = 2, Dimensions = 5 x 3

How is data stored?
[Figure: a buffer in memory mapped to data in the file under each storage layout]
• Contiguous (default)
    • Data elements stored physically adjacent to each other
• Chunked
    • Better access time for subsets; extendible
• Chunked & Compressed
    • Improves storage efficiency, transmission speed

HDF5 Attributes
• Typically contain user metadata
• Have a name and a value
• Attributes "decorate" HDF5 objects
• Value is described by a datatype and a dataspace
Attributes are analogous to datasets, but they do not support partial I/O operations, and they cannot be compressed or extended.

HDF5 File
An HDF5 file is a smart container that holds data objects.
[Figure: a file holding a lat/lon/temp table and a set of experiment notes]

HDF5 Groups and Links
HDF5 groups and links organize data objects.
• Every HDF5 file has a root group "/"
[Figure: root group "/" linking to groups SimOut and Viz, with these objects in the hierarchy]
• Experiment Notes: Serial Number: 99378920; Date: 3/13/09; Configuration: Standard 3
• Parameters: 10; 1000
• Timestep: 36,000
• Table:
    lat | lon | temp
    ----|-----|-----
     12 |  23 |  3.1
     15 |  24 |  4.2
     17 |  21 |  3.6

HDFView: Example of HDF5 file

HDF5 SOFTWARE

HDF5 Home Page
HDF5 home page: http://hdfgroup.org/HDF5/
• Latest release: HDF5 1.8.20, HDF5 1.10.1
HDF5 source code:
• Written in C, and includes optional C++, Fortran, Java APIs, and High-Level APIs
• Contains command-line utilities (h5dump, h5repack, h5diff, ...) and compile scripts
HDF5 pre-built binaries:
• When possible, include C, C++, F90, and High-Level libraries. Check the ./libhdf5.settings file.
• Built with and require the SZIP and ZLIB external libraries

HDF5 Software Layers and Storage
[Figure: layered diagram, top to bottom]
• Apps: CGNS, netCDF-4, H5py, R, HDFView, Tools
• HDF5 Library
    • API: High Level APIs; Language Interfaces (C, Fortran, C++); Java Interface
    • HDF5 Data Model Objects: Groups, Datasets, Attributes, …
    • Tunable Properties: Chunk Size, I/O Driver, …
    • Internals: Memory Mgmt, Datatype Conversion, Filters, Chunked Storage, Version Compatibility, and so on…
    • Virtual File Layer (I/O Drivers): Posix I/O, Split Files, MPI I/O, Custom
• Storage (HDF5 File Format): File, Split Files, File on Parallel Filesystem, Other

Useful Tools For New Users
• h5dump: Tool to "dump" or display contents of HDF5 files
• h5cc, h5c++, h5fc: Scripts to compile applications
• HDFView: Java browser to view HDF5 files
    http://www.hdfgroup.org
• HDF5 Examples (C, Fortran, Java, Python, Matlab)
    http://support.hdfgroup.org

HDF5 PROGRAMMING MODEL AND API

General Programming Paradigm
• Object is opened or created
• Object is accessed, possibly many times
• Object is closed
• Properties of object are optionally defined
    • Creation properties (e.g., use chunked storage)
    • Access properties

The General HDF5 API
• C, Fortran, Java, C++, and .NET bindings
• IDL, MATLAB, Python (H5Py, PyTables)
• C routines begin with the prefix H5?, where ? is a character corresponding to the type of object the function acts on
Example Functions:
    H5D : Dataset interface      e.g., H5Dread
    H5F : File interface         e.g., H5Fopen
    H5S : dataSpace interface    e.g., H5Sclose

The HDF5 API
• For flexibility, the API is extensive
    • 300+ functions
[Image: Victorinox Swiss Army Cybertool 34]
• This can be daunting… but there is hope
    • A few functions can do a lot
    • Start simple
    • Build up knowledge as more features are needed

Basic Functions
    H5Fcreate (H5Fopen)           create (open) File
    H5Screate_simple/H5Screate    create dataSpace
    H5Dcreate (H5Dopen)           create (open) Dataset
    H5Dread, H5Dwrite             access Dataset
    H5Dclose                      close Dataset
    H5Sclose                      close dataSpace
    H5Fclose                      close File

Other Common Functions
• DataSpaces: H5Sselect_hyperslab (Partial I/O), H5Sselect_elements (Partial I/O), H5Dget_space
• Groups: H5Gcreate, H5Gopen, H5Gclose
• Attributes: H5Acreate, H5Aopen_name, H5Aclose, H5Aread, H5Awrite
• Property lists: H5Pcreate, H5Pclose, H5Pset_chunk, H5Pset_deflate

C EXAMPLES

How to compile HDF5 applications
Autotools build: wrappers contain the compiler/linker flags
• h5cc: HDF5 C compiler command
• h5fc: HDF5 F90 compiler command
• h5c++: HDF5 C++ compiler command
• To compile:
    h5cc h5prog.c
    h5fc h5prog.f90
    h5c++ h5prog.cpp

Code: Create a File
    hid_t   file_id;
    herr_t  status;

    /* Create a new file, truncating any existing file of the same name. */
    file_id = H5Fcreate("file.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    status  = H5Fclose(file_id);

[Figure: the new file contains only the root group "/"]
Note: Return codes are not checked for errors in the code samples.

Example: Create a Group
[Figure: file.h5 with root group "/" containing dataset A (a 4 x 6 array of integers) and group B]

Code: Create a Dataset
    hid_t    file_id, dataset_id, dataspace_id;
    hsize_t  dims[2];
    herr_t   status;

    file_id = H5Fcreate("file.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

    /* Describe a 4 x 6 array and create dataset "A" in the root group. */
    dims[0] = 4;
    dims[1] = 6;
    dataspace_id = H5Screate_simple(2, dims, NULL);
    dataset_id = H5Dcreate(file_id, "A", H5T_STD_I32BE, dataspace_id,
                           H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    status = H5Dclose(dataset_id);
    status = H5Sclose(dataspace_id);
    status = H5Fclose(file_id);

[Figure: root group "/" now contains dataset A]

Code: Create a Group
    hid_t   file_id, group_id;
    herr_t  status;
    ...
    /* Open "file.h5" */
    file_id = H5Fopen("file.h5", H5F_ACC_RDWR, H5P_DEFAULT);

    /* Create group "/B" in file. */
    group_id = H5Gcreate(file_id, "B", H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    /* Close group and file. */
    status = H5Gclose(group_id);
    status = H5Fclose(file_id);

Output of h5dump
    $ h5dump file.h5
    HDF5 "file.h5" {
    GROUP "/" {
       DATASET "A" {
          DATATYPE  H5T_STD_I32BE
          DATASPACE  SIMPLE { ( 4, 6 ) / ( 4, 6 ) }
          DATA {
          (0,0): 0, 0, 0, 0, 0, 0,
          (1,0): 0, 0, 0, 0, 0, 0,
          (2,0): 0, 0, 0, 0, 0, 0,
          (3,0): 0, 0, 0, 0, 0, 0
          }
       }
       GROUP "B" {
       }
    }
    }

Example Code - H5Dwrite
    int wdata[4][6];
    int i, j;

    /* Initialize the dataset. */
    for (i = 0; i < 4; i++)
        for (j = 0; j < 6; j++)
            wdata[i][j] = i * 6 + j + 1;
    ...
    /* Write the whole buffer: H5S_ALL for both memory and file dataspace. */
    status = H5Dwrite(dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL,
                      H5P_DEFAULT, wdata);

Output of h5dump after writing
    $ h5dump file.h5
    HDF5 "file.h5" {
    GROUP "/" {
       DATASET "A" {
          DATATYPE  H5T_STD_I32BE
          DATASPACE  SIMPLE { ( 4, 6 ) / ( 4, 6 ) }
          DATA {
          (0,0): 1, 2, 3, 4, 5, 6,
          (1,0): 7, 8, 9, 10, 11, 12,
          (2,0): 13, 14, 15, 16, 17, 18,
          (3,0): 19, 20, 21, 22, 23, 24
          }
       }
       GROUP "B" {
       }
    }
    }

The HDF Group
Thank You! Questions?