The HDF Group
Introduction to HDF5
Quincey Koziol, Principal Data Architect, NERSC – LBNL, koziol@lbl.gov
April 26, 2017
HDF5 Overview @ UC Berkeley
www.hdfgroup.org/HDF5

Why HDF5?
• Have you ever asked yourself:
  • How will I deal with one-file-per-processor in the petascale era?
  • Do I need to be an “MPI and Lustre pro” to do my research?
  • Where is my checkpoint file?
• HDF5 hides the I/O complexity so you can concentrate on science
  • Optimized I/O to a single shared file

Goal
• Introduce you to HDF5
  • HDF5 Overview
  • HDF5 Data model
  • HDF5 Programming model

WHAT IS HDF5?

What is HDF5?
• HDF5 == Hierarchical Data Format, v5
• Open file format
  • Designed for high volume or complex data
• Open source software
  • Works with data in the format
• An extensible data model
  • Structures for data organization and specification

HDF5 is like …
(Figure only; the analogy images are not recoverable from the text.)

HDF5 is designed …
• for high volume and/or complex data
• for every size and type of system (portable)
• for flexible, efficient storage and I/O
• to enable applications to evolve in their use of HDF5 and to accommodate new models
• to support long-term data preservation

HDF5 DATA MODEL

HDF5 File
An HDF5 file is a container that holds data objects.
(Figure: a file containing a lat | lon | temp table and a block of experiment notes.)

HDF5 Data Model
HDF5 Objects:
• File
• Group
• Dataset
• Link
• Attribute
• Datatype
• Dataspace

HDF5 Dataset
• HDF5 Datatype: Integer, 32-bit, LE (specification for a single data element)
• HDF5 Dataspace: Rank 3, Dim[0] = 4, Dim[1] = 5, Dim[2] = 7 (specification for the array dimensions)
• A dataset is a multi-dimensional array of identically typed data elements
• HDF5 datasets organize and contain data elements.
• HDF5 datatype describes individual data elements.
• HDF5 dataspace describes the logical layout of the data elements.
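The datatype and dataspace together determine how much raw data a dataset holds. The sketch below is plain C and does not call the HDF5 library; the function names are hypothetical helpers introduced only for illustration. It multiplies the dimension sizes to get the element count, then multiplies by the element size.

```c
#include <assert.h>
#include <stddef.h>

/* Number of elements described by a simple dataspace:
 * the product of its dimension sizes. */
unsigned long long dataspace_npoints(int rank, const unsigned long long *dims)
{
    unsigned long long n = 1;
    assert(rank > 0 && dims != NULL);
    for (int i = 0; i < rank; i++)
        n *= dims[i];
    return n;
}

/* Raw (uncompressed) size in bytes: element count times element size. */
unsigned long long dataset_nbytes(int rank, const unsigned long long *dims,
                                  size_t elem_size)
{
    return dataspace_npoints(rank, dims) * (unsigned long long)elem_size;
}
```

For the dataset on this slide (rank 3, dimensions 4 x 5 x 7, 32-bit integers) this gives 140 elements and 560 bytes of raw data.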

HDF5 Dataspace
• Describes the logical layout of the elements in an HDF5 dataset
  • NULL
    • no elements
  • Scalar
    • single element
  • Simple array (most common)
    • multiple elements organized in a rectangular array
    • rank = number of dimensions
    • dimension sizes = number of elements in each dimension
    • maximum number of elements in each dimension
      • may be fixed or unlimited

HDF5 Dataspace
Two roles:
• Dataspace contains spatial information
  • Rank and dimensions
  • Permanent part of dataset definition
  • Example: Rank = 2, Dimensions = 4 x 6
• Partial I/O: Dataspace describes application’s data buffer and data elements participating in I/O
  • Example: Rank = 1, Dimension = 10

HDF5 Datatypes
• Describe individual data elements in an HDF5 dataset
• Wide range of datatypes supported
  • Integer
  • Float
  • Enum
  • Array
  • User-defined (e.g., 13-bit integer)
  • Variable-length types (e.g., strings, vectors)
  • Compound (similar to C structs)
  • More …

HDF5 Dataset
• Datatype: 32-bit Integer
• Dataspace: Rank = 2, Dimensions = 5 x 3

HDF5 Dataset with Compound Datatype
• Compound Datatype: uint16, char, int32, 2 x 3 x 2 array of float32
• Dataspace: Rank = 2, Dimensions = 5 x 3
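Since a compound datatype is similar to a C struct, one way to picture it is as a list of members, each with a name, a byte offset, and a size. The sketch below is plain C with no HDF5 calls; the struct, its field names, and the member table are hypothetical and only mirror the four member types on this slide. In the HDF5 C API, each row would roughly correspond to one H5Tinsert call on a type created with H5Tcreate(H5T_COMPOUND, ...).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical element type mirroring the compound datatype on the slide. */
typedef struct {
    uint16_t id;            /* uint16 */
    char     flag;          /* char */
    int32_t  count;         /* int32 */
    float    grid[2][3][2]; /* 2 x 3 x 2 array of float32 */
} element_t;

/* One row per member: the name, byte offset, and byte size that a
 * compound datatype definition needs. */
typedef struct {
    const char *name;
    size_t      offset;
    size_t      size;
} member_info_t;

const member_info_t element_members[4] = {
    { "id",    offsetof(element_t, id),    sizeof(uint16_t) },
    { "flag",  offsetof(element_t, flag),  sizeof(char) },
    { "count", offsetof(element_t, count), sizeof(int32_t) },
    { "grid",  offsetof(element_t, grid),  sizeof(float) * 2 * 3 * 2 },
};
```

Offsets come from offsetof rather than hand-computed sums because the compiler may insert padding between members.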

How are data elements stored?
• Contiguous (default): Data elements stored physically adjacent to each other
• Chunked: Better access time for subsets; extendible
• Chunked & Compressed: Improves storage efficiency, transmission speed
(Figure: buffer in memory mapped to data in the file for each layout.)
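The arithmetic behind chunked storage can be sketched in plain C. This is an illustration only, not the library's internal implementation, and the function names are hypothetical. A dataset is covered by a grid of equal-sized chunks, so the chunk count per dimension is a ceiling division, and an element's chunk is found by integer division of its coordinate.

```c
#include <assert.h>

/* Number of chunks needed to cover one dimension (ceiling division). */
unsigned long long chunks_along(unsigned long long dim, unsigned long long chunk)
{
    assert(chunk > 0);
    return (dim + chunk - 1) / chunk;
}

/* Total number of chunks covering a dataset of the given rank. */
unsigned long long chunk_count(int rank, const unsigned long long *dims,
                               const unsigned long long *chunk_dims)
{
    unsigned long long n = 1;
    for (int i = 0; i < rank; i++)
        n *= chunks_along(dims[i], chunk_dims[i]);
    return n;
}

/* Index, along one dimension, of the chunk holding coordinate `coord`. */
unsigned long long chunk_index(unsigned long long coord, unsigned long long chunk)
{
    assert(chunk > 0);
    return coord / chunk;
}
```

For example, a 4 x 6 dataset with 2 x 4 chunks needs 2 x 2 = 4 chunks, and column 5 falls in chunk column 1. Reading a subset only has to touch the chunks that overlap it, which is where the better access time for subsets comes from.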

HDF5 Attributes
• Typically contain user metadata
• Have a name and a value
• Attributes “decorate” HDF5 objects
• Value is described by a datatype and a dataspace
• Analogous to a dataset, but do not support partial I/O operations; nor can they be compressed or extended

HDF5 Groups and Links
HDF5 groups and links organize data objects.
• Every HDF5 file has a root group (“/”)
(Figure: the root group “/” with the groups SimOut and Viz and the following objects.)
• Experiment Notes: Serial Number: 99378920, Date: 3/13/09, Configuration: Standard 3
• Parameters: 10; 1000
• Timestep: 36,000
• Table:
  lat | lon | temp
  ----|-----|-----
   12 |  23 |  3.1
   15 |  24 |  4.2
   17 |  21 |  3.6

HDF5 SOFTWARE

HDF5 Home Page
HDF5 home page: http://hdfgroup.org/HDF5/
• Latest release: HDF5 1.10.0 (1.10.1 coming May 2017)
HDF5 source code:
• Written in C, and includes optional C++, Fortran APIs, and High Level APIs
• Contains command-line utilities (h5dump, h5repack, h5diff, ...) and compile scripts
HDF5 pre-built binaries:
• When possible, include C, C++, Fortran, and High Level libraries. Check the ./libhdf5.settings file.
• Built with and require the SZIP and ZLIB external libraries

Useful Tools For New Users
• h5dump: Tool to “dump” or display contents of HDF5 files
• h5cc, h5c++, h5fc: Scripts to compile applications
• HDFView: Java browser to view HDF5 files
  http://www.hdfgroup.org/hdf-java-html/hdfview/
• HDF5 Examples (C, Fortran, Java, Python, Matlab, …)
  https://www.hdfgroup.org/HDF5/examples/

HDF5 PROGRAMMING MODEL AND API

HDF5 Software Layers & Storage
• Apps: H5Part, netCDF-4, h5dump, HDFview, …
• HDF5 Library
  • API: High Level APIs; Language Interfaces (C, Fortran, C++); Java Interface
  • HDF5 Data Model Objects: Groups, Datasets, Attributes, …
  • Tunable Properties: Chunk Size, I/O Driver, …
  • Internals: Memory Mgmt, Datatype Conversion, Filters, Chunked Storage, Version Compatibility, and so on…
  • Virtual File Layer (I/O Drivers): Posix I/O, Split Files, MPI I/O, Custom
• Storage (HDF5 File Format): File, Split Files, File on Parallel Filesystem, Other

The General HDF5 API
• C, Fortran, Java, C++, and .NET bindings
• IDL, MATLAB, Python (h5py, PyTables)
• C routines begin with the prefix H5?, where ? is a character corresponding to the type of object the function acts on
Example Functions:
• H5D: Dataset interface, e.g., H5Dread
• H5F: File interface, e.g., H5Fopen
• H5S: dataSpace interface, e.g., H5Sclose

The HDF5 API
• For flexibility, the API is extensive
  ✓ 300+ functions
• This can be daunting… but there is hope
  ✓ A few functions can do a lot
  ✓ Start simple
  ✓ Build up knowledge as more features are needed
(Figure: Victorinox Swiss Army Cybertool 34.)

General Programming Paradigm
• Object is opened or created
• Object is accessed, possibly many times
• Object is closed
• Properties of object are optionally defined
  ✓ Creation properties (e.g., use chunking storage)
  ✓ Access properties

Basic Functions
  H5Fcreate (H5Fopen)           create (open) File
  H5Screate_simple/H5Screate    create dataSpace
  H5Dcreate (H5Dopen)           create (open) Dataset
  H5Dread, H5Dwrite             access Dataset
  H5Dclose                      close Dataset
  H5Sclose                      close dataSpace
  H5Fclose                      close File

Other Common Functions
• DataSpaces: H5Sselect_hyperslab (Partial I/O), H5Sselect_elements (Partial I/O), H5Dget_space
• DataTypes: H5Tcreate, H5Tcommit, H5Tclose, H5Tequal, H5Tget_native_type
• Groups: H5Gcreate, H5Gopen, H5Gclose
• Attributes: H5Acreate, H5Aopen_name, H5Aclose, H5Aread, H5Awrite
• Property lists: H5Pcreate, H5Pclose, H5Pset_chunk, H5Pset_deflate

C EXAMPLES

How to compile HDF5 applications
• h5cc – HDF5 C compiler command
• h5fc – HDF5 F90 compiler command
• h5c++ – HDF5 C++ compiler command
• To compile:
  % h5cc h5prog.c
  % h5fc h5prog.f90
  % h5c++ h5prog.cpp

Code: Create a File
    hid_t  file_id;
    herr_t status;

    /* Create the file, truncating it if it already exists. */
    file_id = H5Fcreate("file.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    status = H5Fclose(file_id);

Resulting file: “/” (root)
Note: Return codes not checked for errors in code samples.

Code: Create a Dataset
    hid_t   file_id, dataset_id, dataspace_id;
    hsize_t dims[2];
    herr_t  status;

    file_id = H5Fcreate("file.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

    /* Describe a 4 x 6 array and create dataset "A" with that shape. */
    dims[0] = 4;
    dims[1] = 6;
    dataspace_id = H5Screate_simple(2, dims, NULL);
    dataset_id = H5Dcreate(file_id, "A", H5T_STD_I32BE, dataspace_id,
                           H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    status = H5Dclose(dataset_id);
    status = H5Sclose(dataspace_id);
    status = H5Fclose(file_id);

Resulting file: “/” (root) containing dataset A

Code: Create a Group
    hid_t file_id, group_id;
    ...
    /* Open "file.h5" */
    file_id = H5Fopen("file.h5", H5F_ACC_RDWR, H5P_DEFAULT);

    /* Create group "/B" in file. */
    group_id = H5Gcreate(file_id, "B", H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    /* Close group and file. */
    status = H5Gclose(group_id);
    status = H5Fclose(file_id);

Example: Create Dataset & Group
(Figure: file.h5 with root group “/” containing dataset A, a 4 x 6 array of integers, and group B.)

Output of h5dump
    $ h5dump file.h5
    HDF5 "file.h5" {
    GROUP "/" {
       DATASET "A" {
          DATATYPE  H5T_STD_I32BE
          DATASPACE  SIMPLE { ( 4, 6 ) / ( 4, 6 ) }
          DATA {
          (0,0): 0, 0, 0, 0, 0, 0,
          (1,0): 0, 0, 0, 0, 0, 0,
          (2,0): 0, 0, 0, 0, 0, 0,
          (3,0): 0, 0, 0, 0, 0, 0
          }
       }
       GROUP "B" {
       }
    }
    }

Example Code - H5Dwrite
    int wdata[4][6];
    int i, j;

    /* Initialize the dataset. */
    for (i = 0; i < 4; i++)
        for (j = 0; j < 6; j++)
            wdata[i][j] = i * 6 + j + 1;
    …..
    /* H5S_ALL for both the memory and file dataspace writes the whole dataset. */
    status = H5Dwrite(dataset_id, H5T_NATIVE_INT, H5S_ALL, H5S_ALL,
                      H5P_DEFAULT, wdata);

Output of h5dump after writing
    $ h5dump file.h5
    HDF5 "file.h5" {
    GROUP "/" {
       DATASET "A" {
          DATATYPE  H5T_STD_I32BE
          DATASPACE  SIMPLE { ( 4, 6 ) / ( 4, 6 ) }
          DATA {
          (0,0): 1, 2, 3, 4, 5, 6,
          (1,0): 7, 8, 9, 10, 11, 12,
          (2,0): 13, 14, 15, 16, 17, 18,
          (3,0): 19, 20, 21, 22, 23, 24
          }
       }
       GROUP "B" {
       }
    }
    }

PARTIAL I/O IN HDF5

How to write a row?
    $ h5dump file.h5
    HDF5 "file.h5" {
    GROUP "/" {
       DATASET "A" {
          DATATYPE  H5T_STD_I32BE
          DATASPACE  SIMPLE { ( 4, 6 ) / ( 4, 6 ) }
          DATA {
          (0,0): 0, 0, 0, 0, 0, 0,
          (1,0): 1, 2, 3, 4, 5, 6,
          (2,0): 0, 0, 0, 0, 0, 0,
          (3,0): 0, 0, 0, 0, 0, 0
          }
       }
       GROUP "B" {
       }
    }
    }

How to Describe a Subset in HDF5?
• Before writing or reading a subset of data, one has to describe it to the HDF5 Library.
• HDF5 APIs and documentation refer to a subset as a “selection” or “hyperslab selection”.
• If a selection is specified, the HDF5 Library performs I/O only on the selected elements and not on all elements of the dataset.

Types of Selections in HDF5
• Two types of selections
  • Hyperslab selection
    • Regular hyperslab
    • Simple hyperslab
    • Result of set operations on hyperslabs (union, difference, …)
  • Point selection
• Hyperslab selection is especially important for doing parallel I/O in HDF5 (See Parallel HDF5 Tutorial)

Regular Hyperslab
Collection of regularly spaced blocks of equal size

Simple Hyperslab
Contiguous subset or sub-array

Hyperslab Selection
Result of union operation on three simple hyperslabs

HDF5 Hyperslab Description
• Everything is “measured” in number of elements
• Uses row-major ordering (C order) for coordinates
• Example:
  • Start - starting location of a hyperslab (1, 1)
  • Stride - number of elements from the start of one block to the start of the next (3, 2)
  • Count - number of blocks (2, 6)
  • Block - block size (2, 1)
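To make the four parameters concrete, the plain C sketch below decides whether a given element belongs to a 2-D regular hyperslab. It does not call the HDF5 library, and the type alias and function names are hypothetical. Along each dimension, block b covers the coordinates from start + b * stride up to, but not including, start + b * stride + block.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long long u64;

/* Returns 1 if coordinate x is selected along one dimension, else 0. */
int hyperslab_selects_1d(u64 x, u64 start, u64 stride, u64 count, u64 block)
{
    for (u64 b = 0; b < count; b++) {
        u64 lo = start + b * stride;   /* first coordinate of block b */
        if (x >= lo && x < lo + block)
            return 1;
    }
    return 0;
}

/* A 2-D element is selected when both of its coordinates are selected. */
int hyperslab_selects_2d(u64 row, u64 col, const u64 start[2],
                         const u64 stride[2], const u64 count[2],
                         const u64 block[2])
{
    return hyperslab_selects_1d(row, start[0], stride[0], count[0], block[0]) &&
           hyperslab_selects_1d(col, start[1], stride[1], count[1], block[1]);
}

/* Count the selected elements inside an nrows x ncols array. */
u64 hyperslab_count_2d(u64 nrows, u64 ncols, const u64 start[2],
                       const u64 stride[2], const u64 count[2],
                       const u64 block[2])
{
    u64 n = 0;
    assert(start != NULL && stride != NULL && count != NULL && block != NULL);
    for (u64 r = 0; r < nrows; r++)
        for (u64 c = 0; c < ncols; c++)
            n += (u64)hyperslab_selects_2d(r, c, start, stride, count, block);
    return n;
}
```

With the example values (start (1, 1), stride (3, 2), count (2, 6), block (2, 1)), rows 1, 2, 4, 5 and columns 1, 3, 5, 7, 9, 11 are selected, for 24 elements in total.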

Simple Hyperslab Description
• Two ways to describe a simple hyperslab
  • As several blocks
    • Stride – (2, 1)
    • Count – (2, 6)
    • Block – (2, 1)
  • As one block
    • Stride – (1, 1)
    • Count – (1, 1)
    • Block – (4, 6)
• With more than one block in a dimension, the stride must be at least the block size so that blocks do not overlap
• No performance penalty for one way or another

Writing a row
• Memory space selection is a 1-dim array of size 6
• File space selection: start = {1, 0}, stride = {1, 1}, count = {1, 6}, block = {1, 1}
• Number of elements selected in memory must be the same as selected in the file
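The matching rule can be checked with simple arithmetic before any I/O call. The sketch below is plain C with hypothetical function names. It assumes blocks do not overlap, so a regular hyperslab selects count[i] * block[i] coordinates per dimension, and it treats a NULL block array as a block size of 1 in every dimension, which is how H5Sselect_hyperslab interprets a NULL block argument.

```c
#include <assert.h>
#include <stddef.h>

/* Elements selected by a regular hyperslab with non-overlapping blocks:
 * the product over all dimensions of count[i] * block[i].
 * A NULL block array means block size 1 in every dimension. */
unsigned long long selection_npoints(int rank, const unsigned long long *count,
                                     const unsigned long long *block)
{
    unsigned long long n = 1;
    assert(rank > 0 && count != NULL);
    for (int i = 0; i < rank; i++)
        n *= count[i] * (block != NULL ? block[i] : 1ULL);
    return n;
}

/* Memory and file selections are usable together only if they select
 * the same number of elements. Their ranks may differ. */
int selections_match(unsigned long long mem_npoints,
                     unsigned long long file_npoints)
{
    return mem_npoints == file_npoints;
}
```

For the row write on this slide, the file selection has count = {1, 6} and block = {1, 1}, giving 6 elements, which matches the 1-dim memory array of size 6.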

Writing a row
    hid_t   mspace_id, fspace_id;
    hsize_t dims[1] = {6};
    hsize_t start[2], count[2];
    …..
    /* Create memory dataspace */
    mspace_id = H5Screate_simple(1, dims, NULL);

    /* Get file space identifier from the dataset */
    fspace_id = H5Dget_space(dataset_id);

    /* Select hyperslab in the dataset to write to */
    start[0] = 1;
    start[1] = 0;
    count[0] = 1;
    count[1] = 6;
    status = H5Sselect_hyperslab(fspace_id, H5S_SELECT_SET, start, NULL,
                                 count, NULL);

    H5Dwrite(dataset_id, H5T_NATIVE_INT, mspace_id, fspace_id,
             H5P_DEFAULT, wdata);

HDF5 FILE FORMAT

HDF5 File Format
• Defined by the HDF5 File Format Specification.
  http://www.hdfgroup.org/HDF5/doc/H5.format.html
• Specifies the bit-level organization of an HDF5 file on storage media.
• The HDF5 library adheres to the File Format, so users do not need to know its details.

HDF5 Roadmap
• Concurrency
  • Single-Writer / Multiple-Reader (SWMR)
  • Internal threading
• Virtual Object Layer
• Virtual Datasets
• Query & Indexing
• Native HDF5 client/server
• Performance
  • Asynchronous I/O
  • Scalable chunk indices
  • Metadata aggregation and Page buffering
  • Variable-length records
• Fault tolerance
• Parallel I/O
• I/O Autotuning

The HDF Group
Thank You! Questions?