The HDF Group
HDF5 Filters: Using filters and compression in HDF5
May 30-31, 2012, HDF5 Workshop at PSI
www.hdfgroup.org

Outline
• Introduction to HDF5 filters
• Other filters and how to find them
• How to add your own filter
• Future work

INTRODUCTION TO HDF5 FILTERS

What is an HDF5 filter?
• Data transformation performed by the HDF5 library during I/O operations
• HDF5 filters (or built-in filters)
  • Supported by The HDF Group
  • Come with the HDF5 library source code
• User-defined filters
  • Filters written by HDF5 users and/or available with some applications (h5py, PyTables)
  • May or may not be registered with The HDF Group

HDF5 filters
• Filters are arranged in a pipeline so the output of one filter becomes the input of the next filter
• The filter pipeline can be applied only to
  - Chunked datasets
    - The HDF5 library passes each chunk through the filter pipeline on the way to or from disk
  - Groups
    - Link names are stored in a local heap, which may be compressed with a filter pipeline

Filter pipeline
[Diagram: application memory space, chunk cache, filter pipeline, and file, with a chunked dataset and a group heap]
• Filters are applied in a user-specified order when the HDF5 library performs I/O operations on a chunk or on a group heap

Filter pipeline programming model
• Operations on the HDF5 filter pipeline
  http://www.hdfgroup.org/HDF5/doc1.6/Filters.html
• Defining a pipeline
  - Use a sequence of H5Pset_filter calls or a predefined API, e.g., H5Pset_deflate, on a dataset or group creation property list to create a pipeline
  - On write, the filters are applied in the order they were specified
  - On read, the filters are applied in the reverse order (the last one in the pipeline is applied first)
  - It is the user's responsibility to create a meaningful pipeline
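The write/read ordering described above can be illustrated with a small stand-alone sketch. It does not use the HDF5 API: `stage_fn`, `stage_xor_index`, `stage_rotate`, and `run_pipeline` are hypothetical names invented for this illustration, and the two toy stages were chosen only because they do not commute, so the order of application matters.

```c
#include <stddef.h>

/* Hypothetical stage signature: transform buf in place; reverse != 0 undoes it. */
typedef void (*stage_fn)(unsigned char *buf, size_t n, int reverse);

/* Toy stage: XOR each byte with its index (the same operation undoes it). */
void stage_xor_index(unsigned char *buf, size_t n, int reverse)
{
    (void)reverse;
    for (size_t i = 0; i < n; i++)
        buf[i] ^= (unsigned char)i;
}

/* Toy stage: rotate the buffer left by one byte; rotate right to undo. */
void stage_rotate(unsigned char *buf, size_t n, int reverse)
{
    if (n < 2)
        return;
    if (!reverse) {
        unsigned char first = buf[0];
        for (size_t i = 0; i + 1 < n; i++)
            buf[i] = buf[i + 1];
        buf[n - 1] = first;
    } else {
        unsigned char last = buf[n - 1];
        for (size_t i = n - 1; i > 0; i--)
            buf[i] = buf[i - 1];
        buf[0] = last;
    }
}

/* Write path: stages run first to last. Read path: last to first. */
void run_pipeline(const stage_fn *stages, size_t nstages,
                  unsigned char *buf, size_t n, int reverse)
{
    if (!reverse) {
        for (size_t i = 0; i < nstages; i++)
            stages[i](buf, n, 0);
    } else {
        for (size_t i = nstages; i-- > 0; )
            stages[i](buf, n, 1);
    }
}
```

Calling `run_pipeline` with `reverse = 0` and then with `reverse = 1` on the same buffer restores the original bytes, which is the property a meaningful pipeline needs.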

Filter pipeline programming model
• Operations on the HDF5 filter pipeline
• Query
  - Number of filters in a pipeline: H5Pget_nfilters
  - Information about a filter, given its filter identifier: H5Pget_filter_by_id
  - Check if a filter is available in the library: H5Zfilter_avail
• Modify
  - Change properties of an existing filter: H5Pmodify_filter
  - Remove a filter from the pipeline: H5Premove_filter

Filter pipeline programming model
• The filter pipeline is permanent for a dataset or a group
• Filters are part of an HDF5 object (group or dataset) creation property
• The object's filter pipeline cannot be modified after the object has been created

Applying filters to a dataset
  /* Filters are set on a dataset creation property list */
  dcpl_id = H5Pcreate(H5P_DATASET_CREATE);
  /* Filters require chunked storage: 100 x 100 chunks */
  cdims[0] = 100;
  cdims[1] = 100;
  H5Pset_chunk(dcpl_id, 2, cdims);
  /* Pipeline order on write: shuffle first, then deflate at level 9 */
  H5Pset_shuffle(dcpl_id);
  H5Pset_deflate(dcpl_id, 9);
  dset_id = H5Dcreate(…, dcpl_id);
  H5Pclose(dcpl_id);

Applying filters to a group
  /* Filters are set on a group creation property list */
  gcpl_id = H5Pcreate(H5P_GROUP_CREATE);
  H5Pset_deflate(gcpl_id, 9);
  group_id = H5Gcreate(…, gcpl_id, …);
  H5Pclose(gcpl_id);

HDF5 FILTERS

Types of HDF5 Filters
• Algebraic data transformation
• Data shuffling
• Checksum
• Data compression
  - Scale + offset
  - N-bit
  - GZIP (deflate)
  - SZIP

Checking available HDF5 Filters
• Use the API (H5Zfilter_avail)
• Check the libhdf5.settings file
    Features:
      Parallel HDF5: no
      ……………………….
      I/O filters (external): deflate(zlib), szip(encoder)
      I/O filters (internal): shuffle, fletcher32, nbit, scaleoffset
      ……………………….

External HDF5 Filters
• External HDF5 filters rely on third-party libraries installed on the system
• GZIP
  • By default, HDF5 configure uses the ZLIB installed on the system
  • Configure will proceed if ZLIB is not found on the system
• SZIP (added at NASA's request)
  • Optional; has to be configured in using --with-szlib=/path….
  • Configure will proceed if SZIP is not found
  • Comes with a license: http://www.hdfgroup.org/doc_resource/SZIP/Commercial_szip.html
  • Decoder is free; for the encoder, see the license terms

Internal HDF5 Filters
• Internal filters are implemented by The HDF Group and come with the library
• HDF5 internal filters can be configured out using --disable-filters="filter1,filter2,.."
• FLETCHER32
• SHUFFLE
• SCALEOFFSET
• NBIT

Checksum filter
• Predefined HDF5 filter (H5Pset_fletcher32)
• Why:
  • Error detection for raw data
• What:
  • Implements the Fletcher32 checksum algorithm
[Diagram labels: Memory, File, Checksum value]
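To make the checksum idea concrete, here is a minimal textbook Fletcher-32 in C. It is a sketch under stated assumptions: bytes are grouped into 16-bit words in little-endian order and an odd trailing byte is zero-padded. The function name `fletcher32_sketch` is hypothetical, and the result is not claimed to be byte-compatible with what the HDF5 library stores.

```c
#include <stddef.h>
#include <stdint.h>

/* Textbook Fletcher-32 over little-endian 16-bit words (an assumption of
 * this sketch). An odd trailing byte is treated as a word with a zero
 * high byte. */
uint32_t fletcher32_sketch(const unsigned char *data, size_t len)
{
    uint32_t sum1 = 0, sum2 = 0;

    for (size_t i = 0; i < len; i += 2) {
        uint32_t word = data[i];
        if (i + 1 < len)
            word |= (uint32_t)data[i + 1] << 8;
        sum1 = (sum1 + word) % 65535u;  /* running sum of words */
        sum2 = (sum2 + sum1) % 65535u;  /* running sum of the sums */
    }
    return (sum2 << 16) | sum1;
}
```

A reader of the data recomputes the checksum and compares it with the stored value; a mismatch signals corruption of the raw data.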

Shuffling filter
• Predefined HDF5 filter (H5Pset_shuffle)
• Why:
  • Better compression of unused bytes
• What:
  • Changes byte order in a stream of data
    00 00 00 01   00 00 00 17   00 00 00 2B
    00 00 00   01 17 2B
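The byte reordering can be sketched in a few lines of C. This is an illustration of the general technique (group byte 0 of every element, then byte 1, and so on), not the library's own source; `shuffle_sketch` and `unshuffle_sketch` are hypothetical names.

```c
#include <stddef.h>

/* Gather byte b of every element together: all first bytes, then all
 * second bytes, and so on. in and out must not overlap. */
void shuffle_sketch(const unsigned char *in, unsigned char *out,
                    size_t nelem, size_t elem_size)
{
    for (size_t b = 0; b < elem_size; b++)
        for (size_t e = 0; e < nelem; e++)
            out[b * nelem + e] = in[e * elem_size + b];
}

/* Inverse transformation: scatter the grouped bytes back into elements. */
void unshuffle_sketch(const unsigned char *in, unsigned char *out,
                      size_t nelem, size_t elem_size)
{
    for (size_t b = 0; b < elem_size; b++)
        for (size_t e = 0; e < nelem; e++)
            out[e * elem_size + b] = in[b * nelem + e];
}
```

Applied to the three 4-byte values on the slide, the output is nine 00 bytes followed by 01 17 2B, so the zero bytes form one long run that a compressor such as deflate handles well.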

Effect of data shuffling
• H5Pset_shuffle followed by H5Pset_deflate
• Write 4-byte integer dataset 256 x 1024 (256 MB)
• Using chunks of 256 x 1024 (16 MB)
• Values: random integers between 0 and 255

               File size   Total time   Write time
  No shuffle   102.9 MB    671.049      629.45
  Shuffle      67.34 MB    83.353       78.268

N-bit compression filter
• Predefined HDF5 filter (H5Pset_nbit)
• Why:
  • Compact storage for user-defined datatypes
• What:
  • When data is stored on disk, padding bits are chopped off and only significant bits are stored
  • Supports most datatypes
  • Works with compound datatypes

N-bit compression example
• In memory, one value of an N-bit datatype is stored like this:
    | byte 3 | byte 2 | byte 1 | byte 0 |
    |????|??SPPP|PPPP|PPPP??|
    S - sign bit   P - significant bit   ? - padding bit
• After passing through the N-bit filter, all padding bits are chopped off, and the bits are stored on disk like this:
    | 1st value | 2nd value |
    |SPPPPPPPP|...
• Opposite (decompress) when going from disk to memory
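The packing step can be sketched as plain bit manipulation. Everything specific here is an assumption made for illustration: values are held in 32-bit words, the significant field is described by a `precision` (number of bits) and an `offset` (position of its lowest bit), and fields are written to the output most significant bit first. The names `nbit_pack_sketch` and `nbit_unpack_sketch` are hypothetical, and details such as sign extension on unpacking are omitted.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack the `precision` significant bits that start at bit `offset` of each
 * value into a contiguous bit stream, MSB first. Requires
 * offset + precision <= 32. out must hold (n * precision + 7) / 8 bytes.
 * Returns the number of bytes written. */
size_t nbit_pack_sketch(const uint32_t *vals, size_t n, unsigned precision,
                        unsigned offset, unsigned char *out)
{
    size_t nbytes = (n * precision + 7) / 8;
    size_t bitpos = 0;

    for (size_t i = 0; i < nbytes; i++)
        out[i] = 0;
    for (size_t i = 0; i < n; i++) {
        for (unsigned b = precision; b-- > 0; ) {
            unsigned bit = (vals[i] >> (offset + b)) & 1u;
            out[bitpos / 8] |= (unsigned char)(bit << (7 - bitpos % 8));
            bitpos++;
        }
    }
    return nbytes;
}

/* Unpack: rebuild each field and shift it back to `offset`.
 * Padding bits come back as zero. */
void nbit_unpack_sketch(const unsigned char *in, size_t n, unsigned precision,
                        unsigned offset, uint32_t *vals)
{
    size_t bitpos = 0;

    for (size_t i = 0; i < n; i++) {
        uint32_t v = 0;
        for (unsigned b = 0; b < precision; b++) {
            v = (v << 1) | ((in[bitpos / 8] >> (7 - bitpos % 8)) & 1u);
            bitpos++;
        }
        vals[i] = v << offset;
    }
}
```

With a precision of 12 and an offset of 4, two 32-bit values shrink from 8 bytes to 3, and whatever was in the padding bits is discarded.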

“Scale+offset” filter
• Predefined HDF5 filter (H5Pset_scaleoffset)
• Why:
  • Use less storage when less precision is needed
• What:
  • Performs a scale/offset operation on each value
  • Truncates the result to fewer bits before storing
  • Currently supports integers and floats

Example with floating-point type
• Data: {104.561, 99.459, 100.545, 105.644}
• Choose scaling factor: decimal precision to keep, e.g. scale factor D = 2
1. Find the minimum value (offset): 99.459
2. Subtract the minimum value from each element
   Result: {5.102, 0, 1.086, 6.185}
3. Scale the data by multiplying by 10^D = 100
   Result: {510.2, 0, 108.6, 618.5}
4. Round the data to integers
   Result: {510, 0, 109, 619}
5. Pack and store using the minimum number of bits
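The five steps above can be followed in a short C sketch. It reproduces the arithmetic of the slide only; it is not the library's implementation, the function name `scaleoffset_sketch` is hypothetical, and rounding is done as "add 0.5 and truncate", which is an assumption that is valid here because the shifted values are never negative.

```c
#include <stddef.h>

/* Steps 1-4 of the slide, plus the bit count needed for step 5.
 * Requires n >= 1 and D >= 0. Writes the offset (minimum) to *min_out and
 * the scaled, rounded integers to out[]. Returns the number of bits
 * needed to store the largest of them. */
unsigned scaleoffset_sketch(const double *data, size_t n, int D,
                            double *min_out, unsigned long *out)
{
    /* Step 1: find the minimum value (the offset). */
    double min = data[0];
    for (size_t i = 1; i < n; i++)
        if (data[i] < min)
            min = data[i];

    /* Scale factor 10^D. */
    double scale = 1.0;
    for (int k = 0; k < D; k++)
        scale *= 10.0;

    /* Steps 2-4: subtract the offset, scale, round to nearest integer. */
    unsigned long max = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned long q = (unsigned long)((data[i] - min) * scale + 0.5);
        out[i] = q;
        if (q > max)
            max = q;
    }

    /* Step 5 (size only): bits needed to represent the largest value. */
    unsigned bits = 0;
    while (max > 0) {
        bits++;
        max >>= 1;
    }
    *min_out = min;
    return bits;
}
```

For the slide's data with D = 2, the largest stored integer is 619, which fits in 10 bits, compared with 32 or 64 bits for the original floating-point values.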

THIRD-PARTY HDF5 FILTERS

Third-party HDF5 filters
• Compression methods supported by the HDF5 user community
  http://www.hdfgroup.org/services/contributions
  - LZO, BZIP2, BLOSC (PyTables)
  - LZF (h5py)
  - MAFISC
    - The website has a patch for an external module loader

HOW TO ADD YOUR OWN FILTER

Filter design considerations
• A filter is bidirectional
  - Handles both input and output to the file
  - A flag is passed to the filter to indicate the direction
• The filter
  - Reads data from a buffer
  - Performs a transformation on the data
  - Places the result in the same or a new buffer
  - Returns the buffer pointer and size to the caller
  - Returns zero to indicate a failure
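A minimal sketch of such a callback is shown below. Its parameter list follows the shape of the HDF5 filter callback type `H5Z_func_t`, but the code deliberately avoids including hdf5.h so it compiles on its own: `SKETCH_FLAG_REVERSE` is a local stand-in that mirrors the value of `H5Z_FLAG_REVERSE`, and the filter itself (`add_key_filter_sketch`, which adds a one-byte key on write and subtracts it on read) is a hypothetical toy, not a useful transformation.

```c
#include <stddef.h>

/* Local stand-in mirroring the value of HDF5's H5Z_FLAG_REVERSE, defined
 * here so the sketch compiles without hdf5.h. */
#define SKETCH_FLAG_REVERSE 0x0100u

/* Toy bidirectional filter with the same parameter list as H5Z_func_t.
 * cd_values[0] supplies a one-byte key. Works in place, so *buf and
 * *buf_size are left unchanged. Returns the number of valid bytes in
 * *buf, or 0 on failure. */
size_t add_key_filter_sketch(unsigned int flags, size_t cd_nelmts,
                             const unsigned int cd_values[], size_t nbytes,
                             size_t *buf_size, void **buf)
{
    if (cd_nelmts < 1 || cd_values == NULL || buf_size == NULL ||
        buf == NULL || *buf == NULL || nbytes > *buf_size)
        return 0;  /* zero signals failure to the caller */

    unsigned char key = (unsigned char)(cd_values[0] & 0xFFu);
    unsigned char *p = (unsigned char *)*buf;

    for (size_t i = 0; i < nbytes; i++) {
        if (flags & SKETCH_FLAG_REVERSE)
            p[i] = (unsigned char)(p[i] - key);  /* read: undo */
        else
            p[i] = (unsigned char)(p[i] + key);  /* write: apply */
    }
    return nbytes;
}
```

A filter that changes the data size, such as a compressor, would instead allocate a new buffer, store it in `*buf`, update `*buf_size`, and return the new byte count.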

How to proceed?
• Implement a filter (see H5Zregister in the Reference Manual)
  • See H5Zdeflate.c in the HDF5 src directory for ideas
• Application will need to
  • Register the filter with the HDF5 library using H5Zregister
  • Add the filter to a pipeline using H5Pset_filter
  • Follow the HDF5 programming model as usual

Example: Adding BZIP2 compression
• Source:
    h5ex_d_bzip2.c
    h5bzip2.h
    H5Zbzip2.c
• Compile
    % h5cc h5ex_d_bzip2.c H5Zbzip2.c -lbz2

How to register a new filter with us?
• Send a request to help@hdfgroup.org
• Provide
  • Filter information
  • Maintainer contact information
• Get a unique filter identifier
• Filter info will be available at http://www.hdfgroup.org/services/contributions.html

Example: h5dump output on BZIP2 data
  HDF5 "h5ex_d_bzip2.h5" {
  GROUP "/" {
     DATASET "DS-bzip2" {
        . . .
        }
        FILTERS {
           UNKNOWN_FILTER {
              FILTER_ID 305
              COMMENT bzip2
              PARAMS { 9 }
           }
        . . .
        }
        DATA {h5dump error: unable to print data
        }

Problem with using a custom filter
• “Off the shelf” HDF5 tools do not work with third-party filters
  • h5dump, MATLAB, IDL, etc.
• Solution
  • Modify the HDF5 source with your code
  • Use a patch from http://wr.informatik.uni-hamburg.de/research/projects/icomex/mafisc

FUTURE IMPROVEMENTS

Proposal in the works
• Modify the HDF5 file format and library to allow a dynamic library to be loaded for performing filter operations
• Challenges:
  • A portable solution between UNIX and Windows is required
  • Increased maintenance cost
    • Testing
    • Code maintenance
    • Documentation

The HDF Group
Thank You!
Questions?