N-bit and Scale-Offset Filters
Mu Qun Yang
National Center for Supercomputing Applications
University of Illinois at Urbana-Champaign
Urbana, IL 61801
Contact: ymuqun@ncsa.uiuc.edu
N-Bit Filter Outline
► Definition
► A usage example
► Limitations
N-Bit datatype
A user-defined HDF5 datatype that uses fewer bits than the predefined datatype it is derived from.
Illustration of an N-bit datatype on a little-endian machine (base type: integer, offset: 4, precision: 16):
| byte 3 | byte 2 | byte 1 | byte 0 |
|????????|????SPPP|PPPPPPPP|PPPP????|
S - sign bit, P - significant bit, ? - padding bit
Without the N-bit filter, HDF5 also stores the padding bits of an N-bit datatype, so no disk space is saved.
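To make the layout above concrete, the following C sketch reads the significant bits out of a 32-bit word (offset 4, precision 16 in the illustration) and sign-extends them. The function name nbit_extract is hypothetical and is not part of the HDF5 API; it only illustrates what offset and precision mean.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of the HDF5 API): pull the significant bits
 * of an N-bit integer out of a 32-bit word and sign-extend them. */
int32_t nbit_extract(uint32_t word, unsigned offset, unsigned precision)
{
    assert(precision >= 1 && offset + precision <= 32);

    /* Mask with `precision` low bits set (avoids the undefined 1u << 32). */
    uint32_t mask = (precision == 32) ? 0xFFFFFFFFu : ((1u << precision) - 1u);

    /* Shift the significant bits down to bit 0; padding bits are discarded. */
    uint32_t bits = (word >> offset) & mask;

    /* The highest significant bit is the sign bit S: if it is set, fill the
     * upper bits with ones (two's complement sign extension). */
    if (bits & (1u << (precision - 1)))
        bits |= ~mask;
    return (int32_t)bits;
}
```

For example, a word of 0x000FFFF0 (all 16 significant bits set, padding zero) decodes to -1, and whatever sits in the padding bits does not affect the result.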
N-bit filter
► When the N-bit filter is used with an N-bit datatype, all padding bits are removed during compression, and the value above is stored on disk as:
|SPPPPPPP|PPPPPPPP|
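A minimal sketch of the idea, assuming 32-bit storage elements: only the `precision` significant bits of each element are copied into a contiguous bit stream, and the padding is dropped. The helpers nbit_pack and nbit_unpack are hypothetical illustrations, not the HDF5 filter code, and the bit order inside each packed byte (least significant bit first) is an arbitrary choice made here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: copy bits [offset, offset + precision) of each element
 * into a contiguous bit stream. Returns the number of bytes written. */
size_t nbit_pack(const uint32_t *in, size_t n, unsigned offset,
                 unsigned precision, uint8_t *out)
{
    assert(offset + precision <= 32);
    size_t nbytes = (n * precision + 7) / 8, bitpos = 0;
    memset(out, 0, nbytes);
    for (size_t i = 0; i < n; i++)
        for (unsigned b = 0; b < precision; b++, bitpos++)
            if ((in[i] >> (offset + b)) & 1u)
                out[bitpos / 8] |= (uint8_t)(1u << (bitpos % 8));
    return nbytes;
}

/* Reverse operation: put the significant bits back at `offset` and write
 * zeros into every padding bit. */
void nbit_unpack(const uint8_t *in, size_t n, unsigned offset,
                 unsigned precision, uint32_t *out)
{
    assert(offset + precision <= 32);
    size_t bitpos = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t v = 0;
        for (unsigned b = 0; b < precision; b++, bitpos++)
            if ((in[bitpos / 8] >> (bitpos % 8)) & 1u)
                v |= 1u << (offset + b);
        out[i] = v;
    }
}
```

With offset 4 and precision 16, three elements pack into 6 bytes instead of 12. Unpacking restores the significant bits and leaves the padding bits at zero.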
Enable N-Bit filter
► Create a dataset creation property list
► Set chunking (and specify chunk dimensions)
► Set up use of the N-Bit filter
► Create the dataset, specifying this property list
► Close the property list
N-Bit filter: usage example /*Define dataset datatype (N-Bit), and set precision, offset */ datatype = H 5 Tcopy(H 5 T_NATIVE_INT); precision = 17; H 5 Tset_precision(datatype, precision); offset = 4; if(H 5 Tset_offset(datatype, offset); /* Set the dataset creation property list for N-Bit */ chunk_size[0] = CH_NX; chunk_size[1] = CH_NY; properties = H 5 Pcreate (H 5 P_DATASET_CREATE); H 5 Pset_chunk (properties, 2, chunk_size); H 5 Pset_nbit (properties); /* Create a new dataset with N-Bit datatype */ dataset = H 5 Dcreate (file, DATASET_NAME, datatype, dataspace, properties); 6
N-bit filter restrictions
► Only compresses N-bit datatypes (or N-bit fields) derived from integer or floating-point types
► Assumes padding bits are zero
► Does not handle a fill value whose datatype differs from the dataset datatype
Scale-Offset Filter Outline
► Definition
► How the Scale-Offset filter works
► Why use the Scale-Offset filter
► Usage examples
► Performance with EOS data
► Limitations
Scale-Offset filter
► Scale-Offset compression performs a scale and/or offset operation on each data value and truncates the resulting value to a minimum number of bits before storing it. The datatype is either integer or floating-point.
► In Scale-Offset compression, the offset is the minimum value of a set of data values.
► If a fill value is defined for the dataset, the filter ignores it when finding the minimum value.
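The rule for choosing the offset can be sketched in C as below. The function so_find_min and its has_fill/fill parameters are hypothetical; they only illustrate that elements equal to the fill value are skipped when the minimum is computed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: find the minimum of `data`, skipping elements equal
 * to the fill value when one is defined (has_fill != 0).
 * Returns 0 if no element qualifies (e.g. every element is the fill value). */
int so_find_min(const int *data, size_t n, int has_fill, int fill, int *min_out)
{
    int found = 0;
    assert(data != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (has_fill && data[i] == fill)
            continue;                      /* fill values do not set the offset */
        if (!found || data[i] < *min_out) {
            *min_out = data[i];
            found = 1;
        }
    }
    return found;
}
```

Here a fill value of -1 would otherwise become the minimum; skipping it keeps the offset at the smallest real data value.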
How Scale-Offset works: An example for Integer Type
► Maximum is 7065; minimum is 2970
Span = max - min + 1 = 7065 - 2970 + 1 = 4096
► Case 1: No fill value is defined.
Minimum number of bits per element = ceiling(log2(span)) = ceiling(log2(4096)) = 12
► Case 2: A fill value is defined for this array.
Minimum number of bits per element = ceiling(log2(span + 1)) = ceiling(log2(4097)) = 13
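The minimum-bits arithmetic for the two cases can be checked with a small C sketch. ceil_log2 and so_minbits are illustrative names, and the code uses integer arithmetic rather than log2() to avoid floating-point rounding.

```c
#include <assert.h>

/* Smallest b with 2^b >= count, i.e. ceiling(log2(count)), using integers. */
unsigned ceil_log2(unsigned long long count)
{
    unsigned bits = 0;
    assert(count >= 1);
    while (bits < 64 && (1ULL << bits) < count)
        bits++;
    return bits;
}

/* Hypothetical helper: minimum bits per element for integer Scale-Offset.
 * Without a fill value, the values 0 .. span-1 must be representable;
 * with a fill value, one extra code is needed, hence span + 1. */
unsigned so_minbits(long long min, long long max, int has_fill)
{
    unsigned long long span = (unsigned long long)(max - min) + 1;
    return ceil_log2(has_fill ? span + 1 : span);
}
```

so_minbits(2970, 7065, 0) gives 12 and so_minbits(2970, 7065, 1) gives 13, matching the two cases above.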
How Scale-Offset works: An example for Integer Type (cont.)
► Compression:
1. Subtract the minimum value from each element
2. Pack all data using the minimum number of bits
► Decompression:
1. Unpack all data
2. Add the minimum value to each element
► Outcome:
1. Saves about 60% of disk space in this case (12 or 13 bits per element instead of 32, assuming a 32-bit integer type)
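The compression and decompression steps above can be sketched as a lossless round trip in C. This is an illustration under stated assumptions (32-bit signed input, a caller-supplied minimum and bit count, least-significant-bit-first packing), not the HDF5 implementation; so_int_compress and so_int_decompress are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: store (value - min) in `minbits` bits per element.
 * Returns the packed size in bytes. */
size_t so_int_compress(const int32_t *in, size_t n, int32_t min,
                       unsigned minbits, uint8_t *out)
{
    assert(minbits <= 32);
    size_t nbytes = (n * minbits + 7) / 8, bitpos = 0;
    memset(out, 0, nbytes);
    for (size_t i = 0; i < n; i++) {
        /* 1. Subtract the minimum (unsigned arithmetic avoids overflow). */
        uint32_t v = (uint32_t)in[i] - (uint32_t)min;
        assert(minbits == 32 || v < (1u << minbits));
        /* 2. Pack the low `minbits` bits into the output stream. */
        for (unsigned b = 0; b < minbits; b++, bitpos++)
            if ((v >> b) & 1u)
                out[bitpos / 8] |= (uint8_t)(1u << (bitpos % 8));
    }
    return nbytes;
}

void so_int_decompress(const uint8_t *in, size_t n, int32_t min,
                       unsigned minbits, int32_t *out)
{
    size_t bitpos = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t v = 0;
        /* 1. Unpack `minbits` bits. */
        for (unsigned b = 0; b < minbits; b++, bitpos++)
            if ((in[bitpos / 8] >> (bitpos % 8)) & 1u)
                v |= 1u << b;
        /* 2. Add the minimum back. */
        out[i] = (int32_t)(v + (uint32_t)min);
    }
}
```

Five values from the example range, stored with 12 bits each, occupy 8 bytes instead of 20, and decompression returns the original values exactly.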
How Scale-Offset works: An example for Floating-point Type
► D-scaling factor: the number of decimal digits after the decimal point that the filter keeps in its output
► Floating-point data: {104.561, 99.459, 100.545, 105.644}
► D-scaling factor: 2
► Preparation for compression:
1. Calculate the minimum value: 99.459
2. Subtract the minimum value. Modified data: {5.102, 0, 1.086, 6.185}
3. Scale the data by multiplying by 10^(D-scaling factor) = 100. Modified data: {510.2, 0, 108.6, 618.5}
4. Round the data to integers. Modified data: {510, 0, 109, 619}
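The four preparation steps can be written as a short C sketch. so_dscale_prepare is a hypothetical name; it rounds half up by adding 0.5 before truncating, which is valid here because every scaled value is non-negative, and it builds 10^D with a loop so no math library is needed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of D-scaling preparation: find the minimum, subtract
 * it, multiply by 10^d, and round to the nearest integer. */
void so_dscale_prepare(const double *in, size_t n, int d,
                       long *out, double *min_out)
{
    assert(n > 0 && d >= 0);

    double min = in[0];                    /* 1. minimum value */
    for (size_t i = 1; i < n; i++)
        if (in[i] < min)
            min = in[i];

    double scale = 1.0;                    /* 10^d */
    for (int k = 0; k < d; k++)
        scale *= 10.0;

    for (size_t i = 0; i < n; i++)         /* 2. subtract, 3. scale, 4. round */
        out[i] = (long)((in[i] - min) * scale + 0.5);
    *min_out = min;
}
```

Because 6.185 has no exact binary representation, the scaled value for 105.644 lands within rounding error of 618.5, so a double-precision implementation may produce 618 or 619 for that element; the slide's 619 corresponds to rounding the exact decimal value.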
How Scale-Offset works: An example for Floating-point Type (cont.)
► Compression and decompression: apply integer Scale-Offset to the rounded values
► Restoration after decompression:
1. Divide each value by 10^(D-scaling factor)
2. Add the offset 99.459
3. Restored floating-point data: {104.559, 99.459, 100.549, 105.649}
► Outcome:
1. Lossy compression
2. The compression ratio depends on the D-scaling factor
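The restoration step can be sketched in C as follows. so_dscale_restore is a hypothetical name. The sketch also shows why the result is lossy: each restored value differs from the original by at most half of 10^-D (0.005 for D = 2), apart from floating-point rounding.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of restoration: divide by 10^d, then add the offset. */
void so_dscale_restore(const long *in, size_t n, int d, double min, double *out)
{
    assert(d >= 0);
    double scale = 1.0;                    /* 10^d */
    for (int k = 0; k < d; k++)
        scale *= 10.0;
    for (size_t i = 0; i < n; i++)
        out[i] = (double)in[i] / scale + min;   /* 1. divide, 2. add offset */
}
```

Restoring {510, 0, 109, 619} with D = 2 and offset 99.459 reproduces the values listed above, each within 0.005 of the original data.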
Scale-Offset compression of floating-point data
► Based on the GRIB data packing method
► The Scale-Offset compression of floating-point data is lossy
► Two scaling methods: D-scaling and E-scaling; currently only D-scaling is implemented
Why the Scale-Offset Filter?
► Internal HDF5 filter
► Easy to understand
§ Integer: lossless (by default)
§ Floating-point: GRIB lossy compression
► Easy to control floating-point compression
§ D-scaling factor
► Easy to estimate the compression ratio
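As a rough illustration of estimating the compression ratio, the C sketch below applies the integer minimum-bits rule to the D-scaled range. This is an assumption-laden estimate: so_estimate_ratio is a hypothetical name, it ignores any per-chunk header data the real filter stores, and it assumes no fill value is defined.

```c
#include <assert.h>

/* Hypothetical estimate: bits needed for the largest quantized value
 * round((max - min) * 10^d), compared with the original type width.
 * Assumes no fill value and ignores any per-chunk header data. */
double so_estimate_ratio(double min, double max, int d, unsigned type_bits)
{
    assert(max >= min && d >= 0);
    double scale = 1.0;
    for (int k = 0; k < d; k++)
        scale *= 10.0;

    unsigned long long top = (unsigned long long)((max - min) * scale + 0.5);
    assert(top >= 1);                      /* sketch assumes a non-trivial range */

    unsigned bits = 0;                     /* ceiling(log2(top + 1)) */
    while (bits < 64 && (1ULL << bits) < top + 1)
        bits++;
    return (double)type_bits / bits;
}
```

For the floating-point example (range 99.459 to 105.644, D = 2, 32-bit float), the estimate is 10 bits per value, a ratio of about 3.2; raising D increases the bits per value and lowers the ratio.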
H5Pset_scaleoffset API
herr_t H5Pset_scaleoffset(hid_t plist_id, H5Z_SO_scale_type_t scale_type, int scale_factor)
► plist_id IN: Dataset creation property list identifier
► scale_type IN: Flag indicating the compression method
§ H5Z_SO_FLOAT_DSCALE (0): floating-point type
§ H5Z_SO_INT (2): integer type
► scale_factor IN: Parameter whose meaning depends on scale_type
§ If scale_type is H5Z_SO_FLOAT_DSCALE, scale_factor is the decimal scale factor
§ If scale_type is H5Z_SO_INT, scale_factor denotes the minimum number of bits and should be a positive integer or H5Z_SO_INT_MINBITS_DEFAULT
Integer example
    /* properties: a dataset creation property list with chunking already set */
    /* Set the fill value of the dataset */
    fill_val = 10000;
    H5Pset_fill_value(properties, H5T_NATIVE_INT, &fill_val);
    /* Set parameters for Scale-Offset compression: integer type, and let the
       filter determine the minimum number of bits */
    H5Pset_scaleoffset(properties, H5Z_SO_INT, H5Z_SO_INT_MINBITS_DEFAULT);
    /* Create a new dataset */
    dataset = H5Dcreate(file, DATASET_NAME, H5T_NATIVE_INT, dataspace, properties);
Floating-point example
    /* Set the fill value of the dataset */
    fill_val = 10000.0;
    H5Pset_fill_value(properties, H5T_NATIVE_FLOAT, &fill_val);
    /* Set parameters for Scale-Offset compression: use the D-scaling method
       with a decimal scale factor of 3 */
    H5Pset_scaleoffset(properties, H5Z_SO_FLOAT_DSCALE, 3);
    /* Create a new dataset */
    dataset = H5Dcreate(file, DATASET_NAME, H5T_NATIVE_FLOAT, dataspace, properties);
Limitations
§ The range of compressed floating-point data is limited by the size of the corresponding unsigned integer type.
§ Long double is not supported.
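The first limitation can be expressed as a simple range check. The C sketch below assumes that the "corresponding unsigned integer type" has the same width as the floating-point type (32 bits for float, 64 bits for double); so_dscale_fits is a hypothetical name, and the check is approximate because it compares the unrounded scaled range.

```c
#include <assert.h>

/* Hypothetical check: does the D-scaled range (max - min) * 10^d fit in an
 * unsigned integer of `type_bits` bits (assumed equal to the float width)? */
int so_dscale_fits(double min, double max, int d, unsigned type_bits)
{
    assert(max >= min && d >= 0 && type_bits <= 64);
    double scale = 1.0;
    for (int k = 0; k < d; k++)
        scale *= 10.0;
    double top = (max - min) * scale;

    double limit = 1.0;                    /* 2^type_bits, exact in double */
    for (unsigned k = 0; k < type_bits; k++)
        limit *= 2.0;
    return top < limit;
}
```

For instance, data spanning 0 to 1,000,000 with D = 6 scales to 10^12, which exceeds the 32-bit range but fits in 64 bits.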
Thank you
This work was funded by the NASA Earth Science Technology Office under NASA award AIST-02-0071 and UCAR Subaward No. S03-38820. Other support was provided based upon the Cooperative Agreement with the National Aeronautics and Space Administration (NASA) under NASA grant NNG05GC60A. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of NASA.