
Ultraspectral Sounder Data Compression
Bormin Huang, Allen Huang, Alok Ahuja
Cooperative Institute for Meteorological Satellite Studies (CIMSS), University of Wisconsin–Madison
5th MURI Workshop, Madison, WI, June 7-9, 2005

Outline
• Ultraspectral Sounder Data Compression
• Current state-of-the-art Lossless Compression Schemes
  - 2D JPEG 2000
  - 3D SPIHT (Set Partitioning In Hierarchical Trees)
  - 2D CALIC (Context-based Adaptive Lossless Image Codec)
  - 2D JPEG-LS
• CIMSS’s Data Preprocessing Technique
  - Bias-Adjusted Reordering (BAR, 2004)
• CIMSS’s New Lossless Compression Schemes
  - Predictive Partitioned Vector Quantization (PPVQ, 2004)
  - Fast Precomputed Vector Quantization (FPVQ, 2005)
• Summary

Ultraspectral sounder data vs. Hyperspectral imager data
• Imager data is used in classification, target detection and pattern recognition. Significant loss of imager data is usually acceptable to the human visual system.
• The main criterion for sounder data loss is retrieval quality. Retrieval of geophysical parameters from observed radiance is a mathematically ill-posed problem that is sensitive to errors in the data.
• Hence, there is a need for lossless or near-lossless compression of hyperspectral sounder data!

Ultraspectral sounder data for compression studies AIRS: 2378 infrared channels, 135 scan lines x 90 cross-track footprints per granule

Wavelet-Based Schemes: JPEG 2000
• A new ISO/IEC (International Organization for Standardization/International Electrotechnical Commission) compression standard.
• Successor to the DCT (discrete cosine transform)-based JPEG algorithm.

3D SPIHT
It uses the 3D spatial hierarchical tree relationship of the wavelet transform coefficients for efficient compression (Huang et al. 2003).
[Figure: Parent-child interband relationship and locations for 3D SPIHT coding]
[Figure: Examples of allowable parent-child relations for 2D irregular data]

Predictor-Based Schemes: 2D CALIC (Context-based Adaptive Lossless Image Codec)
• Among the nine proposals in the initial ISO/JPEG evaluation in July 1995, CALIC was ranked first.
• It is considered the benchmark for lossless compression of continuous-tone images.
[Figure: Neighboring pixels used in prediction: n, w, ne, nw, nn, ww, nne around the current pixel (Wu et al. 1997)]
[Figure: Schematic description of the CALIC encoder]

2D JPEG-LS
• Published in 1999 as a lossless compression standard of the ISO/IEC.
[Figure: Neighborhood of JPEG-LS used in prediction: a, b, c, d around the current pixel x]
[Figure: Schematic description of the JPEG-LS encoder]
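The slide names the JPEG-LS neighborhood but not the predictor itself. As a brief illustration, the fixed predictor of JPEG-LS (the median edge detector from LOCO-I) can be sketched as below, assuming the usual labeling in which a is the left neighbor, b the neighbor above, and c the neighbor above-left of the current pixel x; d (above-right) is used only for context modeling. The function name is hypothetical.

```python
def med_predict(a, b, c):
    """Median edge detector prediction for the current pixel x.

    a: left neighbor, b: neighbor above, c: neighbor above-left.
    """
    if c >= max(a, b):
        # c is the largest: an edge is likely, so take the smaller neighbor
        return min(a, b)
    if c <= min(a, b):
        # c is the smallest: take the larger neighbor
        return max(a, b)
    # No edge detected: planar prediction
    return a + b - c


# The residual x - med_predict(a, b, c) is what gets entropy coded.
print(med_predict(10, 20, 15))
```

The result always lies between min(a, b) and max(a, b), which is why the predictor is described as a median of a, b, and a + b - c.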

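The Bias-Adjusted Reordering (BAR) Scheme (Huang et al., 2004)
A preprocessing technique that exploits the correlation among remote disjoint channels to improve the compression performance of existing state-of-the-art schemes.
Given the $i$-th reordered vector $V^i$, we seek $V^*$ and $b^*$ to minimize
$$f(V, b) = \sum_{k=1}^{n} \left(V_k - b - V^i_k\right)^2 ,$$
where $n$ is the number of elements in each vector and $V$ ranges over the vectors not yet reordered.
Then the $(i+1)$-th reordered vector is simply $V^{i+1} = V^* - b^*$.
The optimal value of $b^*$ is obtained by $\partial f / \partial b = 0$, which yields
$$b^* = \frac{1}{n} \sum_{k=1}^{n} \left(V^*_k - V^i_k\right).$$
For lossless compression, $b^*$ is rounded to the nearest integer $\lfloor b^* \rceil$, and the $(i+1)$-th reordered vector becomes $V^{i+1} = V^* - \lfloor b^* \rceil$.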
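A minimal sketch of a greedy bias-adjusted reordering loop may help make the procedure concrete. It assumes that the quantity being minimized is the squared difference between a candidate vector and the current reordered vector after subtracting a scalar bias, and that each channel is one vector. The function name, the starting channel, and the array layout are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def bar_reorder(data, start=0):
    """Greedy bias-adjusted reordering (illustrative sketch).

    data: integer array of shape (n_channels, n_pixels).
    Returns (order, biases, reordered), where reordered[k] equals
    data[order[k]] - biases[k], so the step is exactly invertible.
    """
    data = np.asarray(data, dtype=np.int64)
    remaining = [c for c in range(data.shape[0]) if c != start]
    order, biases = [start], [0]
    current = data[start].copy()
    reordered = [current]
    while remaining:
        cand = data[remaining]                 # candidate vectors V
        diff = cand - current                  # V - V^i
        b = diff.mean(axis=1)                  # least-squares bias per candidate
        cost = ((diff - b[:, None]) ** 2).sum(axis=1)
        j = int(np.argmin(cost))               # best-matching candidate
        b_int = int(np.rint(b[j]))             # integer bias keeps it lossless
        current = cand[j] - b_int
        order.append(remaining.pop(j))
        biases.append(b_int)
        reordered.append(current)
    return order, biases, np.stack(reordered)


demo = np.array([[10, 20, 30, 40],
                 [40, 10, 35, 5],
                 [17, 27, 37, 47]])
order, biases, reordered = bar_reorder(demo)
print(order, biases)
```

In this toy input the third channel is the first channel plus 7, so it is placed next to the first channel even though the two are not adjacent in the original ordering. The order and the integer biases would have to be stored as side information for the decoder.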

PPVQ (Predictive Partitioned Vector Quantization)
• Linear Prediction: Each spatial frame is estimated from a linear combination of neighboring frames.
• Channel Partitioning
• Vector Quantization
• Entropy Coding

More results from Serra-Sagrista et al. (IGARSS 2005)

Fast Precomputed Vector Quantization (FPVQ) (Huang et al. 2005)
• Linear Prediction: Each channel value is estimated from a linear combination of neighboring spectral channels.
• Bit-depth Partitioning: Channels with the same bit depth are assigned to the same partition.
• Vector Quantization with Precomputed Codebooks: Normalized Gaussian codebooks are used for each partition.
• Optimal Bit Allocation: An algorithm is presented to reduce the expected total number of bits for quantization errors.
• Entropy Coding: Quantization indices and quantization errors are encoded using arithmetic coding.

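Linear Prediction
Each channel is estimated from a linear combination of $n_p$ neighboring channels:
$$\hat{X}_i = \sum_{k=1}^{n_p} c_k X_{i-k} \quad \text{or} \quad \hat{X}_i = X_p C .$$
The prediction coefficients are given by
$$C = (X_p^T X_p)^{\dagger} (X_p^T X_i),$$
where $X_p$ collects the $n_p$ neighboring channels as columns and $\dagger$ denotes the pseudoinverse.
The prediction error of each channel is close to a Gaussian distribution with a different standard deviation.
[Figure: Examples of Gaussian-like distributions of linear prediction errors]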
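As a rough illustration of the linear prediction step, the sketch below fits least-squares coefficients for one channel from its neighbors and returns the integer prediction error. It assumes the neighbors are the n_p preceding channels (so a decoder could form the same prediction) and that predictions are rounded before subtraction; both choices, and the function name, are assumptions made for this example.

```python
import numpy as np


def predict_channel(data, i, n_p):
    """Least-squares prediction of channel i from the n_p preceding channels.

    data: array of shape (n_pixels, n_channels).
    Returns (coeffs, residual), with residual = x_i - round(prediction).
    """
    if i < n_p:
        raise ValueError("channel i needs at least n_p preceding channels")
    X_p = data[:, i - n_p:i].astype(float)    # neighboring channels as columns
    x_i = data[:, i].astype(float)
    # Least-squares solution; equivalent to applying the pseudoinverse of X_p
    coeffs, *_ = np.linalg.lstsq(X_p, x_i, rcond=None)
    residual = x_i - np.rint(X_p @ coeffs)
    return coeffs, residual


rng = np.random.default_rng(0)
ch0 = rng.integers(0, 100, size=50)
ch1 = rng.integers(0, 100, size=50)
toy = np.column_stack([ch0, ch1, 2 * ch0 + 3 * ch1])  # third channel is exactly linear
coeffs, residual = predict_channel(toy, i=2, n_p=2)
print(coeffs)
```

On real sounder data the residual would not vanish as it does for this synthetic channel; it is the residual, not the radiance itself, that is passed on to quantization and entropy coding.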

Vector Quantization with Precomputed Codebooks
• Prediction errors of each channel are close to a Gaussian distribution with a different standard deviation.
• The number of channels in each partition is represented as a linear combination of powers of 2. Each group of 2^k channels within a partition forms a sub-partition.
• Only codebooks with 2^m codewords for 2^k-dimensional normalized Gaussian distributions are precomputed via the LBG algorithm.
• The actual, data-specific Gaussian codebook is the precomputed normalized Gaussian codebook scaled by the standard deviation spectrum.
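The two ideas above, splitting a partition into power-of-two sub-partitions and scaling a normalized codebook by per-channel standard deviations, can be sketched as follows. The tiny two-codeword codebook stands in for an LBG-trained normalized Gaussian codebook, and the rounding of codewords to integers is an assumption made so that the quantization errors stay integer-valued; all names are hypothetical.

```python
import numpy as np


def power_of_two_split(n_channels):
    """Sub-partition sizes from the binary expansion of n_channels, largest first."""
    return [1 << k
            for k in range(n_channels.bit_length() - 1, -1, -1)
            if (n_channels >> k) & 1]


def quantize(errors, sigma, codebook):
    """Nearest-codeword quantization with a scaled normalized codebook.

    errors:   (n_vectors, dim) integer prediction errors
    sigma:    (dim,) standard deviation of each channel
    codebook: (n_codewords, dim) codebook for unit-variance data
    Returns (indices, quantization_errors).
    """
    scaled = codebook * sigma                  # data-specific codebook
    d2 = ((errors[:, None, :] - scaled[None, :, :]) ** 2).sum(axis=2)
    idx = d2.argmin(axis=1)
    q_err = (errors - np.rint(scaled[idx])).astype(np.int64)
    return idx, q_err


print(power_of_two_split(13))                  # 13 = 8 + 4 + 1

toy_codebook = np.array([[-1.0, -1.0], [1.0, 1.0]])   # placeholder, not LBG-trained
sigma = np.array([2.0, 4.0])
errors = np.array([[2, 4], [-2, -3]])
idx, q_err = quantize(errors, sigma, toy_codebook)
print(idx, q_err)
```

Because only the normalized codebooks are stored, the encoder and decoder need just the standard deviation of each channel to rebuild the data-specific codebook, which avoids training a codebook per granule.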

Optimal Bit Allocation (Huang et al. 2005)
• Bit allocation algorithms based on marginal analysis have been proposed in the literature (Riskin 1991, Cuperman 1993).
• These algorithms may not guarantee an optimal solution because they terminate as soon as the constraint of their respective minimization problem is met, and thus have no chance to move further along the hyperplane of the constraint to reach a minimum.
Bit Allocation Minimization Problem for Lossless Compression of Ultraspectral Sounder Data
Minimize […] subject to […], where […] is the expected total bits for the quantization errors and the quantization indices.
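For readers unfamiliar with marginal analysis, a generic greedy allocator is sketched below. It is neither one of the cited algorithms nor the new algorithm of this talk; it only illustrates the basic idea of repeatedly giving one more bit to whichever partition yields the largest marginal decrement, and of stopping once the constraint is met. The cost tables and the fixed total bit budget are assumptions for the example.

```python
def greedy_allocate(cost, budget):
    """Generic marginal-analysis bit allocation (illustrative only).

    cost[i][b]: assumed expected cost for partition i when it gets b bits.
    budget:     assumed total number of bits to distribute.
    Returns the list of bits allocated to each partition.
    """
    alloc = [0] * len(cost)
    for _ in range(budget):
        best, best_gain = None, None
        for i, table in enumerate(cost):
            if alloc[i] + 1 < len(table):
                # Marginal decrement from giving partition i one more bit
                gain = table[alloc[i]] - table[alloc[i] + 1]
                if best_gain is None or gain > best_gain:
                    best, best_gain = i, gain
        if best is None:
            break                              # no partition can take more bits
        alloc[best] += 1
    return alloc                               # stops as soon as the budget is met


toy_cost = [[10, 6, 4, 3],
            [8, 6.5, 6, 5.9]]
alloc = greedy_allocate(toy_cost, budget=3)
print(alloc)
```

The loop ends the moment the budget is exhausted, which is the behavior the slide criticizes: nothing is done afterwards to trade bits between partitions along the constraint.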

New Optimal Bit Allocation Algorithm
• Step 1) Set […]
• Step 2) Compute the marginal decrement […]
• Step 3) Find indices […]
• Step 4) Set […]
• Step 5) Update […]
• Step 6) Repeat Steps 3-5 until […]
• Step 7) Compute the next marginal decrement […]
• Step 8) Find […]
• Step 9) If […], update […] for which […] is minimum, and set […] and go to Step 8; else, STOP.

Example of Optimal Bit Allocation Algorithm

Lossless Compression Ratios for AIRS data

Summary
• In support of the NOAA/NESDIS GOES-R HES data processing studies, we investigated and developed lossless compression of 3D hyperspectral sounder data using wavelet-based schemes (3D SPIHT, JPEG 2000), predictor-based schemes (CALIC, JPEG-LS), and clustering-based schemes (PPVQ, FPVQ).
• The performance rank in terms of compression ratios before our BAR scheme: JPEG-LS > 3D SPIHT > JPEG 2000 > CALIC.
• After our BAR scheme, the compression ratios of JPEG-LS, 3D SPIHT, JPEG 2000 and CALIC are significantly improved, and they all perform almost equally well.
• Our FPVQ and PPVQ schemes provide significantly higher compression ratios than existing state-of-the-art schemes on ultraspectral sounder data.
Acknowledgement: This research is supported by NOAA NESDIS OSD under grant NA07EC0676.