Adaptive Compression to Improve I/O Performance for Climate Simulations


Adaptive Compression to Improve I/O Performance for Climate Simulations
Swati Singhal, Alan Sussman
UMIACS and Department of Computer Science
The 2nd International Workshop on Data Reduction for Big Scientific Data (DRBSD-2)


Scientific Data Compression?
• Data reduction is a growing concern for scientific computing
• Motivating example: a meso-scale climate simulation application from the Department of Atmospheric and Oceanic Sciences at UMD
  • An ensemble simulation with data assimilation on 1.3 million grid points that repeatedly produces nine-hour forecasts
  • Simulation time: ~65 minutes on a cluster for a 9-hour forecast (~60% of the time is spent in I/O and extraneous work)
  • Generates ~283 GB of data per simulation
• Possible solution: data compression
  • Reduces the data volume for I/O and increases the effective I/O bandwidth


Scientific Data Compression
• Scientific data often are multidimensional arrays of floating-point numbers, stored in self-describing data formats (e.g., netCDF, HDF)
• Difficult to compress: high entropy in the lower-order bytes
  • E.g., the neighboring values 0.00589 and 0.00590 share their sign, exponent, and leading mantissa bits, but differ throughout the low-order mantissa bytes, making it hard to achieve a good compression ratio
• What methods are available to compress scientific data? Two categories:
  • Lossy compression: provides high compression, but precision is lost
    • E.g., ZFP (Lindstrom), SZ (Di), ISABELA (Lakshminarasimhan)
  • Lossless compression: retains precision, but is not sufficient on its own to achieve high compression and requires preprocessing techniques
    • E.g., ZLIB, LZO, BZIP2, FPC (Burtscher), ISOBAR (Schendel)
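The low-order-byte entropy claim is easy to check: the short Python sketch below (illustrative only, not part of ACOMPS) prints the IEEE 754 byte patterns of the two neighboring values from the slide and measures their common byte prefix.

```python
import struct

# The two neighboring field values from the slide.
a, b = 0.00589, 0.00590

# Big-endian IEEE 754 doubles: sign/exponent bytes first, mantissa tail last.
ba = struct.pack(">d", a)
bb = struct.pack(">d", b)

print(ba.hex())  # high-order bytes match across the neighbors...
print(bb.hex())  # ...low-order mantissa bytes look essentially random

# Length of the common byte prefix shared by the two encodings.
prefix = 0
for x, y in zip(ba, bb):
    if x != y:
        break
    prefix += 1
print(prefix)
```

Only the leading bytes repeat across neighboring values; a generic byte-oriented compressor sees the remaining mantissa bytes as noise, which is why the preprocessing steps described later try to separate the two groups.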


Scientific Data Compression
• Which compression method should be used for the given data?
  • Typically a one-time offline analysis on a small subset of the data
  • Criteria are based on either compression ratio or compression speed, depending on application needs
  • One compression method is used for all the data variables
• Issues?
  • Manual effort is required to select a compression scheme
  • Limited measurements are used to define the performance criteria
  • Loss of compression benefits: the best compression method differs across variables (and may also change for the same variable over time)
• Can we do better?


Best Compression Method Differs for Different Variables
Results from a single WRF output file
[Two bar charts, log scale: compression ratio and compression speed (MB/s) per variable, for Prep 1, Prep 2, and Prep 3 each combined with LZO and BZIP2]


ACOMPS: Adaptive Compression Scheme
An adaptive compression tool that:
• Supports a set of lossless compression methods combined with different memory preprocessing techniques
• Automatically selects the best compression method for each variable in the dataset
• Allows flexible criteria for selecting the best compression method
• Allows compressing data in smaller units/chunks, enabling selective decompression to increase effective I/O bandwidth
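The chunking idea in the last bullet can be sketched in a few lines of Python; the 4 KB chunk size and the stock zlib codec here are stand-ins, not ACOMPS defaults.

```python
import zlib

CHUNK = 4096  # illustrative chunk size; the actual unit is configurable

def compress_chunked(data, chunk=CHUNK):
    # Compress each chunk independently, so any single chunk can later be
    # decompressed without inflating the rest of the variable.
    return [zlib.compress(data[i:i + chunk]) for i in range(0, len(data), chunk)]

def read_chunk(chunks, k):
    # Selective decompression: inflate only the k-th chunk.
    return zlib.decompress(chunks[k])

data = bytes(range(256)) * 64   # 16 KB of sample data
chunks = compress_chunked(data)
assert read_chunk(chunks, 2) == data[2 * CHUNK:3 * CHUNK]
```

A reader that needs only one region of a variable pays the decompression cost for that chunk alone, which is where the effective I/O bandwidth gain comes from.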


ACOMPS: Adaptive Compression Scheme
Preprocessing techniques supported:
• Byte segregation (B): identify and segregate the compressible bytes (based on skewness) for compression
• Byte-wise segregation (BW): segregate the compressible bytes and group them by their position in the floating-point number (the first compressible byte of all grid cells, then the second compressible byte of all grid cells, and so on)
• Byte-wise segregation and XOR (BWXOR): byte-wise segregation followed by an XOR step
Lossless compression methods supported: LZO, ZLIB, BZIP2
In total, 9 compression techniques: B-LZO, B-ZLIB, B-BZIP2, BW-LZO, BW-ZLIB, BW-BZIP2, BWXOR-LZO, BWXOR-ZLIB, BWXOR-BZIP2
Other preprocessing and compression techniques (both lossy and lossless) can be added
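A minimal sketch of the BW and BWXOR ideas, under two simplifying assumptions: segregation operates on whole 8-byte doubles (the slide's skewness-based selection of compressible bytes is omitted), and the XOR step differences consecutive values within each byte stream.

```python
import struct

def bytewise_segregate(values):
    # BW: stream k collects the k-th byte of every value, so bytes that vary
    # little across neighboring grid cells end up adjacent to each other.
    raw = [struct.pack(">d", v) for v in values]
    return [bytes(rec[k] for rec in raw) for k in range(8)]

def xor_delta(stream):
    # The XOR step: combine each byte with its predecessor; near-constant
    # streams collapse into long runs of zeros, which compress very well.
    out = bytearray(len(stream))
    prev = 0
    for i, byte in enumerate(stream):
        out[i] = byte ^ prev
        prev = byte
    return bytes(out)

values = [0.00589, 0.00590, 0.00591, 0.00592]
streams = [xor_delta(s) for s in bytewise_segregate(values)]
# The sign/exponent stream is identical across these values, so after the
# XOR step it is zero everywhere except its first byte.
```

The reordered, XOR-ed streams are then handed to one of the lossless back ends (LZO, ZLIB, or BZIP2), giving the nine combined techniques listed above.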


ACOMPS: Adaptive Compression Scheme
Criteria to evaluate the performance of any compression technique X:
performance_X = compression_speed_X * W_CS + compression_ratio_X * W_CR
User-tunable parameters:
• W_CR: compression ratio weight for deciding the best compression method
• W_CS: compression speed weight for deciding the best compression method
• Δ: small delta limit that defines the acceptance range
Variable A to be compressed at time step 0:
1. Determine the best technique T_X; record Best_A = T_X and BestPerf_A = performance_{T_X}
2. Compress the data using technique Best_A and record latestPerf_A = performance_{Best_A}


ACOMPS: Adaptive Compression Scheme
Criteria to evaluate the performance of any compression technique X:
performance_X = compression_speed_X * W_CS + compression_ratio_X * W_CR
User-tunable parameters: W_CR (compression ratio weight), W_CS (compression speed weight), Δ (small delta limit defining the acceptance range)
Variable A to be compressed at time step t:
• Test: BestPerf_A - Δ < latestPerf_A < BestPerf_A + Δ?
  • Yes: the compression performance has not changed beyond the limit Δ; continue to use the current Best_A
  • No: determine the best technique T_X again; record Best_A = T_X and BestPerf_A = performance_{T_X}
• Compress the data using technique Best_A and record latestPerf_A = performance_{Best_A}
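The per-variable selection loop on this slide can be sketched as follows. The codec set, weights, and delta are illustrative: the real ACOMPS scores its preprocessed lossless pipelines, not these stock Python codecs.

```python
import bz2
import lzma
import time
import zlib

# Illustrative weights and acceptance band (the slide's W_CR, W_CS, and delta).
W_CR, W_CS, DELTA = 1.0, 0.0, 0.05

CODECS = {"zlib": zlib.compress, "bzip2": bz2.compress, "lzma": lzma.compress}

def performance(compress, data):
    # performance_X = compression_speed_X * W_CS + compression_ratio_X * W_CR
    t0 = time.perf_counter()
    out = compress(data)
    speed = len(data) / max(time.perf_counter() - t0, 1e-9)  # bytes/second
    ratio = len(data) / len(out)
    return speed * W_CS + ratio * W_CR

state = {}  # per-variable record: (best_method_name, best_perf)

def compress_adaptive(var, data):
    if var in state:
        name, best_perf = state[var]
        latest = performance(CODECS[name], data)
        # Still inside the +/- delta band: keep using the current best method.
        if best_perf - DELTA < latest < best_perf + DELTA:
            return name
    # Time step 0, or performance drifted: re-evaluate every method.
    name, perf = max(((n, performance(c, data)) for n, c in CODECS.items()),
                     key=lambda item: item[1])
    state[var] = (name, perf)
    return name
```

With W_CR = 1 and W_CS = 0 the score reduces to the compression ratio, matching the "Only CR" configuration in the results slides; swapping the weights gives "Only CS".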


WRF-LETKF Based Climate Simulations
• Initial conditions feed n parallel WRF ensembles, which write netCDF output
• A conversion step turns the netCDF output into the binary format read by m parallel LETKF processes, together with the observations
• LETKF writes a binary revised-state output based on the observed data, which is merged with the initial conditions to guide the next simulation cycle


WRF-LETKF Based Climate Simulations
(Pipeline: n parallel WRF ensembles write netCDF, which is converted to binary for m parallel LETKF processes; the revised state is merged with the initial conditions to guide the next cycle.)
Example: n = 55, m = 400; max grid size: 181 x 151 x 51
• Total simulation time (single WRF cycle + single LETKF cycle, 9 hours of simulated time): ~65 minutes on a cluster
• High conversion cost: converting the (9 x 55) files takes 36.7 minutes, ~56% of the total simulation time
• Large output data size: ~283 GB


WRF-LETKF Based Climate Simulations
• WRF writes through a WRF ADIOS I/O plugin and LETKF reads through a LETKF ADIOS I/O plugin, so the data can use any format supported by ADIOS and no conversion step is required
• ACOMPS is integrated as an ADIOS data-transformation plugin
• LETKF still writes the binary revised-state output based on the observed data, merged with the initial conditions to guide the next cycle


Experimental Setup and Results
• Deepthought2 campus cluster at UMD
  • Number of nodes: 484 with 20 cores/node, plus 4 nodes with 40 cores/node
  • Memory/node: ~128 GB (DDR3 at 1866 MHz)
  • Processor: dual Intel Ivy Bridge E5-2680 v2 at 2.8 GHz
  • Parallel file system: Lustre
• Climate simulations with WRF-LETKF
  • Domain size: 181 x 151 grid cells, 51 vertical levels
  • The majority of variables are float type
    • 3D variables: XLAT, XLONG, F, T, ...
    • 4D variables: U, V, W, P, PB, RAINC, ...
  • WRF ensemble: n = 55, each member uses 1 node; number of MPI processes = 55 x 20 = 1100
  • LETKF: uses 20 nodes; number of MPI processes = 20 x 20 = 400


Adaptive vs. Non-Adaptive Methods: Output Sizes
Output sizes (in gigabytes):
• netCDF + binary: 283
• ADIOS: 191
• ADIOS + ZLIB: 73
• ADIOS + BZIP2: 72.6
• ADIOS + LZO: 80.03
• ADIOS + ACOMPS (Only CS): 67.5
• ADIOS + ACOMPS (Only CR): 62.8
Observations:
• 77% improvement in size over the original
• ACOMPS achieves better compression: 13% better than ADIOS + BZIP2
Weight settings: Only CR (best compression ratio, slower): W_CR = 1, W_CS = 0; Only CS (best speed, not as good compression): W_CR = 0, W_CS = 1


Adaptive vs. Non-Adaptive Methods: Compression Time
[Chart: end-to-end times in minutes, broken into WRF, WRF preprocessing, conversion, and LETKF components, for netCDF + binary, ADIOS + ZLIB, ADIOS + BZIP2, ADIOS + LZO, ADIOS + ACOMPS (Only CS), and ADIOS + ACOMPS (Only CR)]
Observations:
• ACOMPS incurs low overhead: close to the fastest method (ADIOS + LZO)
• Lower overhead than ADIOS + BZIP2, with much better compression
Weight settings: Only CR (best compression ratio, slower): W_CR = 1, W_CS = 0; Only CS (best speed, not as good compression): W_CR = 0, W_CS = 1


Future Directions
• Extend to support more compression methods, both lossless and lossy
• Thoroughly analyze how the best compression method for a given variable changes over time
  • How often is it advantageous to redo the analysis?
  • How can the criteria be enhanced to decide when to re-evaluate, in order to adapt to changes quickly?
• Parallelize the analysis phase using threads


Adaptive Compression to Improve I/O Performance for Climate Simulations
Swati Singhal (swati@cs.umd.edu), Alan Sussman (als@cs.umd.edu)
UMIACS and Department of Computer Science
The 2nd International Workshop on Data Reduction for Big Scientific Data


Related Work
Lossless compression:
• E. R. Schendel, Y. Jin, N. Shah, J. Chen, C. Chang, S.-H. Ku, S. Ethier, S. Klasky, R. Latham, R. Ross, and N. F. Samatova, "ISOBAR preconditioner for effective and high-throughput lossless data compression," ICDE 2012.
• M. Burtscher and P. Ratanaworabhan, "FPC: A high-speed compressor for double-precision floating-point data," IEEE Transactions on Computers, 2009.
• M. Burtscher and P. Ratanaworabhan, "gFPC: A self-tuning compression algorithm," Data Compression Conference (DCC), 2010.
• I. H. Witten, R. M. Neal, and J. G. Cleary, "Arithmetic coding for data compression," Communications of the ACM, 1987.
• S. Bhattacherjee, A. Deshpande, and A. Sussman, "PStore: An efficient storage framework for managing scientific data," SSDBM 2014.
• S. W. Son, Z. Chen, W. Hendrix, A. Agrawal, W.-k. Liao, and A. Choudhary, "Data compression for the exascale computing era: Survey," Supercomputing Frontiers and Innovations, 2014.
Lossy compression:
• S. Di and F. Cappello, "Fast error-bounded lossy HPC data compression with SZ," IPDPS 2016.
• W. Austin, G. Ballard, and T. G. Kolda, "Parallel tensor compression for large-scale scientific data," IPDPS 2016.
• M. Burtscher, H. Mukka, A. Yang, and F. Hesaaraki, "Real-time synthesis of compression algorithms for scientific data," SC16, 2016.