The HDF Group Parallel HDF 5 Design and
- Slides: 58
The HDF Group Parallel HDF 5 Design and Programming Model May 30 -31, 2012 HDF 5 Workshop at PSI 1 www. hdfgroup. org
Advantage of parallel HDF 5 • Recent success story • • Trillion particle simulation 120, 000 cores 30 TB file 23 GB/sec average speed with 35 GB/sec peaks (out of 40 GB/sec max for system) • Parallel HDF 5 rocks! (when used properly ) May 30 -31, 2012 HDF 5 Workshop at PSI 2 www. hdfgroup. org
Outline • • • Overview of Parallel HDF 5 design Parallel Environment Requirements Performance Analysis Parallel Tools PHDF 5 Programming Model Examples May 30 -31, 2012 HDF 5 Workshop at PSI 3 www. hdfgroup. org
OVERVIEW OF PARALLEL HDF 5 DESIGN May 30 -31, 2012 HDF 5 Workshop at PSI 4 www. hdfgroup. org
PHDF 5 requirements • PHDF 5 should allow multiple processes to perform I/O to an HDF 5 file at the same time • Single file image to all processes • Compare with one file per process design: • Expensive post processing • Not usable by different number of processes • Too many files produced for file system • PHDF 5 should use a standard parallel I/O interface • Must be portable to different platforms May 30 -31, 2012 HDF 5 Workshop at PSI 5 www. hdfgroup. org
PHDF 5 requirements • Support Message Passing Interface (MPI) programming • PHDF 5 files compatible with serial HDF 5 files • Shareable between different serial or parallel platforms May 30 -31, 2012 HDF 5 Workshop at PSI 6 www. hdfgroup. org
Parallel environment requirements • MPI with POSIX I/O (HDF 5 MPI POSIX driver) • POSIX compliant file system • MPI with MPI-IO (HDF 5 MPI I/O driver) • MPICH 2 ROMIO • Vendor’s MPI-IO • Parallel file system • GPFS (General Parallel File System) • Lustre May 30 -31, 2012 HDF 5 Workshop at PSI 7 www. hdfgroup. org
PHDF 5 implementation layers HDF 5 Application Compute node HDF 5 I/O Library MPI I/O Library HDF 5 file on Parallel File System Switch network/I/O servers Disk architecture and layout of data on disk May 30 -31, 2012 HDF 5 Workshop at PSI 8 www. hdfgroup. org
PHDF 5 CONSISTENCY SEMANTICS May 30 -31, 2012 HDF 5 Workshop at PSI 9 www. hdfgroup. org
Consistency semantics • Consistency semantics are rules that define the outcome of multiple, possibly concurrent, accesses to an object or data structure by one or more processes in a computer system. May 30 -31, 2012 HDF 5 Workshop at PSI 10 www. hdfgroup. org
PHDF 5 consistency semantics • PHDF 5 library defines a set of consistency semantics to let users know what to expect when processes access data managed by the library. • When the changes a process makes are actually visible to itself (if it tries to read back that data) or to other processes that access the same file with independent or collective I/O operations • Consistency semantics varies depending on a driver used • MPI-POSIX • MPI I/O May 30 -31, 2012 HDF 5 Workshop at PSI 11 www. hdfgroup. org
HDF 5 MPI-POSIX consistency semantics • Same as POSIX semantics Process 0 write() MPI_Barrier() Process 1 MPI_Barrier() read() • POSIX I/O guarantees that Process 1 will read what Process 0 has written: the atomicity of the read and write calls and the synchronization using the barrier ensures that Process 1 will call the read function after Process 0 is finished with the write function. May 30 -31, 2012 HDF 5 Workshop at PSI 12 www. hdfgroup. org
HDF 5 MPI-I/O consistency semantics • Same as MPI-I/O semantics Process 0 MPI_File_write_at() MPI_Barrier() Process 1 MPI_Barrier() MPI_File_read_at() • Default MPI-I/O semantics doesn’t guarantee atomicity or sequence of calls! • Problems may occur (although we haven’t seen any) when writing/reading HDF 5 metadata or raw data May 30 -31, 2012 HDF 5 Workshop at PSI 13 www. hdfgroup. org
HDF 5 MPI-I/O consistency semantics • MPI I/O provides atomicity and sync-barriersync features to address the issue • PHDF 5 follows MPI I/O • H 5 Fset_mpio_atomicity function to turn on MPI atomicity • H 5 Fsync function to transfer written data to storage device (in implementation now) • We are currently working on reimplementation of metadata cache for PHDF 5 (metadata server) May 30 -31, 2012 HDF 5 Workshop at PSI 14 www. hdfgroup. org
HDF 5 MPI-I/O consistency semantics • For more information see “Enabling a strict consistency semantics model in parallel HDF 5” linked from H 5 Fset_mpi_atomicity RM page May 30 -31, 2012 HDF 5 Workshop at PSI 15 www. hdfgroup. org
MPI-I/O VS. PHDF 5 May 30 -31, 2012 HDF 5 Workshop at PSI 16 www. hdfgroup. org
MPI-IO vs. HDF 5 • MPI-IO is an Input/Output API • It treats the data file as a “linear byte stream” and each MPI application needs to provide its own file view and data representations to interpret those bytes May 30 -31, 2012 HDF 5 Workshop at PSI 17 www. hdfgroup. org
MPI-IO vs. HDF 5 • All data stored are machine dependent except the “external 32” representation • External 32 is defined in Big Endianness • Little-endian machines have to do the data conversion in both read or write operations • 64 -bit sized data types may lose information May 30 -31, 2012 HDF 5 Workshop at PSI 18 www. hdfgroup. org
MPI-IO vs. HDF 5 • HDF 5 is data management software • It stores data and metadata according to the HDF 5 data format definition • HDF 5 file is self-describing • Each machine can store the data in its own native representation for efficient I/O without loss of data precision • Any necessary data representation conversion is done by the HDF 5 library automatically May 30 -31, 2012 HDF 5 Workshop at PSI 19 www. hdfgroup. org
PERFORMANCE ANALYSIS May 30 -31, 2012 HDF 5 Workshop at PSI 20 www. hdfgroup. org
Performance analysis • Some common causes of poor performance • Possible solutions May 30 -31, 2012 HDF 5 Workshop at PSI 21 www. hdfgroup. org
My PHDF 5 application I/O is slow • • Use larger I/O data sizes Independent vs. Collective I/O Specific I/O system hints Increase Parallel File System capacity May 30 -31, 2012 HDF 5 Workshop at PSI 22 www. hdfgroup. org
Write speed vs. block size May 30 -31, 2012 HDF 5 Workshop at PSI 23 www. hdfgroup. org
My PHDF 5 application I/O is slow • • Use larger I/O data sizes Independent vs. Collective I/O Specific I/O system hints Increase Parallel File System capacity May 30 -31, 2012 HDF 5 Workshop at PSI 24 www. hdfgroup. org
Independent vs. collective access • User reported Independent data transfer mode was much slower than the Collective data transfer mode • Data array was tall and thin: 230, 000 rows by 6 columns May 30 -31, 2012 HDF 5 Workshop at PSI : : : 230, 000 rows : : : 25 www. hdfgroup. org
Collective vs. independent calls • MPI definition of collective calls • All processes of the communicator must participate in calls in the right order. E. g. , • Process 1 • call A(); call B(); Process 2 call A(); call B(); **right** call B(); call A(); **wrong** • Independent means not collective • Collective is not necessarily synchronous May 30 -31, 2012 HDF 5 Workshop at PSI 26 www. hdfgroup. org
Debug Slow Parallel I/O Speed(1) • Writing to one dataset - Using 4 processes == 4 columns - data type is 8 -byte doubles - 4 processes, 1000 rows == 4 x 1000 x 8 = 32, 000 bytes • % mpirun -np 4. /a. out i t 1000 - Execution time: 1. 783798 s. • % mpirun -np 4. /a. out i t 2000 - Execution time: 3. 838858 s. • Difference of 2 seconds for 1000 more rows = 32, 000 bytes. • Speed of 16 KB/sec!!! Way too slow. May 30 -31, 2012 HDF 5 Workshop at PSI 27 www. hdfgroup. org
Debug slow parallel I/O speed(2) • Build a version of PHDF 5 with • . /configure --enable-debug --enable-parallel … • This allows the tracing of MPIO I/O calls in the HDF 5 library. • E. g. , to trace • MPI_File_read_xx and MPI_File_write_xx calls • % setenv H 5 FD_mpio_Debug “rw” May 30 -31, 2012 HDF 5 Workshop at PSI 28 www. hdfgroup. org
Debug slow parallel I/O speed(3) % setenv H 5 FD_mpio_Debug ’rw’ % mpirun -np 4. /a. out i t 1000 # Indep. ; contiguous. in H 5 FD_mpio_write mpi_off=0 size_i=96 in H 5 FD_mpio_write mpi_off=2056 size_i=8 in H 5 FD_mpio_write mpi_off=2048 size_i=8 in H 5 FD_mpio_write mpi_off=2072 size_i=8 in H 5 FD_mpio_write mpi_off=2064 size_i=8 in H 5 FD_mpio_write mpi_off=2088 size_i=8 in H 5 FD_mpio_write mpi_off=2080 size_i=8 … • Total of 4000 of these little 8 bytes writes == 32, 000 bytes. May 30 -31, 2012 HDF 5 Workshop at PSI 29 www. hdfgroup. org
Independent calls are many and small • Each process writes one element of one row, skips to next row, write one element, so on. • Each process issues 230, 000 writes of 8 bytes each. May 30 -31, 2012 HDF 5 Workshop at PSI : : : 230, 000 rows : : : 30 www. hdfgroup. org
Debug slow parallel I/O speed (4) % setenv H 5 FD_mpio_Debug ’rw’ % mpirun -np 4. /a. out i h 1000# Indep. , Chunked by column. in H 5 FD_mpio_write mpi_off=0 size_i=96 in H 5 FD_mpio_write mpi_off=3688 size_i=8000 in H 5 FD_mpio_write mpi_off=11688 size_i=8000 in H 5 FD_mpio_write mpi_off=27688 size_i=8000 in H 5 FD_mpio_write mpi_off=19688 size_i=8000 in H 5 FD_mpio_write mpi_off=96 size_i=40 in H 5 FD_mpio_write mpi_off=136 size_i=544 in H 5 FD_mpio_write mpi_off=680 size_i=120 in H 5 FD_mpio_write mpi_off=800 size_i=272 … Execution time: 0. 011599 s. May 30 -31, 2012 HDF 5 Workshop at PSI 31 www. hdfgroup. org
Use collective mode or chunked storage • Collective I/O will combine many small independent calls into few but bigger calls • Chunks of columns speeds up too May 30 -31, 2012 HDF 5 Workshop at PSI : : : 230, 000 rows : : : 32 www. hdfgroup. org
Collective vs. independent write 1000 900 Seconds to write 800 700 600 500 Independent write 400 Collective write 300 200 100 0 0. 25 0. 5 1 1. 88 2. 29 2. 75 Data size in MBs May 30 -31, 2012 HDF 5 Workshop at PSI 33 www. hdfgroup. org
My PHDF 5 application I/O is slow • • Use larger I/O data sizes Independent vs. Collective I/O Specific I/O system hints Increase Parallel File System capacity May 30 -31, 2012 HDF 5 Workshop at PSI 34 www. hdfgroup. org
Effects of I/O hints: IBM_largeblock_io • GPFS at LLNL ASCI Blue machine MB/sec • 4 nodes, 16 tasks • Total data size 1024 MB • I/O buffer size 1 MB May 30 -31, 2012 HDF 5 Workshop at PSI 35 www. hdfgroup. org
My PHDF 5 application I/O is slow • If my application I/O performance is slow, what can I do? • • Use larger I/O data sizes Independent vs. Collective I/O Specific I/O system hints Increase Parallel File System capacity • Add more disks, interconnect or I/O servers to your hardware setup • Expensive! May 30 -31, 2012 HDF 5 Workshop at PSI 36 www. hdfgroup. org
PARALLEL TOOLS May 30 -31, 2012 HDF 5 Workshop at PSI 37 www. hdfgroup. org
Parallel Tools • h 5 perf • Performance measuring tool showing I/O performance for different I/O APIs May 30 -31, 2012 HDF 5 Workshop at PSI 38 www. hdfgroup. org
h 5 perf • An I/O performance measurement tool • Tests 3 File I/O APIs: • POSIX I/O (open/write/read/close…) • MPI-I/O (MPI_File_{open, write, read, close}) • PHDF 5 • H 5 Pset_fapl_mpio (using MPI-I/O) • H 5 Pset_fapl_mpiposix (using POSIX I/O) • An indication of I/O speed upper limits May 30 -31, 2012 HDF 5 Workshop at PSI 39 www. hdfgroup. org
Useful Parallel HDF Links • Parallel HDF information site http: //www. hdfgroup. org/HDF 5/PHDF 5/ • Parallel HDF 5 tutorial available at http: //www. hdfgroup. org/HDF 5/Tutor/ • HDF Help email address help@hdfgroup. org May 30 -31, 2012 HDF 5 Workshop at PSI 40 www. hdfgroup. org
HDF 5 PROGRAMMING MODEL May 30 -31, 2012 HDF 5 Workshop at PSI 41 www. hdfgroup. org
How to compile PHDF 5 applications • h 5 pcc – HDF 5 C compiler command • Similar to mpicc • h 5 pfc – HDF 5 F 90 compiler command • Similar to mpif 90 • To compile: • % h 5 pcc h 5 prog. c • % h 5 pfc h 5 prog. f 90 May 30 -31, 2012 HDF 5 Workshop at PSI 42 www. hdfgroup. org
Programming restrictions • PHDF 5 opens a parallel file with a communicator • Returns a file handle • Future access to the file via the file handle • All processes must participate in collective PHDF 5 APIs • Different files can be opened via different communicators May 30 -31, 2012 HDF 5 Workshop at PSI 43 www. hdfgroup. org
Collective HDF 5 calls • All HDF 5 APIs that modify structural metadata are collective • File operations - H 5 Fcreate, H 5 Fopen, H 5 Fclose • Object creation - H 5 Dcreate, H 5 Dclose • Object structure modification (e. g. , dataset extent modification) - H 5 Dextend • http: //www. hdfgroup. org/HDF 5/doc/RM/Coll ective. Calls. html May 30 -31, 2012 HDF 5 Workshop at PSI 44 www. hdfgroup. org
Other HDF 5 calls • Array data transfer can be collective or independent - Dataset operations: H 5 Dwrite, H 5 Dread • Collectiveness is indicated by function parameters, not by function names as in MPI API May 30 -31, 2012 HDF 5 Workshop at PSI 45 www. hdfgroup. org
What does PHDF 5 support ? • After a file is opened by the processes of a communicator • All parts of file are accessible by all processes • All objects in the file are accessible by all processes • Multiple processes may write to the same data array • Each process may write to individual data array May 30 -31, 2012 HDF 5 Workshop at PSI 46 www. hdfgroup. org
PHDF 5 API languages • C and F 90, 2003 language interfaces • Platforms supported: • Most platforms with MPI-IO supported. E. g. , • IBM AIX • Linux clusters • Cray XT May 30 -31, 2012 HDF 5 Workshop at PSI 47 www. hdfgroup. org
Programming model • HDF 5 uses access template object (property list) to control the file access mechanism • General model to access HDF 5 file in parallel: - Set up MPI-IO access template (file access property list) - Open File - Access Data - Close File May 30 -31, 2012 HDF 5 Workshop at PSI 48 www. hdfgroup. org
Moving your sequential application to the HDF 5 parallel world MY FIRST PARALLEL HDF 5 PROGRAM May 30 -31, 2012 HDF 5 Workshop at PSI 49 www. hdfgroup. org
Example of PHDF 5 C program Parallel HDF 5 program has extra calls MPI_Init(&argc, &argv); 1. 2. 3. 4. 5. fapl_id = H 5 Pcreate(H 5 P_FILE_ACCESS); H 5 Pset_fapl_mpio(fapl_id, comm, info); file_id = H 5 Fcreate(FNAME, …, fapl_id); space_id = H 5 Screate_simple(…); dset_id = H 5 Dcreate(file_id, DNAME, H 5 T_NATIVE_INT, space_id, …); 6. xf_id = H 5 Pcreate(H 5 P_DATASET_XFER); 7. H 5 Pset_dxpl_mpio(xf_id, H 5 FD_MPIO_COLLECTIVE); 8. status = H 5 Dwrite(dset_id, H 5 T_NATIVE_INT, …, xf_id…); MPI_Finalize(); May 30 -31, 2012 HDF 5 Workshop at PSI 50 www. hdfgroup. org
Writing patterns EXAMPLE May 30 -31, 2012 HDF 5 Workshop at PSI 51 www. hdfgroup. org
Parallel HDF 5 tutorial examples • For simple examples how to write different data patterns see http: //www. hdfgroup. org/HDF 5/Tutor/parallel. html May 30 -31, 2012 HDF 5 Workshop at PSI 52 www. hdfgroup. org
Programming model • Each process defines the memory and file hyperslabs using H 5 Sselect_hyperslab • Each process executes a write/read call using hyperslabs defined, which is either collective or independent • The hyperslab start, count, stride, and block parameters define the portion of the dataset to write to - Contiguous hyperslab Regularly spaced data (column or row) Pattern Chunks May 30 -31, 2012 HDF 5 Workshop at PSI 53 www. hdfgroup. org
Four processes writing by rows HDF 5 "SDS_row. h 5" { GROUP "/" { DATASET "Int. Array" { DATATYPE H 5 T_STD_I 32 BE DATASPACE SIMPLE { ( 8, 5 ) / ( 8, 5 ) } DATA { 10, 10, 10, 11, 11, 11, 12, 12, 12, 13, 13, 13, 13 May 30 -31, 2012 HDF 5 Workshop at PSI 54 www. hdfgroup. org
Two processes writing by columns HDF 5 "SDS_col. h 5" { GROUP "/" { DATASET "Int. Array" { DATATYPE H 5 T_STD_I 32 BE DATASPACE SIMPLE { ( 8, 6 ) / ( 8, 6 ) } DATA { 1, 2, 10, 20, 100, 200, 1, 2, 10, 20, 100, 200, 1, 2, 10, 20, 100, 200 May 30 -31, 2012 HDF 5 Workshop at PSI 55 www. hdfgroup. org
Four processes writing by pattern HDF 5 "SDS_pat. h 5" { GROUP "/" { DATASET "Int. Array" { DATATYPE H 5 T_STD_I 32 BE DATASPACE SIMPLE { ( 8, 4 ) / ( 8, 4 ) } DATA { 1, 3, 2, 4, 1, 3, 2, 4, 2, 4 May 30 -31, 2012 HDF 5 Workshop at PSI 56 www. hdfgroup. org
Four processes writing by chunks HDF 5 "SDS_chnk. h 5" { GROUP "/" { DATASET "Int. Array" { DATATYPE H 5 T_STD_I 32 BE DATASPACE SIMPLE { ( 8, 4 ) / ( 8, 4 ) } DATA { 1, 1, 2, 2, 3, 3, 4, 4, 3, 3, 4, 4 May 30 -31, 2012 HDF 5 Workshop at PSI 57 www. hdfgroup. org
The HDF Group Thank You! Questions? May 30 -31, 2012 HDF 5 Workshop at PSI 58 www. hdfgroup. org
- Eandomi
- Hdf dataset
- Hdf explorer
- Hdf explorer
- Hdf server
- Hdf filter
- Snuipp hdf
- Hdf cloud
- Hdf cloud
- Hdf tools
- Hdf matlab
- Ter hdf
- Hdf clogin
- Example of like forces
- Parallelism rules
- Parallel structure.
- Parallel and non parallel structure
- Parallel structure examples
- The outer terminus in the pattern area.
- Shift register truth table
- Hát kết hợp bộ gõ cơ thể
- Bổ thể
- Tỉ lệ cơ thể trẻ em
- Gấu đi như thế nào
- Tư thế worms-breton
- Chúa yêu trần thế alleluia
- Môn thể thao bắt đầu bằng chữ f
- Thế nào là hệ số cao nhất
- Các châu lục và đại dương trên thế giới
- Công thức tiính động năng
- Trời xanh đây là của chúng ta thể thơ
- Mật thư anh em như thể tay chân
- Phép trừ bù
- độ dài liên kết
- Các châu lục và đại dương trên thế giới
- Thể thơ truyền thống
- Quá trình desamine hóa có thể tạo ra
- Một số thể thơ truyền thống
- Cái miệng nó xinh thế
- Vẽ hình chiếu vuông góc của vật thể sau
- Biện pháp chống mỏi cơ
- đặc điểm cơ thể của người tối cổ
- Thế nào là giọng cùng tên?
- Vẽ hình chiếu đứng bằng cạnh của vật thể
- Fecboak
- Thẻ vin
- đại từ thay thế
- điện thế nghỉ
- Tư thế ngồi viết
- Diễn thế sinh thái là
- Các loại đột biến cấu trúc nhiễm sắc thể
- Số nguyên tố là số gì
- Tư thế ngồi viết
- Lời thề hippocrates
- Thiếu nhi thế giới liên hoan
- ưu thế lai là gì
- Hổ sinh sản vào mùa nào
- Sự nuôi và dạy con của hổ