stdchk: A Checkpoint Storage System for Desktop Grid Computing
Samer Al-Kiswany (UBC), Matei Ripeanu (UBC), Sudharshan S. Vazhkudai (ORNL), Abdullah Gharaibeh (UBC)
The University of British Columbia, Oak Ridge National Laboratory

Checkpointing Introduction
Checkpointing uses: fault tolerance, debugging, or migration.
Typically, an application running for days on hundreds of nodes (e.g., a desktop grid) saves checkpoint images periodically.
[Figure: application timeline with checkpoints marked C]
ICDCS '08

Deployment Scenario

The Challenge
Although checkpointing is necessary:
• It is pure overhead from a performance point of view: most of the checkpointing time is spent writing to the storage system.
• It generates a high load on the storage system.
Requirement: a high-performance, scalable, and reliable storage system optimized for checkpointing applications.
Challenge: low-cost, transparent support for checkpointing at the file-system level.

Checkpointing Workload Characteristics
• Write-intensive and bursty: e.g., a job running on hundreds of nodes periodically checkpoints 100s of GB of data.
• Write once, rarely read during application execution.
• Potentially high similarity between consecutive checkpoints.
• Application-specific checkpoint image life span: when is it safe to delete an image?

Why a Checkpointing-Optimized Storage System?
• Optimizing for the checkpointing workload can bring valuable benefits:
  - High throughput through specialization.
  - Considerable savings in storage space and network effort through transparent support for incremental checkpointing.
  - Simplified data management by exploiting the particularities of checkpoint usage scenarios.
• It reduces the load on a shared file system.
• It can be built atop scavenged resources, at low cost.

stdchk
A checkpointing-optimized storage system built using scavenged resources.

Outline
• stdchk architecture
• stdchk features
• stdchk system evaluation

stdchk Architecture
• Manager (metadata management)
• Benefactors (storage nodes)
• Client (FS interface)

stdchk Features
• High throughput for write operations
• Support for transparent incremental checkpointing
• Simplified data management
• High reliability through replication
• POSIX file system API: as a result, using stdchk does not require modifications to the application.

Optimized Write Operation Alternatives
Write procedure alternatives:
• Complete local write
• Incremental write
• Sliding window write
[Figure: compute node showing the application, the stdchk FS interface, memory, and disk]
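The slides only name the three write alternatives. As a rough illustration of the third one, the sketch below assumes that "sliding window write" means keeping a bounded window of chunks in memory and pushing them to storage nodes while the application is still writing. The class name `SlidingWindowWriter`, the round-robin placement, and the Python lists standing in for benefactor nodes are all hypothetical and not taken from stdchk.

```python
import queue
import threading


class SlidingWindowWriter:
    """Hypothetical sketch: buffer a bounded window of chunks in memory
    and drain them to storage nodes in the background."""

    def __init__(self, benefactors, chunk_size=4, window_chunks=2):
        self.benefactors = benefactors          # lists standing in for remote nodes
        self.chunk_size = chunk_size
        # The bounded queue is the "window": write() blocks when it is full.
        self.window = queue.Queue(maxsize=window_chunks)
        self.sender = threading.Thread(target=self._drain)
        self.sender.start()

    def write(self, data: bytes) -> None:
        for i in range(0, len(data), self.chunk_size):
            self.window.put(data[i:i + self.chunk_size])

    def close(self) -> None:
        self.window.put(None)                   # sentinel: no more chunks
        self.sender.join()

    def _drain(self) -> None:
        index = 0
        while True:
            chunk = self.window.get()
            if chunk is None:
                break
            # Assumed placement policy: round-robin striping across nodes.
            node = self.benefactors[index % len(self.benefactors)]
            node.append((index, chunk))
            index += 1


nodes = [[], [], []]
writer = SlidingWindowWriter(nodes)
writer.write(b"abcdefghijklmnop")
writer.close()

# Reassemble the file from the per-node chunk lists, ordered by chunk index.
restored = b"".join(chunk for _, chunk in sorted(c for node in nodes for c in node))
```

The bounded queue is the design point of interest here: the application never waits for the local disk, only for free space in the in-memory window.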

Write Operation Evaluation
Testbed: 28 machines.
Each machine has two 3.0 GHz Xeon processors, 1 GB RAM, and two 36.5 GB SCSI disks.

Achieved Storage Bandwidth
• Sliding window write achieves high bandwidth (110 MBps).
• It saturates the 1 Gbps link.
[Figure: the average achieved storage bandwidth (ASB) over a 1 Gbps testbed]
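The saturation claim can be checked with a line of arithmetic. The snippet below assumes that MBps means 10^6 bytes per second and ignores protocol overhead; the variable names are illustrative.

```python
link_gbps = 1.0
# 1 Gbps = 1000 Mbit/s, and 8 bits make one byte.
link_mbytes_per_s = link_gbps * 1000 / 8

achieved_mbytes_per_s = 110.0          # sliding window write, from the slide
utilization = achieved_mbytes_per_s / link_mbytes_per_s

print(f"raw link capacity: {link_mbytes_per_s:.0f} MB/s")
print(f"utilization: {utilization:.0%}")
```

Under these assumptions the raw link capacity is 125 MB/s, so 110 MBps is close to what the link can carry once framing overhead is taken into account.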

stdchk Features
• High-throughput write operation
• Transparent incremental checkpointing
• Checkpointing-optimized data management
• POSIX file system interface: no modification to the application is required

Transparent Incremental Checkpointing
Incremental checkpointing may bring valuable benefits:
• Lower network effort.
• Less storage space used.
But:
• How much similarity is there between consecutive checkpoints?
• How can we detect similarities between checkpoints?
• Is this fast enough?

Similarity Detection Mechanism: Compare-by-Hash
[Figure: checkpoint T0 is divided into blocks X, Y, and Z, and each block is hashed. Checkpoint T1 is hashed the same way; of its blocks, only the new block W will be stored.]
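A minimal sketch of compare-by-hash, mirroring the T0/T1 figure: each block is identified by its hash, and a block is stored only if its hash has not been seen before. The `BlockStore` class, the 4-byte block size, and the use of SHA-1 are assumptions made for illustration; the slides do not say which hash function or data structures stdchk uses.

```python
import hashlib


class BlockStore:
    """Hypothetical content-addressed store: one copy per distinct block."""

    def __init__(self):
        self.blocks = {}                      # digest -> block bytes

    def put_checkpoint(self, data: bytes, block_size: int = 4):
        """Store a checkpoint; return its block recipe and the number of new blocks."""
        recipe, new_blocks = [], 0
        for i in range(0, len(data), block_size):
            block = data[i:i + block_size]
            digest = hashlib.sha1(block).hexdigest()
            if digest not in self.blocks:     # unseen content: store it
                self.blocks[digest] = block
                new_blocks += 1
            recipe.append(digest)             # seen or not, record the reference
        return recipe, new_blocks

    def get_checkpoint(self, recipe) -> bytes:
        return b"".join(self.blocks[d] for d in recipe)


store = BlockStore()
t0 = b"XXXXYYYYZZZZ"                          # blocks X, Y, Z
t1 = b"XXXXYYYYZZZZWWWW"                      # blocks X, Y, Z plus new block W

recipe0, new0 = store.put_checkpoint(t0)
recipe1, new1 = store.put_checkpoint(t1)
print(new0, new1)
```

Storing T1 adds only block W, yet `store.get_checkpoint(recipe1)` still returns the full image because the recipe references the blocks already stored for T0.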

Similarity Detection Mechanism
How to divide the file into blocks?
• Fixed-size blocks + compare-by-hash (FsCH)
• Content-based blocks + compare-by-hash (CbCH)

FsCH Insertion Problem
[Figure: fixed-size blocks B1 to B6 of checkpoints i and i+1; an insertion in checkpoint i+1 shifts the block boundaries]
Result: lower similarity detection ratio.
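The insertion problem can be reproduced in a few lines. The sketch below uses seeded random bytes and 512-byte blocks, both illustrative choices, and compares an in-place overwrite of one byte with an insertion of one byte at the same offset. The helper names are hypothetical.

```python
import hashlib
import random


def fsch_blocks(data: bytes, size: int) -> list[bytes]:
    """Cut data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def count_matches(old: bytes, new: bytes, size: int) -> int:
    """Count blocks of `new` whose hash already appears among the blocks of `old`."""
    known = {hashlib.sha1(b).digest() for b in fsch_blocks(old, size)}
    return sum(hashlib.sha1(b).digest() in known for b in fsch_blocks(new, size))


rng = random.Random(0)
old = bytes(rng.getrandbits(8) for _ in range(4096))        # 8 blocks of 512 bytes

# Same length: exactly one byte changes value, block boundaries stay put.
overwritten = old[:100] + bytes([old[100] ^ 0xFF]) + old[101:]
# One byte longer: every byte after offset 100 shifts by one position.
inserted = old[:100] + b"!" + old[100:]

matches_after_overwrite = count_matches(old, overwritten, 512)
matches_after_insert = count_matches(old, inserted, 512)
print(matches_after_overwrite, matches_after_insert)
```

On this input the overwrite leaves 7 of the 8 blocks matching, while the single-byte insertion shifts the content of every block and leaves none matching.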

Content-based Compare-by-Hash (CbCH)
[Figure: a window of m bytes slides over checkpoint i; at each offset the window is hashed, and a block boundary is set where k bits of the hash value equal 0, yielding blocks B1 to B4]

Content-based Compare-by-Hash (CbCH)
[Figure: checkpoint i consists of blocks B1, B2, B3, B4; checkpoint i+1 consists of blocks B1, BX, B3, B4]
Result: higher similarity detection ratio.
But: computationally intensive.
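A minimal sketch of content-based block boundaries, under stated assumptions: the window advances one byte at a time, SHA-1 is the window hash, and the boundary test checks the lowest k bits. The slides specify only "m bytes", "k bits", and a comparison with 0, so these details, the function name `cbch_blocks`, and the small k = 6 used to get many blocks from a short input are illustrative.

```python
import hashlib
import random


def cbch_blocks(data: bytes, m: int = 20, k: int = 6) -> list[bytes]:
    """Cut data wherever the hash of the preceding m bytes has its low k bits zero."""
    mask = (1 << k) - 1
    blocks, start = [], 0
    for end in range(m, len(data) + 1):
        window = data[end - m:end]
        value = int.from_bytes(hashlib.sha1(window).digest()[-4:], "big")
        if value & mask == 0:              # boundary depends on content, not offset
            blocks.append(data[start:end])
            start = end
    if start < len(data):
        blocks.append(data[start:])        # trailing bytes form the last block
    return blocks


rng = random.Random(0)
old = bytes(rng.getrandbits(8) for _ in range(8192))
new = old[:100] + b"!" + old[100:]         # one byte inserted near the start

old_digests = {hashlib.sha1(b).digest() for b in cbch_blocks(old)}
new_blocks = cbch_blocks(new)
matched = sum(hashlib.sha1(b).digest() in old_digests for b in new_blocks)
print(f"{matched} of {len(new_blocks)} blocks already stored")
```

Because a boundary depends only on the m bytes before it, boundaries realign shortly after the inserted byte and only the blocks near the insertion fail to match. Hashing a window at every offset also makes the cost visible, which is the "computationally intensive" caveat on the slide.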

Evaluating Similarity Between Consecutive Checkpoints
Applications: BMS* and BLAST
Checkpointing interval: 1, 5, and 15 minutes

Type                          Number of checkpoints   Avg. checkpoint size
Application level             100                     2.4 MB
System level (BLCR)           1200                    450 MB
Virtual machine level (Xen)   400                     1 GB

* Checkpoints by Pratul Agarwal (ORNL)

Similarity Ratio and Detection Throughput
Techniques (rows): FsCH (1 MB); CbCH (no overlap, m = 20 B, k = 14 b)
Column headers: BMS, BLAST; App, BLCR (1 min, 5 min, 15 min), Xen (5 or 15 min)
Cell values, in extraction order (row and column alignment was lost): 0.0% [108], 23.4% [109], 0.0% [28.4], 82% [26.6], 6.3% [113], 0.0% [110], 0.0% [28.4], 70% [26.4], 0.0%
The table presents the average rate of detected similarity and the throughput in MB/s (in brackets) for each heuristic.
But: using the GPU, CbCH achieves over 190 MBps throughput.
Reference: StoreGPU: Exploiting Graphics Processing Units to Accelerate Distributed Storage Systems, S. Al-Kiswany, A. Gharaibeh, E. Santos-Neto, G. Yuan, M. Ripeanu, HPDC, 2008.

Compare-by-Hash Results
• FsCH slightly degrades the achieved bandwidth,
• but it reduces the storage space used and the network effort by 24%.
[Figure: achieved storage bandwidth]

Outline
• stdchk architecture
• stdchk features
• stdchk overall system evaluation

stdchk Scalability
stdchk sustains high loads:
• Number of nodes
• Workload
Setup: 7 clients, each writing 100 files of 100 MB (70 GB in total), against a stdchk pool of 20 benefactor nodes.
[Figure: three phases labeled Nodes Join, Steady, and Nodes Leave]

Experiment with a Real Application: BLAST
Execution time: > 5 days
Checkpointing interval: 30 s
Stripe width: 4 benefactors
Client machine: two 3.0 GHz Xeon processors, SCSI disks.

                           Local disk   stdchk    Improvement
Checkpointing time (s)     22,733       16,497    27.0%
Data size (TB)             3.55         1.14      69.0%
Total execution time (s)   462,141      455,894   1.3%

Summary
stdchk: a checkpointing-optimized storage system built using scavenged resources.
stdchk features:
• High-throughput write operation
• Saves considerable disk space and network effort
• Checkpointing-optimized data management
• Easy to adopt: implements a POSIX file system interface
• Inexpensive: built atop scavenged resources
Consequently, stdchk:
• Offloads the checkpointing workload from the shared FS.
• Speeds up checkpointing operations (reduces checkpointing overhead).

Thank you
netsyslab.ece.ubc.ca