Compiler Support for Efficient Software-only Checkpointing
Chuck (Chengyan) Zhao
Dept. of Computer Science, University of Toronto
Ph.D. Thesis Exam, Sept. 07, 2012

Execution Going Backward?
A time-travel machine for program execution
  • going back to the past over an arbitrary distance
  • unlimited number of attempts
  • no special hardware support
  • efficient
  • …
Benefits
  • debugging
  • software backtracking
  • …

Checkpointing (CKPT) Can Help
Checkpointing
  • A process that takes program snapshots
  • Recovers execution when an error happens
  • Enhances reliability and robustness
Existing Checkpointing Approaches
  • Hardware-based fine-grain solutions
  • Software-only coarse-grain solutions
Our Proposed Solution: Fine-grain Software-only CKPT

Fine-grain Checkpointing
[Figure: a checkpoint region executes "a = 5; b = 7;". In main memory, a changes from 1 to 5 and b from 2 to 7. The checkpoint buffer records (&a, 1) and (&b, 2), the old values used for failure recovery.]
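
The buffer of (address, old value) pairs in this figure can be sketched as a small undo log. This is a minimal illustration, not the thesis implementation: the names ckpt_buffer, ckpt_backup, ckpt_rollback, and ckpt_commit are hypothetical, and the growth policy (doubling) is an assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One undo-log entry: where the data lives and what it held before the write. */
typedef struct {
    void  *addr;   /* address that is about to be overwritten */
    size_t size;   /* number of bytes saved */
    void  *old;    /* heap copy of the original bytes */
} ckpt_entry;

/* Dynamically grown checkpoint buffer (hypothetical layout). */
typedef struct {
    ckpt_entry *entries;
    size_t      count;
    size_t      capacity;
} ckpt_buffer;

/* Save the current contents of [addr, addr + size) before they are overwritten. */
void ckpt_backup(ckpt_buffer *buf, void *addr, size_t size)
{
    if (buf->count == buf->capacity) {
        size_t cap = buf->capacity ? buf->capacity * 2 : 8;
        ckpt_entry *grown = realloc(buf->entries, cap * sizeof *grown);
        assert(grown != NULL);
        buf->entries  = grown;
        buf->capacity = cap;
    }
    void *copy = malloc(size);
    assert(copy != NULL);
    memcpy(copy, addr, size);
    buf->entries[buf->count++] = (ckpt_entry){ addr, size, copy };
}

/* Undo every logged write, newest first, then leave the buffer empty. */
void ckpt_rollback(ckpt_buffer *buf)
{
    while (buf->count > 0) {
        ckpt_entry *e = &buf->entries[--buf->count];
        memcpy(e->addr, e->old, e->size);
        free(e->old);
    }
}

/* Discard the log without restoring anything (the region succeeded). */
void ckpt_commit(ckpt_buffer *buf)
{
    while (buf->count > 0)
        free(buf->entries[--buf->count].old);
}
```

With a = 1 and b = 2 on entry, calling ckpt_backup before "a = 5" and before "b = 7" logs (&a, 1) and (&b, 2); ckpt_rollback then restores both variables. Restoring newest-first matters when the same address is logged more than once.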

Our Proposed Checkpointing Approach
An Efficient Software Checkpointing Framework
  • Software only: no need for hardware support
  • Covers arbitrarily large code regions: dynamically allocates the ckpt buffer
  • Leverages compiler optimizations: aggressive overhead reduction
Example Applications
  • Program debugging
  • Automatic software backtracking

Compiler Checkpointing (CKPT) Framework
Pipeline: annotated C/C++ source code → LLVM frontend → compiler IR → Enable Checkpointing → Optimize Checkpointing → backend process (x86, x64, …, POWER, C/C++)
Enable Checkpointing
  • Callsite Analysis
  • Inter-procedural Transformations
  • Intra-procedural Transformations
  • Special Cases Handling
Optimize Checkpointing
  1. CKPT Inlining
  2. Pre Optimize
  3. Redundancy Eliminations
  4. Hoisting
  5. Aggregation
  6. Non Rollback Exposed Store Elimination
  7. Heap Optimize
  8. Array Optimize
  9. Post Optimize

Enabling Checkpointing

Transformations to Enable Checkpointing
3 Steps:
  1. Callsite analysis
  2. Intra-procedural transformation
  3. Inter-procedural transformation
Example region after the transformations (the slide shows both foo_ckpt() and foo() at the call site; foo_ckpt() appears to replace the original call):
    start_ckpt();
    …
    backup(&a, sizeof(a));
    a = …;
    handleMemcpy(…);
    memcpy(d, s, len);
    foo_ckpt();    /* original call: foo(); */
    …
    stop_ckpt(cond);

    foo(…){ /* body of foo() */ }
    foo_ckpt(…){ /* body of foo_ckpt() */ }
    …
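
A runnable sketch of what a transformed region might look like. The names start_ckpt, backup, stop_ckpt, foo, and foo_ckpt come from the slide, but their bodies here are a toy runtime written only for illustration. In particular, the meaning of the stop_ckpt argument (nonzero requests rollback) and the fixed-size log are assumptions; run_region is a hypothetical driver.

```c
#include <assert.h>
#include <string.h>

/* Toy runtime, assumed for illustration only: a fixed-size undo log. */
enum { LOG_SLOTS = 64, SLOT_BYTES = 16 };

static struct {
    void         *addr;
    size_t        size;
    unsigned char old[SLOT_BYTES];
} undo_log[LOG_SLOTS];
static size_t log_len;
static int    ckpt_active;

void start_ckpt(void) { log_len = 0; ckpt_active = 1; }

/* Record the old bytes at addr before the following store overwrites them. */
void backup(void *addr, size_t size)
{
    if (!ckpt_active)
        return;
    assert(log_len < LOG_SLOTS && size <= SLOT_BYTES);
    undo_log[log_len].addr = addr;
    undo_log[log_len].size = size;
    memcpy(undo_log[log_len].old, addr, size);
    log_len++;
}

/* Assumption: a nonzero argument requests rollback, zero commits. */
void stop_ckpt(int rollback)
{
    if (rollback) {
        while (log_len > 0) {
            log_len--;
            memcpy(undo_log[log_len].addr, undo_log[log_len].old,
                   undo_log[log_len].size);
        }
    }
    log_len = 0;
    ckpt_active = 0;
}

int total;

/* Original callee, still used outside checkpoint regions. */
void foo(int x) { total += x; }

/* Checkpointing clone: same body, with a backup before each store. */
void foo_ckpt(int x)
{
    backup(&total, sizeof total);
    total += x;
}

/* A region after the three steps: stores are preceded by backup(),
   and the call site is redirected to the clone. */
int run_region(int a_new, int fail)
{
    int a = 0;
    start_ckpt();
    backup(&a, sizeof a);
    a = a_new;
    foo_ckpt(a);
    stop_ckpt(fail);
    return a;
}
```

Keeping both foo and foo_ckpt means code outside any checkpoint region pays no backup cost, which is one plausible reason for cloning rather than instrumenting the callee in place.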

Optimizations

Checkpointing Optimization Framework
Optimize Checkpointing
  1. CKPT Inlining
  2. Pre Optimization
  3. Redundancy Eliminations (3 REs)
  4. Hoisting
  5. Aggregation
  6. Non Rollback Exposed Store Elimination
  7. Dyn. Mem (Heap) Optimization
  8. Array Optimization
  9. Post Optimization

Redundancy Elimination Optimization
    start_ckpt();
    …
    if (C){
        backup(&a, sizeof(a));
        a = …;
    }
    …
    backup(&a, sizeof(a));
    a = …;
    …
    backup(&a, sizeof(a));
    a = …;
    …
    stop_ckpt(cond);
The figure marks dominance ("dom") relations between the backup calls.
Algorithm
  • establish the dominating relationship (stop_ckpt() marker)
  • promote the leading backup call
  • re-establish the dominating relationship among backup calls
  • eliminate all non-leading backup call(s)
RE 1: remove all non-leading backup call(s)
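
The effect of RE 1 on this example can be checked with a small before/after sketch. The helpers backup_int and rollback, and the concrete values stored, are hypothetical stand-ins; the point is only that one promoted backup restores the same state as three.

```c
#include <assert.h>

/* Minimal stand-in runtime: a log of (address, old value) pairs for ints. */
enum { MAX_LOG = 16 };
static int *log_addr[MAX_LOG];
static int  log_old[MAX_LOG];
static int  log_len;

void start_ckpt(void) { log_len = 0; }

void backup_int(int *p)
{
    assert(log_len < MAX_LOG);
    log_addr[log_len] = p;
    log_old[log_len]  = *p;
    log_len++;
}

/* Restore logged values, newest first. */
void rollback(void)
{
    while (log_len > 0) {
        log_len--;
        *log_addr[log_len] = log_old[log_len];
    }
}

/* Before RE 1: every store to *a is preceded by its own backup. */
int region_before(int *a, int c)
{
    start_ckpt();
    if (c) { backup_int(a); *a = 1; }
    backup_int(a); *a = 2;
    backup_int(a); *a = 3;
    return log_len;              /* number of entries logged */
}

/* After RE 1: the leading backup is promoted above the branch so that it
   dominates every store to *a; the non-leading backups are removed. */
int region_after(int *a, int c)
{
    start_ckpt();
    backup_int(a);
    if (c) { *a = 1; }
    *a = 2;
    *a = 3;
    return log_len;
}
```

Only the value on entry to the region is needed for recovery, so any backup of an address that was already saved earlier on every path adds buffer space and run time without changing the rollback result.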

Rollback Exposed Store
    int a, b;
    …
    start_ckpt();
    …
    b = … a op …;
    …
    backup(&a, sizeof(a));
    a = …;
    …
    stop_ckpt(cond);
Rollback Exposed Store: a store to a location with a possible previous load of that location
  • must back up 'a' because the prior load of 'a' must access the "old" value on rollback, i.e., 'a' is "rollback exposed"
  • CAN'T optimize this case!

Non-Rollback Exposed Store Elimination (NRESE)
    int a, b;
    …
    start_ckpt();
    …
    backup(&a, sizeof(a));
    a = …;
    …
    stop_ckpt(cond);
There is no prior use of 'a', hence it is non-rollback-exposed, and we can eliminate the backup that any rollback-exposed store would require.
Algorithm: ensure that
  • there is no use of the address (&a) on any path
  • the backup address (&a) isn't aliased to anything (empty points-to set)
NRESE is a new, checkpoint-specific optimization
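
The difference between the two kinds of store can be seen by re-running a region without restoring anything. This sketch assumes that recovery means re-executing the region from its start; the function names and constants are hypothetical.

```c
#include <assert.h>

/* Rollback-exposed: *a is loaded before it is overwritten, so re-running
   the region needs the original *a. The backup must stay. */
int exposed_region(int *a)
{
    assert(a != 0);
    int b = *a + 1;   /* load of a */
    *a = 10;          /* store to a: rollback-exposed */
    return b;
}

/* Not rollback-exposed: the first access to *a in the region is a store,
   so the region regenerates the value on its own. */
int non_exposed_region(int *a)
{
    assert(a != 0);
    *a = 10;          /* store with no earlier load in the region */
    int b = *a + 1;   /* reads the value the region itself produced */
    return b;
}
```

Running exposed_region twice on the same variable gives different results unless the old value is restored in between, while non_exposed_region gives the same result whatever the variable held on entry. The aliasing condition on the slide is what rules out a hidden load of the same location through another pointer.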

Applications

App 1: CKPT enabled debugging
[Figure: program timeline with a CKPT region and three marked points.]
  • T: a safe point, earlier than P, that the program can reach through checkpoint recovery
  • P: root cause of a bug
  • Q: place where the bug manifests (a user or programmer notices the bug at this point)
Key benefits
  • execution rewinding
  • support for large regions
  • unlimited # of retries
  • avoids re-execution of the entire program

Simulated Annealing Placement in VPR
[Figure: blocks A, B, C, D connected by nets, with "?" marking candidate swaps.]
Algorithm:
  1) Start with a random placement of blocks
  2) Randomly pick a pair of blocks to swap
  3) Keep the new placement if it is an improvement
  …
Key benefits
  • automated support for backtracking: backup actions, abort, commit
  • covers arbitrarily complex algorithms
  • cleaner code, simpler programming
  • the programmer focuses on the algorithm
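
One swap attempt from the algorithm above, with the backup, abort, and commit actions written by hand. This is the kind of bookkeeping the framework is meant to generate automatically. The cost function, the array layout, and the names placement_cost and try_swap are hypothetical and unrelated to VPR's actual code.

```c
#include <assert.h>
#include <stdlib.h>

enum { NBLOCKS = 4 };

/* Hypothetical cost: how far each block sits from the slot equal to its id. */
int placement_cost(const int slot_of[NBLOCKS])
{
    int cost = 0;
    for (int i = 0; i < NBLOCKS; i++)
        cost += abs(slot_of[i] - i);
    return cost;
}

/* Try swapping blocks i and j; keep the swap only if the cost improves.
   Returns 1 if the swap was committed, 0 if it was rolled back. */
int try_swap(int slot_of[NBLOCKS], int i, int j)
{
    assert(i >= 0 && i < NBLOCKS && j >= 0 && j < NBLOCKS);
    int before = placement_cost(slot_of);

    /* Manual checkpoint: save the old values before each store. */
    int saved_i = slot_of[i];
    int saved_j = slot_of[j];
    slot_of[i] = saved_j;
    slot_of[j] = saved_i;

    if (placement_cost(slot_of) < before)
        return 1;                 /* commit: drop the saved values */

    slot_of[i] = saved_i;         /* abort: restore the old placement */
    slot_of[j] = saved_j;
    return 0;
}
```

Here the undo state is two integers, so the manual version is easy. The slide's argument is that with compiler-inserted backups the same abort and commit structure works however much state the evaluated step touches.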

Evaluation

Platform and Benchmarks
Evaluation Platform
  • Core i7 920, 4 GB DDR3, 200 GB SATA
  • Debian6-i386, gcc/g++-4.4.5
  • LLVM-2.9
Benchmarks
  • BugBench 1.2.0: 5 programs with buffer-overflow bugs; 3 CKPT regions per program (Small, Medium, Large)
  • VPR 5.0.2: FPGA CAD tool, 1 CKPT region
CKPT Comparison
  • libCKPT [USENIX 95]: U. Tennessee
  • ICCSTM [PLDI 06]: STM based on Intel ICC
  • unfair comparison, but closest alternative

Compare with Coarse-grain Scheme: libCKPT
[Chart not preserved.]
HUGE gain over coarse-grain libCKPT

Compare with Fine-grain Scheme: ICCSTM
[Chart not preserved.]
Better than fine-grain ICCSTM

RE 1 Optimization: buffer size reduction
[Chart not preserved: buffer size reduction in %.]
RE 1 is the most effective optimization

Post RE 1 Optimization: buffer size reduction
[Chart not preserved: buffer size reduction in %.]
Other optimizations also contribute

Conclusion
CKPT Optimization Framework
  • compiler-driven, automatic, software-only
  • compiler analysis and optimizations
      • 100-1000X less overhead than coarse-grain CKPT
      • 4-50X improvement over fine-grain ICCSTM
CKPT-supported Apps
  • debugger: execution rewind in time
      • up to 98% reduction in CKPT buffer size
      • up to 95% reduction in backup calls
  • VPR: automatic software backtracking
      • only 15% CKPT overhead vs. manual checkpointing

Questions and Answers?

Algorithm: Redundancy Elimination 1
  1. Build the dominating relationship (DOM) among backup calls
  2. Identify the leading backup call
  3. Promote a suitable leading backup call
  4. Remove non-leading backup call(s)
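
Steps 1, 2, and 4 can be sketched at basic-block granularity, given an immediate-dominator array for the control-flow graph. This is a simplified illustration: it assumes at most one backup per address per block, it skips the promotion step, and the types and names (backup_call, dominates, mark_redundant) are hypothetical rather than LLVM APIs.

```c
#include <assert.h>
#include <stddef.h>

/* idom[b] is the immediate dominator of basic block b;
   the entry block is its own immediate dominator. */
int dominates(const int *idom, int a, int b)
{
    for (;;) {
        if (a == b)
            return 1;
        if (idom[b] == b)
            return 0;             /* reached the entry block */
        b = idom[b];
    }
}

typedef struct {
    int         block;   /* basic block containing the backup call */
    const void *addr;    /* address being backed up */
    int         keep;    /* set by the pass: 1 = leading, 0 = redundant */
} backup_call;

/* Mark a backup call redundant when another call for the same address
   sits in a different block that dominates it. */
void mark_redundant(const int *idom, backup_call *calls, size_t n)
{
    assert(idom != NULL && calls != NULL);
    for (size_t i = 0; i < n; i++) {
        calls[i].keep = 1;
        for (size_t j = 0; j < n; j++) {
            if (i != j &&
                calls[j].addr == calls[i].addr &&
                calls[j].block != calls[i].block &&
                dominates(idom, calls[j].block, calls[i].block)) {
                calls[i].keep = 0;
                break;
            }
        }
    }
}
```

For a region whose entry block backs up &a and whose if-body backs up &a again, the second call is marked redundant because the entry block dominates the if-body. Two sibling branches do not dominate each other, so backups there would both be kept unless a leading call is first promoted above the branch.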

Algorithm: NRESE
  • The backup address is NOT aliased to anything: its points-to set is empty
AND
  • On any path from the beginning of the CKPT region to the respective write, there is no use of the backup address: the value can be regenerated independently, without needing its own old value

Buffer Schemes: 1D Array vs. Hash Tables

Compare with Coarse-grain Scheme: libCKPT
[Chart not preserved: axis marks at 100KX, 1KX, 100X, and 10X.]
HUGE gain over coarse-grain libCKPT

Compiler Checkpointing (CKPT) Framework
Pipeline: annotated C/C++ source code → LLVM IR → Enable Checkpointing → Optimize Checkpointing → backend process (x86, x64, …, Power, C/C++)
Optimize Checkpointing
  1. CKPT Inlining
  2. Pre Optimize
  3. Redundancy Eliminations
  4. Hoisting
  5. Aggregation
  6. Non Rollback Exposed Store Elimination
  7. Heap Optimize
  8. Array Optimize
  9. Post Optimize

CKPT Enabled Debugging
Key benefits
  • execution rewinding
  • arbitrarily large region
  • unlimited # of retries
  • no restart

Compare with Fine-grain Scheme: ICCSTM
[Chart not preserved.]
Better than the best-known fine-grain solution

Redundancy Elimination Optimization 1
    start_ckpt();
    …
    backup(&a, sizeof(a));
    a = …;
    …
    if (C){
        backup(&a, sizeof(a));
        a = …;
        …
    }
    …
    stop_ckpt(c);
Algorithm
  • establish the dominating relationship D among backup calls
  • promote the leading backup call
  • eliminate all non-leading backup call(s)
RE 1: keep only the dominating backup call

CKPT Support for Automatic Backtracking (VPR)
[Flowchart: initial guess → obtain a new result (manual CKPT) → check result; good: commit and continue; bad: abort and try next; …]
CKPT automates the process, regardless of backtracking complexity

Key benefits
  • automated support for backtracking: backup actions, abort, commit
  • covers arbitrarily complex algorithms
  • cleaner code, simpler programming
  • the programmer focuses on the algorithm

App 2: CKPT enabled backtracking
[Flowchart: Initial Guess → Evaluate; good: Commit Data (manual CKPT); bad: Reset Data; when the stop condition is reached: Finish.]
Key benefits
  • automated support for backtracking: backup actions, abort, commit
  • covers arbitrarily complex algorithms
  • cleaner code, simpler programming
  • the programmer focuses on the algorithm

Key benefits
  • automates the CKPT process: backup actions, abort, commit
  • covers arbitrarily complex algorithms
  • simplifies programming
  • the programmer focuses on the algorithm

  1. CKPT Inlining
  2. Pre Optimize
  3. Redundancy Eliminations
  4. Hoisting
  5. Aggregation
  6. Non Rollback Exposed Store Elimination
  7. Heap Optimize
  8. Array Optimize
  9. Post Optimize

How Can A Compiler Help Checkpointing?
Enable CKPT
  • compiler transformations
Optimize CKPT
  • do standard optimizations apply?
  • support CKPT-specific optimizations?
CKPT Uses
  • debugging
  • backtracking

Optimization: buffer size reduction
[Chart not preserved: buffer size reduction in %.]
Up to 98% reduction in CKPT buffer size

  • T: pick a pair of blocks to swap
  • CKPT Region: compute the cost of the swapped version
  • Q: keep the swap if it is an improvement, discard it otherwise

Agenda
  • Enable Checkpointing
  • Optimize Checkpointing
  • Enabled Applications
  • Test and Evaluation
  • Summary

The Lengthy Development-Cycle Problem
[Figure: Develop → Run → Debug cycle, with a long cycle time.]
…

App 2: CKPT enabled automatic backtracking (VPR)
  • T: pick a pair of blocks to swap
  • CKPT Region: proceed with VPR's random/simulated-annealing based algorithm
  • Q: keep the swap if it is an improvement, discard it otherwise
Key benefits
  • automated support for backtracking: backup actions, abort, commit
  • covers arbitrarily complex algorithms
  • cleaner code, simpler programming
  • the programmer focuses on the algorithm
