Dynamic Software Update Testing Framework and Empirical Study
- Slides: 36
Dynamic Software Update Testing: Framework and Empirical Study
Christopher M. Hayden, Eric A. Hardisty, Michael Hicks, Jeffrey S. Foster
University of Maryland, College Park

Dynamic Software Updating (DSU)
- Performing updates to software at runtime has clear benefits:
  - Increased software availability
  - No need to terminate active connections / computation
- … but can we trust updated software?
  - Critical to ensure updates are safe

Our Contributions
- Verification of DSU through testing:
  - Testing procedure
  - Test minimization algorithm
- Empirical study:
  - Effectiveness of minimization
  - Update safety / effectiveness of safety checks

DSU Safety
- DSU creates the opportunity for new sources of bugs:
  - Faulty state transformation
  - Unsafe update timing
- Safety checks restrict when updates may be applied:
  - Activeness Safety / Con-freeness Safety

Activeness Safety (AS)
- AS prevents updates to active code
- In this example, no patch updating main or foo is allowed:

    main() {        foo() {
      …               …
      foo();          bar();
      baz();        }
    }

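The AS rule above can be sketched as a check against the current call stack. This is a minimal illustration, not Ginseng's implementation: it assumes the stack and the patch are both available as arrays of function names, and `as_allows_update` and `contains` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if `name` appears in the array `names`. */
static bool contains(const char *const *names, size_t n, const char *name) {
    for (size_t i = 0; i < n; i++) {
        if (strcmp(names[i], name) == 0) return true;
    }
    return false;
}

/* Activeness safety (assumed representation): reject the patch if any
 * function it updates is currently on the call stack. */
bool as_allows_update(const char *const *stack, size_t stack_len,
                      const char *const *patched, size_t patched_len) {
    for (size_t i = 0; i < patched_len; i++) {
        if (contains(stack, stack_len, patched[i])) return false;
    }
    return true;
}
```

With the stack {main, foo} from the slide, a patch touching only baz would be allowed, while a patch touching main or foo would be rejected.
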
Con-freeness Safety (CFS)
- CFS (Stoyle et al. '05) allows updates to active code only when type safety can be ensured
- In this example, no patch updating the signature of baz or bar is allowed:

    main() {        foo() {
      …               …
      foo();          bar();
      baz();        }
    }

Unsafe Timing: Type Safety

Version 0:
    int foo(int x, int y) { return x + y; }
    void bar() { int z = 0; … z = foo(z, 5); }

Version 1 (patch):
    void foo(int *x, int y) { *x += y; }
    void bar() { int z = 0; … foo(&z, 5); }

- Crash: the Version 0 bar() calls the Version 1 foo(), passing the int z where an int * is expected

DSU Testing
- Safety checks offer limited guarantees:
  - CFS and AS ensure type-safe execution
  - AS ensures that you never return to old code following an update
  - Neither of these properties ensures safe update timing
- We propose testing to verify the correctness of allowed update points:
  - Use the existing suite of application system tests
  - Ensure that updating anywhere during the execution of those tests results in an execution that passes the test

Testing Procedure
- Approach:
  - Instrument the application to trace update points
  - Execute the system test and gather an initial trace
  - For each update point in the initial trace, perform an update test: force an update at that point while executing the system test
[Figure: the initial trace, from trace start through its potential update points, passes (✔); each update test passes or fails on its own (✔ ✘ ✔).]

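The procedure above amounts to one run of the system test per update point. The sketch below assumes a hypothetical hook, `update_test_fn`, that runs the system test with the update forced at a given point and reports pass or fail; `fails_only_at` is only a stand-in for a real harness.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hook: runs one system test, forcing the update at update
 * point `point` of the initial trace, and reports whether the test passed. */
typedef bool (*update_test_fn)(size_t point, void *ctx);

/* Runs one update test per update point in the initial trace and records
 * each outcome in passed[0..num_points-1]. Returns the number of failures. */
size_t run_update_tests(size_t num_points, update_test_fn run, void *ctx,
                        bool *passed) {
    size_t failures = 0;
    for (size_t p = 0; p < num_points; p++) {
        passed[p] = run(p, ctx);
        if (!passed[p]) failures++;
    }
    return failures;
}

/* Stand-in for a real harness: pretends the update is unsafe at exactly
 * one point, whose index is passed through ctx. */
bool fails_only_at(size_t point, void *ctx) {
    return point != *(const size_t *)ctx;
}
```

Called with three update points and `fails_only_at` targeting the middle one, the outcomes mirror the pass, fail, pass pattern in the slide's figure.
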
Update Test Minimization
- Program traces may have thousands or millions of update points
- Many update tests have the same behavior for a given patch
  - We can eliminate redundant tests

Version 0:
    void main() { foo(); bar(); baz(); }

Patch A (all update points yield the same behavior):
    baz() {…}

Patch B (all update points yield distinct behavior):
    foo() {…}
    bar() {…}
    baz() {…}

Minimization Algorithm
- Execution events are traced if they have the potential to conflict with a patch
  - An event conflicts with a patch p if applying p before the event might produce a different result than applying p after the event
  - Examples: function calls, global variable accesses
- Trace the execution of a test T on P0
- Iterate through the trace, noting the last update point each time we reach a conflicting trace element
- Run only the identified update tests Tnp

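The scan described above can be sketched as follows. This is a simplified illustration under stated assumptions: only function calls are treated as conflicting events (the slide also lists global variable accesses), a patch is just a list of replaced function names, update point ids are non-negative, and `trace_event`, `conflicts`, and `minimize` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum { EV_UPDATE_POINT, EV_CALL } event_kind;

typedef struct {
    event_kind kind;
    int point_id;      /* valid for EV_UPDATE_POINT, assumed >= 0 */
    const char *name;  /* callee name, valid for EV_CALL */
} trace_event;

/* Assumption: a call conflicts with the patch if the patch replaces the callee. */
static bool conflicts(const trace_event *ev,
                      const char *const *patched, size_t n) {
    if (ev->kind != EV_CALL) return false;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(patched[i], ev->name) == 0) return true;
    }
    return false;
}

/* Writes the ids of the update points worth testing into out[] and returns
 * how many were written. out needs room for one id per update point. */
size_t minimize(const trace_event *trace, size_t len,
                const char *const *patched, size_t n, int *out) {
    size_t count = 0;
    int last = -1;          /* most recent update point seen, -1 = none yet */
    bool recorded = false;  /* has `last` already been added to out[]? */
    for (size_t i = 0; i < len; i++) {
        if (trace[i].kind == EV_UPDATE_POINT) {
            last = trace[i].point_id;
            recorded = false;
        } else if (last >= 0 && !recorded && conflicts(&trace[i], patched, n)) {
            out[count++] = last;
            recorded = true;
        }
    }
    return count;
}
```

On the trace Update?(1), Call(foo), Update?(2), Call(bar), Update?(3), Call(baz), a patch replacing only baz selects update point 3, while a patch replacing foo, bar, and baz selects points 1, 2, and 3.
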
Empirical Results

Experimental Setup
- Testing infrastructure built on top of the Ginseng DSU system (Neamtiu et al.):
  - Modified to support tracing and updating at pre-selected update points
  - Insertion of explicit update points before each function call to approximate more liberal systems
  - Disabled safety checking (CFS) for experiments
- Tested 3 years of patches to OpenSSH and vsftpd (only OpenSSH is reported in this talk)

Program Modifications
- Identify long-running loops
- Add a manually selected update point
- Perform loop body extraction
- Perform continuation extraction

    foo() {
      while (1) {        // main loop
        update();
        extract { ...    // main loop body
        }
      }
      extract { ...      // after main loop
      }
    }

Experiments: Update Test Suite
- How many update tests must be run to test real-world updates to real-world applications?
- How effective is minimization at eliminating redundant tests?

Update Test Suite Size: OpenSSH

              Δ to next version   Reduction
#      Tests  Sig  Fun  Type      All Points                     Activeness-Safe Points
0         75    3   98     5        580,871 →  31,791 (95%)         35,314 →  3,027 (91%)
1         75    0    6     0        705,322 →   1,795 (~100%)      587,578 →  1,717 (~100%)
2         76    5  238    11        638,720 →  63,011 (90%)         20,902 →  2,353 (89%)
3         91    0   18     0        772,198 →   4,324 (99%)        638,803 →  3,775 (99%)
4         91   13  172    10        773,086 →  27,399 (96%)         21,343 →  1,564 (93%)
5        104    0   24     1        878,235 →  17,398 (98%)        111,950 →  1,723 (98%)
6        104    6  257    10        879,668 →  47,092 (95%)         44,278 →  2,139 (95%)
7        104    4  179    12        918,717 →  89,601 (90%)        100,854 →  4,141 (96%)
8        105    0   72     3        973,364 →  34,293 (96%)         61,724 →  2,070 (97%)
9        104   10  157     7        933,514 →  52,356 (94%)         61,051 →  2,891 (95%)
Total                             8,053,695 → 369,060 (95%)      1,683,797 → 25,400 (98%)

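The reduction percentages in the table can be reproduced from the before and after counts. The helper below is a small sketch with a hypothetical name, `reduction_percent`; it assumes the slide rounds to the nearest whole percent, which agrees with the rows checked here.

```c
/* Percentage of update tests eliminated by minimization, rounded to the
 * nearest integer using integer arithmetic only. Returns 0 for degenerate
 * input (no tests before, or more tests after than before). */
unsigned reduction_percent(unsigned long long before, unsigned long long after) {
    if (before == 0 || after > before) return 0;
    return (unsigned)((100ULL * (before - after) + before / 2) / before);
}
```

For the first OpenSSH row, `reduction_percent(580871, 31791)` gives 95, matching the table; the rows shown as ~100% round to 100 under this scheme.
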
Empirical Study of Update Safety
- How many failures occur when applying updates arbitrarily?
- How many failures occur when applying updates subject only to the AS and CFS safety checks?

Safety: OpenSSH

            Δ to next version   Update Tests
                                All Points              CFS Points              AS Points
#    Tests  Sig  Fun  Type      Failed      Total       Failed      Total       Failed      Total
0       75    3   98     5       19,715    580,871           0     68,044            0     35,314
1       75    0    6     0            0    705,322           0    705,322            0    587,578
2*      76    5  238    11      306,965    638,720       1,688     75,307            4     20,902
3       91    0   18     0            0    772,198           0    772,198            0    638,803
4*      91   13  172    10      565,681    773,086         609    110,633          380     21,343
5      104    0   24     1       10,703    878,235           0    130,000            0    111,950
6      104    6  257    10      163,333    879,668      44,461     96,183          110     44,278
7      104    4  179    12       11,380    918,717           1     80,070            1    100,854
8      105    0   72     3            3    973,364           0    261,885            0     61,724
9      104   10  157     7      357,919    933,514          24    121,337            0     61,051
Total                         1,435,699  8,053,695      46,783  2,420,979          495  1,683,797

Unsafe Timing: Version Inconsistency

Version 0:
    void foo() { bar(); … baz(); }
    void bar() { … }
    void baz() { … }

Version 1 (patch):
    void bar() { dig(); … }

Manually Selected Update Points

            Δ to next version                          Safety
#    Tests  Sig  Fun  Type      Reduction              Failed   Total
0       75    3   98     5        566 →   566 (0%)          0     566
1       75    0    6     0        630 →   592 (6%)          0     630
2       76    5  238    11        568 →   568 (0%)          0     568
3       91    0   18     0        783 →   770 (2%)          0     783
4       91   13  172    10        782 →   782 (0%)          0     782
5      104    0   24     1        860 →   841 (2%)          0     860
6      104    6  257    10        859 →   859 (0%)          0     859
7      104    4  179    12        850 →   850 (0%)          0     850
8      105    0   72     3        868 →   823 (5%)          0     868
9      104   10  157     7        833 →   833 (0%)          0     833
Total                           7,599 → 7,484 (2%)          0   7,599

Summary
- We have argued that verification is necessary to prevent unsafe updates
  - Provided empirical evidence that AS/CFS cannot prevent all unsafe updates
- We have presented an approach for testing dynamic updates
- We have presented and evaluated a minimization strategy to make update testing more practical

Additional Slides

Reduction: vsftpd

     Δ to next version   Reduction
#    Sig  Fun  Type      All Points                      Activeness-Safe Points
0      0    6     0        210,142 →      26 (~100%)     102,307 →    26 (~100%)
1      1   12     0        210,142 →     516 (~100%)      69,775 →   166 (~100%)
2      0   21     0        215,223 →   1,122 (99%)        55,555 →   553 (99%)
3      0   76     0        220,564 →   3,866 (98%)        37,265 → 1,912 (95%)
4      0   10     1        218,586 →  19,893 (91%)         2,123 →   301 (86%)
5      0   25     1        223,098 →  15,910 (93%)        67,330 → 3,567 (95%)
6      0  100     2        233,199 → 200,653 (14%)         7,437 → 2,742 (63%)
7      0   93     2        222,296 →  10,371 (95%)         3,098 →   275 (91%)
Total                    1,753,250 → 252,357 (86%)       344,890 → 9,542 (97%)

Safety: vsftpd

     Δ to next version   All Points             CFS Points           AS Points            Manual Points
#    Sig  Fun  Type      Failed      Total      Failed     Total     Failed     Total     Failed   Total
0      0    6     0           0    210,142           0   210,142          0   102,307          0      80
1      1   12     0       2,462    210,142         558    90,073          0    69,775          0      80
2      0   21     0           0    215,223           0   215,223          0    55,555          0      80
3      0   76     0           0    220,564           0   220,564          0    37,265          0      80
4      0   10     1      43,233    218,586         546     4,478          0     2,123          0      80
5      0   25     1          58    223,098           0    24,924          0    67,330          0      80
6      0  100     2       2,115    233,199           0     3,737          0     7,437          0      82
7      0   93     2         234    222,296           0     1,993          0     3,098          0      80
Total                    48,102  1,753,250       1,104   771,134          0   344,890          0     642

Which Tests?
[Figure: behaviors of P0 and P1. Old behavior (P0 only): bugs and deprecated features. Unchanged behavior: shared by P0 and P1. New behavior (P1 only): bug fixes and new features.]

Nondeterminism
- Program traces may differ between runs:
  - Timing of signal handlers
  - Number of iterations of loops performing I/O
  - Dependence on random numbers, system time, memory addresses, etc.
- Handling nondeterminism:
  - Ensure that traces match up to the update point
  - Annotate regions of execution whose trace output is ignored for matching purposes

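The matching rule above can be sketched as a comparison that skips events recorded inside ignored regions. This is an illustration under assumptions, not the tool's actual trace format: each event is reduced to an integer id plus an `ignored` flag, the update point has a unique id, and `traced`, `skip_ignored`, and `matches_up_to` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    int id;        /* identifies the traced event (hypothetical encoding) */
    bool ignored;  /* true if recorded inside an annotated ignored region */
} traced;

/* Advance *i past events that fall inside ignored regions. */
static void skip_ignored(const traced *t, size_t len, size_t *i) {
    while (*i < len && t[*i].ignored) (*i)++;
}

/* True if the current run agrees with the initial trace on every
 * non-ignored event up to and including the update point. */
bool matches_up_to(const traced *initial, size_t ilen,
                   const traced *current, size_t clen,
                   int update_point_id) {
    size_t i = 0, c = 0;
    for (;;) {
        skip_ignored(initial, ilen, &i);
        skip_ignored(current, clen, &c);
        /* One trace ended before reaching the update point. */
        if (i == ilen || c == clen) return false;
        if (initial[i].id != current[c].id) return false;
        if (initial[i].id == update_point_id) return true;
        i++;
        c++;
    }
}
```

Two runs that differ only in how many events an ignored I/O loop produced would still match under this scheme, while a run that diverges outside an ignored region would not.
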
Program Versions

OpenSSH
                                  Δ to next version
#    Version    LoC      Tests    Sig  Fun  Type
0    3.5p1      46,735      75      3   98     5
1    3.6.1p1    48,459      75      0    6     0
2    3.6.1p2    48,473      76      5  238    11
3    3.7.1p1    50,448      91      0   18     0
4    3.7.1p2    50,460      91     13  172    10
5    3.8p1      51,822     104      0   24     1
6    3.8.1p1    51,838     104      6  257    10
7    3.9p1      53,260     104      4  179    12
8    4.0p1      56,068     105      0   72     3
9    4.1p1      56,104     104     10  157     7
10   4.2p1      57,294   Last version, not tested

vsftpd
                                  Δ to next version
#    Version    LoC      Tests    Sig  Fun  Type
0    2.0.0      13,048      13      0    6     0
1    2.0.1      13,059      13      1   12     0
2    2.0.2p2    13,114      13      0   21     0
3    2.0.2p3    14,293      13      0   76     0
4    2.0.2      16,870      13      0   10     1
5    2.0.3      12,977      13      0   25     1
6    2.0.4      14,427      14      0  100     2
7    2.0.5      14,482      13      0   93     2
8    2.0.6      14,785   Last version, not tested

Unsafe Timing: Version Inconsistency (vsftpd)

Version 0:
    void handle_upload_common() {
      ret = do_file_recv();
    }
    void do_file_recv() {
      … // receive file
      if (ret == SUCCESS) write(226, "OK.");
      return ret;
    }

Version 1 (patch):
    void handle_upload_common() {
      ret = do_file_recv();
      if (ret == SUCCESS) write(226, "OK.");
    }
    void do_file_recv() {
      … // receive file
      return ret;
    }

Unsafe Timing: Version Inconsistency (OpenSSH)

Version 0:
    void maincont() { extracted(); … serverloop2(); }
    void extracted() { … }
    void serverloop2() { global_ptr = init; tmp = (*global_ptr).pw; }

Version 1 (patch):
    void extracted() { global_ptr = init; }
    void serverloop2() { tmp = (*global_ptr).pw; }

Activeness Safety (AS)
- AS prevents updates to active code
- In this example, no patch updating main or foo is allowed:

    main() {
      extracted();
      foo();
      …
      baz();
    }
    extracted() {
      // initialization code
      …
    }
    foo() {
      …
      bar();
    }

Minimization Algorithm (patch A)
- Initial trace: Update?(1) … Call(foo), Update?(2) … Call(bar), Update?(3) … Call(baz)
- Patch A: baz() {…}
- Algorithm state as the trace is scanned:
    at Update?(1):   Last Update Pt: 1   Points To Test: {}
    at Update?(2):   Last Update Pt: 2   Points To Test: {}
    at Update?(3):   Last Update Pt: 3   Points To Test: {}
    at Call(baz):    Last Update Pt: 3   Points To Test: { 3 }

Minimization Algorithm (patch B)
- Initial trace: Update?(1) … Call(foo), Update?(2) … Call(bar), Update?(3) … Call(baz)
- Patch B: foo() {…} bar() {…} baz() {…}
- Algorithm state as the trace is scanned:
    at Update?(1):   Last Update Pt: 1   Points To Test: {}
    at Call(foo):    Last Update Pt: 1   Points To Test: { 1 }
    at Call(bar):    Last Update Pt: 2   Points To Test: { 1, 2 }
    at Call(baz):    Last Update Pt: 3   Points To Test: { 1, 2, 3 }