Testing Concurrent Programs to Achieve High Synchronization Coverage
Haoran Hou, Xi Wang, Bo Man, Ruslan Ryzhkov
Problem being addressed
• Effectiveness of software testing
– Measure the coverage of some aspect of the software
• Little research on increasing coverage for concurrent programs
• How to achieve high coverage of concurrent programs?
Fact: There is a strong correlation between test suites with high coverage and the defect-detection ability of those test suites.
Background & Prior Work
Figure 1: Overview of the thread-scheduling technique
• Estimation phase: takes program P and a test case, and identifies coverage requirements R = {(l1, l2), (l1, l3), (l2, l3), …}
• Testing phase: generates thread schedules to execute the coverage requirements in R
Definitions M: Thread Model
The precedence relation represents ordering constraints between actions of two different threads t and t′. The ordering constraints are imposed at the time of thread creation.
Interleaved Execution Model
Synchronization Coverage Definition 1. Synchronization-Pair (SP) Coverage Requirement
Definition 2. SP Coverage Satisfaction Criteria
Prior Work
• Stress tests and random tests
– Such testing did not reveal a known concurrency bug even when executing the software for one week.
– These techniques may produce the same interleavings for many executions, and may not reveal concurrency bugs that occur only under specific interleavings.
• Bug-directed random tests
– They are tailored to specific bug patterns, and may not explore diverse interleavings to reveal other kinds of faults.
Related Work
• Random testing
– Runs the program many times while injecting artificial delays into thread schedules to produce different interleavings
– e.g. Rstest/ConTest
– There may be many duplicate interleavings, so the technique may be inadequate for covering previously uncovered interleavings.
Related Work
• Testing based on concurrency bug analysis
– e.g. CalFuzzer/RaceFuzzer/DeadlockFuzzer
• Fuzz testing or fuzzing is a software testing technique used to discover coding errors and security loopholes in software, operating systems or networks by inputting massive amounts of random data, called fuzz, to the system in an attempt to make it crash. If a vulnerability is found, a tool called a fuzz tester (or fuzzer) indicates potential causes. Fuzz testing was originally developed by Barton Miller at the University of Wisconsin in 1989.
Related Work
• Testing based on concurrency bug analysis
– Identifies potential concurrency bugs using static analysis or dynamic analysis of a program trace.
– The techniques then run the program while manipulating the thread scheduler to trigger the possible bugs.
– Only manipulates interleavings near possibly buggy code points
Related Work
• Systematic testing
– Explores distinct interleavings of the program in each different run.
– The inherent problem of these techniques is that the interleaving space to explore is exponentially large.
Related Work
(1) How to achieve higher coverage faster
• Coverage criteria: def-use pair coverage, synchronization-pair coverage, event-pair coverage, etc.
• This paper: directly controls thread scheduling to increase coverage, and specifically aims at reaching high synchronization-pair coverage faster.
(2) How much testing is enough to guarantee quality
• Saturation-based testing: monitors the number of executed coverage requirements until the rate at which new requirements are covered falls below a threshold (i.e., the coverage reaches a saturation point). This testing is used as the stopping criterion in our empirical studies in Section 3.
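The saturation-based stopping criterion can be illustrated with a minimal Python sketch. The slides do not say how the rate of increase is measured, so the `window` and `threshold` parameters and the function name `saturation_point` are assumptions made for illustration only: testing is treated as saturated once fewer than `threshold` new requirements were covered over the last `window` runs.

```python
def saturation_point(covered_per_run, window=3, threshold=1):
    """Return the 1-based index of the run at which coverage saturates, or None.

    covered_per_run[i] is the set of coverage requirements executed by run i.
    window and threshold are hypothetical parameters, not taken from the slides.
    """
    seen = set()
    history = []  # cumulative number of covered requirements after each run
    for run_index, covered in enumerate(covered_per_run, start=1):
        seen |= covered
        history.append(len(seen))
        if run_index > window:
            # New requirements gained over the last `window` runs.
            gained = history[-1] - history[-1 - window]
            if gained < threshold:
                return run_index
    return None


runs = [{1, 2}, {2, 3}, {3}, {3}, {3}, {4}]
print(saturation_point(runs))
```

A larger window makes the criterion more conservative: it tolerates a longer stretch of runs without new coverage before stopping.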
Challenges
• Create a new way of increasing the coverage of concurrent programs
• Build new models and phases for the new technique
• Show the new technique is better through experiments.
Goals of Paper
• Achieve high coverage of concurrent programs by generating thread schedules to cover uncovered coverage requirements.
• Present a description of a prototype tool implemented in Java.
Technique
• Estimation Phase
– Identify coverage requirements R (SP requirements) that can be satisfied by possible thread interleavings
• Testing Phase
– Generate thread schedules to execute the coverage requirements in R
Estimation Phase
• Execute P once
• Create every possible pair of lock statements
• Generate a thread model M
• Filter out some infeasible pairs
– Acceptance Condition (AC)
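The pair-generation and filtering steps can be sketched roughly as follows. This is only an illustration in Python, not the tool's Java implementation: the trace format, the function names, and in particular the `feasible` predicate are assumptions. The slides do not define the Acceptance Condition, so the predicate below is a simple stand-in that rejects a pair only when both statements run in the same thread and the second one was observed before the first.

```python
from itertools import permutations


def candidate_pairs(trace):
    """Build every ordered pair of lock acquisitions on the same lock.

    trace: list of (thread, statement, lock) tuples in observed order
    (a hypothetical format for the single estimation run).
    """
    by_lock = {}
    for position, (thread, statement, lock) in enumerate(trace):
        by_lock.setdefault(lock, []).append((position, thread, statement))
    pairs = set()
    for events in by_lock.values():
        pairs.update(permutations(events, 2))
    return pairs


def feasible(first, second):
    """Stand-in filter, NOT the Acceptance Condition from the paper."""
    pos1, thread1, _ = first
    pos2, thread2, _ = second
    # Within one thread, the first statement must come earlier in program order.
    return thread1 != thread2 or pos1 < pos2


def estimate_requirements(trace):
    """Return the set of (statement, statement) pairs that survive the filter."""
    return {(f[2], s[2]) for f, s in candidate_pairs(trace) if feasible(f, s)}


trace = [("a", "2a", "m"), ("a", "4a", "m"), ("b", "2b", "m"), ("b", "4b", "m")]
requirements = estimate_requirements(trace)
print(sorted(requirements))
```

With this stand-in filter, same-thread pairs that contradict program order are dropped while cross-thread pairs are all kept; a real acceptance condition would also consult the thread model M.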
Example
• Execute P once
• Create every possible pair of lock statements
• Generate a thread model M
• Filter out some infeasible pairs
– <4a, 2a>, <4b, 2b>
Testing Phase
• Invokes the scheduling controller before each lock action
• Covered and uncovered
• Determine rules
P=2a paused={2a} Uncovered: output covered: empty uncovered: same paused={2a}
P=2b paused={2a, 2b} Uncovered: <2a, 2b> Paused={2b} output covered: empty uncovered: same paused={2b}
P=2a paused={2a, 2b} Uncovered: <2a, 2b> Paused={2b} output covered: empty uncovered: same paused={2b} execute 2a
P=3a Uncovered: output covered: empty uncovered: same paused={2b} execute 3a
P=4a paused={4a, 2b} Uncovered: <2a, 2b> Paused={4a} output covered: empty uncovered: same paused={4a}
P=2b paused={4a, 2b} Uncovered: <2a, 2b> Paused={4a} output covered: {<2a, 2b>} uncovered: rest paused={4a} execute 2b
P=2a paused={2a} Uncovered: output covered: same uncovered: same paused={2a}
P=2b paused={4a, 2b} Uncovered: output Remove 2b Execute 2b covered: same uncovered: same paused={4a}
Methodology
• Types of software engineering research questions
– Methods and means of development.
• Types of software engineering research results
– Procedure and technique.
• Types of software engineering research validation
– Evaluation
Contributions
• What is the contribution? What is new?
– Presents a new technique that aims at achieving high coverage faster for concurrent programs.
– Implements the technique in a prototype tool.
– Shows that the estimation-based heuristic contributes to efficiency and effectiveness.
Take Home Ideas
• The technique
• The algorithm
• The tool
• The implementation method
– Implemented on top of the CalFuzzer framework.
Experiments Bo Man
Goals
• To evaluate our technique through a prototype tool in Java by performing several empirical studies with the tool on a number of Java subjects
• 3 Steps
– (1) Experimental setup
– (2) Studies
– (3) Threats to validity
Experimental setup
Implementation
• Take the CalFuzzer framework in Java
• Modify both the instrumentation and the scheduling-controller modules and create new modules
• Insert probes before every synchronization operation, shared-data access, and thread-related operation.
• Run the program once, generate SP coverage requirements, and store them in a file.
• Take the file and execute the program multiple times to achieve high SP coverage
Experimental setup Subjects • Java Library: a set of classes extracted from package
Experimental setup
Variables
• The independent variable is the thread-schedule generation technique
– TSA: thread-scheduling algorithm
– TSA-h: TSA without Rule 3
– 15 varieties of the random testing technique
– RND-y: inserts yield() calls at shared-resource accesses and synchronization operations.
– RND-s10: inserts a random delay of up to 10 milliseconds with sleep().
– RND-s100.
• Above s100, effectiveness decreases.
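The sleep-based random testing variants can be pictured with a small sketch. It is written in Python rather than the Java used by the subjects, and the helper name `acquire_with_noise` and its parameters are hypothetical: before a synchronization operation, the thread sleeps for a random time of at most `max_delay_ms` milliseconds, which perturbs the interleaving.

```python
import random
import threading
import time


def acquire_with_noise(lock, rng, max_delay_ms=10):
    """Sleep for a random delay of up to max_delay_ms, then acquire the lock.

    Returns the delay in seconds so callers can log the injected noise.
    Hypothetical helper illustrating an RND-s10-style probe.
    """
    delay = rng.uniform(0, max_delay_ms) / 1000.0
    time.sleep(delay)
    lock.acquire()
    return delay


lock = threading.Lock()
rng = random.Random(0)  # seeded only to make the sketch repeatable
delay = acquire_with_noise(lock, rng, max_delay_ms=10)
lock.release()
```

Passing `max_delay_ms=100` gives the corresponding sketch of an RND-s100-style probe.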
Experimental setup
Variables
• Dependent variables
– the number of covered SP coverage requirements
– the execution time to attain a certain goal
– the number of feasible SP coverage requirements (compared with the first in Study 3)
Studies and Results
• 4 studies
– effectiveness
– efficiency
– precision of estimation
– impact of the estimation-based heuristic
Study 1: Effectiveness
• To investigate whether TSA achieves higher coverage than random testing.
– (a) Run the estimation phase and, for each subject, create a set of SP coverage requirements.
– (b) Run TSA and each of the 15 random testing techniques 30 times.
– (c) For each run, execute the program 500 times, then calculate the average number of covered SP coverage requirements.
Result of Study 1
• SP requirements: TSA ≥ MAX
• RND-y < RND-s100 < TSA
• Random testing: results vary a lot
Study 2
• To investigate the efficiency of the technique compared to random testing techniques.
– (a) Same as Study 1 (a) and (b).
– (b) For each of the 30 runs, execute the program for 30 minutes and record the average saturation point and number of covered SP coverage requirements.
Result of Study 2
• Saturation-based testing: threshold
Result of Study 2
• TSA always reaches the saturation point faster and covers a greater number of SP coverage requirements than random testing.
Study 3
• To investigate how precisely our technique estimates a set of SP coverage requirements.
– (a) Same as Study 1 (a) and (b).
– (b) Take the union of the accumulated SP coverage requirements and compare it with the SP requirements estimated in the estimation phase.
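The comparison in step (b) amounts to two set differences. The sketch below is illustrative only; the function name and the dictionary keys are assumptions, and "false positive" here just means "estimated but never observed in any run", which does not by itself prove a requirement infeasible.

```python
def estimation_precision(estimated, covered_runs):
    """Compare estimated SP requirements with those observed across all runs.

    estimated: set of requirement pairs from the estimation phase.
    covered_runs: iterable of sets, one per testing run.
    """
    observed = set().union(*covered_runs)
    return {
        # Estimated but never covered in any run.
        "false_positives": estimated - observed,
        # Covered during testing but missing from the estimate.
        "false_negatives": observed - estimated,
    }


estimated = {("1", "2"), ("2", "3")}
runs = [{("1", "2")}, {("3", "4")}]
print(estimation_precision(estimated, runs))
```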
Result of Study 3
• False positives
– The estimation technique is not precise enough to filter out infeasible coverage requirements
• False negatives: the estimation technique is dynamic
– One source is locks in a loop that do not appear in the estimation but do appear in a testing execution
– Another source is the aliasing problem. A lock statement appears in more than one
Study 4
• To investigate the impact of the estimation-based heuristic on the efficiency of the testing phase.
– Same as Study 2, except that the 15 random testing techniques are replaced with TSA-h.
Result of Study 4
• The ratio of the applications of Rule 3 over all rule applications in Algorithm 1.
• The estimation-based heuristic is the key asset of our
Results - Threats to Validity
• Threats to external validity:
– programs not representative
– solution: try to cover different kinds such as library classes and server applications
• Threats to internal validity:
– unknown bugs in prototype
– solution: build tool on top of the publicly available CalFuzzer tool
Methodology
• Controlled variables, reference groups, and comparison through experiments
• Large number of repeated trials
• All-around statistical analysis from tables and figures (in effectiveness, efficiency and precision)
Pros?
Pros
• New thread scheduling technique to achieve high coverage in concurrent programs
• Prototype implemented in Java
– 1910 lines of code modified
– Publicly available
• Defined terms well
• Step-by-step instructions for thread-scheduling algorithm
• Thread-scheduling algorithm can be used by developers without varying any parameters
• Estimation-based heuristic improved performance.
Cons?
Cons
• Only 13 sample programs were used
• Scalability claim is not well-supported
– Only one test case was over 20K lines of code
• No computational complexity mentioned for thread scheduling algorithm
• Used their own implementation of random testing techniques instead of existing ones
• Numerical mistake when discussing Table 2/Figure 4
• Issues with false positives and false negatives
• Bias towards estimation-based heuristic
Next Steps?
Next Steps
• Investigate the relationship between coverage and fault-detection ability
• Test the thread scheduling algorithm on a greater variety of programs
• Extend the technique to satisfy other criteria besides synchronization-pair
• Fix issues with false positives and false negatives
• Show the computational complexity of the algorithm and whether or not it can be improved