CS 4723: Lecture 4 Test Coverage

Test Coverage
• After we have done some testing, how do we know the testing is enough?
• The most straightforward measure: input coverage
  • # of inputs tested / # of possible inputs
• Unfortunately, the # of possible inputs is typically infinite
  • Not feasible, so we need approximations…

Test Coverage
• Code Coverage
• Input Combination Coverage
• Specification Coverage
• Mutation Coverage

Code Coverage
• Basic idea:
  • Bugs in code that has never been executed will not be exposed
  • So a test suite that leaves code unexecuted is definitely not sufficient
• Definition:
  • Divide the code into elements
  • Calculate the proportion of elements that are executed by the test suite
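The definition above can be sketched in a few lines of Java. This is only an illustration: the class name `CoverageRatio` and the idea of identifying code elements by integer ids are assumptions made for the example, not something the slides prescribe.

```java
import java.util.Set;

final class CoverageRatio {
    // Proportion of code elements (identified here by hypothetical integer ids)
    // that the test suite executed at least once.
    static double coverage(Set<Integer> allElements, Set<Integer> executed) {
        if (allElements.isEmpty()) {
            return 1.0; // nothing to cover
        }
        long hit = allElements.stream().filter(executed::contains).count();
        return (double) hit / allElements.size();
    }

    public static void main(String[] args) {
        Set<Integer> all = Set.of(1, 2, 3, 4, 5);   // e.g. five statements
        Set<Integer> executed = Set.of(1, 2, 4);    // statements hit by the tests
        System.out.println(coverage(all, executed));
    }
}
```

The same ratio applies whether the elements are statements, branches, or methods; only the way the elements are enumerated changes.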

Control Flow Graph
• How many test cases to achieve full statement coverage?

Statement Coverage in Practice
• Microsoft reports 80-90% statement coverage
• Safety-critical software must achieve 100% statement coverage
• Usually about 85% coverage; 100% is usually very hard for large systems

Statement Coverage: Example

Branch Coverage
• Cover the branches in a program
• A branch is considered executed when both (all) of its outcomes are executed
• Also called decision coverage

Control Flow Graph
• How many test cases to achieve full branch coverage?

Branch Coverage: Example

Branch Coverage: Example
• An untested flow of data from an assignment to a use of the assigned value could hide an erroneous computation
• Even though we have 100% statement and branch coverage

Data Flow Coverage
• Cover all def-use pairs in the software
• Def: a write to a variable
• Use: a read of a variable
• A use u and a def d are paired when d is the direct precursor of u in some execution (d is the most recent write to the variable before u reads it)
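A small Java sketch may help make the def-use idea concrete. The method `f` below is a made-up example (not from the slides): two tests give 100% statement and branch coverage, yet the pair formed by the second def of `x` and its use is never exercised, and that pair hides a division by zero.

```java
final class DefUseGap {
    // Hypothetical method with two defs of x and one use of x.
    static int f(boolean a, boolean b) {
        int x = 1;          // def d1 of x
        if (a) {
            x = 0;          // def d2 of x
        }
        int y = 10;
        if (b) {
            y = 10 / x;     // use u of x: possible pairs are (d1, u) and (d2, u)
        }
        return y;
    }

    // True if the call fails with a division by zero.
    static boolean throwsOn(boolean a, boolean b) {
        try {
            f(a, b);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // These two tests execute every statement and both outcomes of each if.
        System.out.println(f(true, false));
        System.out.println(f(false, true));
        // The pair (d2, u) is only exercised when both conditions hold.
        System.out.println(throwsOn(true, true));
    }
}
```

Covering the pair (d2, u) requires a test with both `a` and `b` true, which is exactly the input that exposes the fault.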

Data Flow Coverage
• Formula: # of def-use pairs covered / # of def-use pairs in the program
• Not easy to locate all def-use pairs
  • Easy for intra-procedural analysis (inside a method)
  • Very difficult for inter-procedural analysis
  • Consider the write to a field variable in one method, and the read of it in another method

Path coverage
• The strongest code coverage criterion
  • Try to cover all possible execution paths in a program
• Covers all previous coverage criteria?
• Usually not feasible
  • Exponentially many paths in acyclic programs
  • Infinitely many paths in some programs with loops

Path coverage
• N conditions => 2^N paths
• Many are not feasible
  • e.g., L1 L2 L3 L4 L6
    X = 0 => L1 L2 L3 L4 L5 L6
    X = -1 => L1 L3 L4 L6
    X = -2 => L1 L3 L4 L5 L6

Control Flow Graph
• How many paths?
• How many test cases to cover?

Path coverage, not enough
    main() {
        int x, y, z, w;
        read(x);
        read(y);
        if (x != 0)
            z = x + 10;
        else
            z = 1;
        if (y > 0)
            w = y / z;
        else
            w = 0;
    }
• Test Requirements:
  • 4 paths
• Test Cases
  • (x = 1, y = 22)
  • (x = 0, y = 10)
  • (x = 1, y = -22)
  • (x = 0, y = -10)
• We are still not exposing the fault!
• Faulty if x = -10 and y > 0: z becomes 0, so w = y / z divides by zero
  • Structural coverage cannot reveal this error
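The slide's program can be transcribed to Java to check the claim. In this sketch the two `read` calls become method parameters, and the class and helper names (`PathCoverageGap`, `compute`, `fails`) are assumptions added for the example.

```java
final class PathCoverageGap {
    // Java transcription of the slide's program; inputs are parameters instead of read().
    static int compute(int x, int y) {
        int z;
        if (x != 0) {
            z = x + 10;   // z becomes 0 when x == -10
        } else {
            z = 1;
        }
        int w;
        if (y > 0) {
            w = y / z;    // integer division by zero throws ArithmeticException
        } else {
            w = 0;
        }
        return w;
    }

    // True if the call fails with a division by zero.
    static boolean fails(int x, int y) {
        try {
            compute(x, y);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // The four test cases from the slide cover all four paths and none fails.
        int[][] suite = {{1, 22}, {0, 10}, {1, -22}, {0, -10}};
        for (int[] t : suite) {
            System.out.println("(" + t[0] + ", " + t[1] + ") fails: " + fails(t[0], t[1]));
        }
        // Same path as (1, 22), but a different value of x triggers the fault.
        System.out.println("(-10, 5) fails: " + fails(-10, 5));
    }
}
```

The failing input follows a path the suite already covers, which is why no structural criterion points to it.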

Code Coverage
• Questions
  • Statement coverage and basic block coverage: are they the same?
  • Branch coverage (cover all edges in a control flow graph): is it the same as basic block coverage?
    if (x > 3) {
        print …;
    } else {
        exit;
    }

Method coverage
• So far, all examples are intra-method
  • Quite useful in unit testing
• It is very hard to achieve 100% statement coverage in system testing
  • Need higher-level code elements
• Method coverage
  • Similar to statements
  • Node coverage: method coverage
  • Edge coverage: method invocation coverage
  • Path coverage: stack trace coverage

Method coverage

Code coverage: summary
• Coverage of code elements and their connections
• Node coverage:
  • Class/method/statement/predicate coverage
• Edge coverage:
  • Branch/Dataflow/MethodInvok
• Path coverage:
  • Path/UseDefChain/StackTrace

Code coverage: limitations
• Not enough
  • Some bugs cannot be revealed even with full path coverage
  • Cannot reveal bugs due to missing code

Code coverage: practice
• Though not perfect, code coverage is the most widely used technique for test evaluation
• Also used to measure progress made in testing
• The criteria used in practice are mainly:
  • Method coverage
  • Statement coverage
  • Branch coverage
  • Loop coverage with a heuristic (0, 1, many iterations)

Code coverage: practice
• Far from perfect
  • The commonly used criteria are the weakest, recall our examples
  • A lot of corner cases can never be found (they are not really corner cases if they are merely missed by statement coverage)
• 100% code coverage is rarely achieved
  • Mature commercial software products are released with 85% to 90% statement coverage
  • Some commercial software products are released with around 60% statement coverage
  • Many open source projects are even lower than 50%

Input Combination Coverage
• Basic idea
  • Originates from the most straightforward idea
  • In theory, achieving 100% coverage is a proof of 100% correctness
  • In practice, works only on very trivial cases
• Main problems
  • Combinations are exponential
  • Possible values are infinite

Input Combination Coverage
• An example on a simple automatic sales machine
  • Accepts only one $1 bill at a time, and all beverages are $1
  • Coke, Sprite, Juice, Water
  • Icy or normal temperature
  • Want receipt or not
• All combinations = 4*2*2 = 16 combinations
• Trying all 16 combinations will make sure the system works correctly

Input Combination Coverage
• Sales Machine Example

  Input 1   Input 2   Input 3
  Coke      Normal    Receipt
  Sprite    Icy       No-Receipt
  Juice
  Water

Combination Explosion
• Combinations are exponential in the number of inputs
• Consider an annual tax report system with 50 yes/no questions to generate a customized form for you
  • 2^50 combinations = about 10^15 test cases
  • Running 1000 test cases per second -> about 30,000 years
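The arithmetic behind this estimate can be checked with a short Java sketch. The class and method names are made up for the example, and a year is taken as 365 days; with the exact value of 2^50 the result comes out somewhat above the slide's rounded figure, which is based on 10^15.

```java
final class CombinationExplosion {
    // Number of input combinations for n independent yes/no questions (valid for 0 <= n <= 62).
    static long combinations(int yesNoQuestions) {
        return 1L << yesNoQuestions;
    }

    // Years needed to run the given number of tests at a fixed rate, assuming 365-day years.
    static double yearsToRun(long tests, long testsPerSecond) {
        double seconds = (double) tests / testsPerSecond;
        return seconds / (365.0 * 24 * 3600);
    }

    public static void main(String[] args) {
        long all = combinations(50);
        System.out.println(all + " combinations");
        System.out.println(yearsToRun(all, 1000) + " years at 1000 tests per second");
    }
}
```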

Observation
• When there are many inputs, a relationship among inputs usually involves only a small number of inputs
• The previous example: maybe only Coke and Sprite can be icy, while the receipt choice is independent

Example of Tax Report
• Input 1: Family combined report or Single report
• Input 2: Home loans or not
• Input 3: Receive gift or not
• Input 4: Age over 60 or not
• …
• Input 1 may be related to all other inputs
• Other inputs are independent of each other

Studies
• A long-term study from NIST (National Institute of Standards and Technology)
  • A combination width of 4 to 6 is enough for detecting almost all errors

N-wise coverage
• Coverage of N-wise combinations of the possible values of all inputs
• Example: 2-wise combinations
  • (coke, icy), (sprite, icy), (water, icy), (juice, icy)
  • (coke, normal), (sprite, normal), …
  • (coke, receipt), (sprite, receipt), …
  • (coke, no-receipt), (sprite, no-receipt), …
  • (icy, receipt), (normal, receipt)
  • (icy, no-receipt), (normal, no-receipt)
  • 20 combinations in total
• We had 16 3-wise combinations, now we have 20. Did it get worse?

N-wise coverage
• Note: one test case may cover multiple N-wise combinations
  • E.g., (Coke, Icy, Receipt) covers 3 2-wise combinations: (Coke, Icy), (Coke, Receipt), (Icy, Receipt)
• 100% N-wise coverage fully covers 100% (N-1)-wise coverage. Is this true?
• For k Boolean inputs
  • Full combination coverage = 2^k combinations: exponential
  • Full n-wise coverage = 2^n * k*(k-1)*…*(k-n+1)/n! combinations: polynomial; for 2-wise combinations, 2*k*(k-1)
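The counting formula for k Boolean inputs is 2^n times the binomial coefficient C(k, n). A minimal Java sketch, with hypothetical names `NWiseCount`, `choose`, and `nWiseCombinations`, evaluates it and confirms the 2*k*(k-1) special case:

```java
final class NWiseCount {
    // Binomial coefficient C(k, n) in exact long arithmetic (intended for small inputs).
    static long choose(int k, int n) {
        if (n < 0 || n > k) {
            return 0;
        }
        long result = 1;
        for (int i = 1; i <= n; i++) {
            // After step i, result == C(k - n + i, i), so the division is always exact.
            result = result * (k - n + i) / i;
        }
        return result;
    }

    // Number of n-wise value combinations over k Boolean inputs: 2^n * C(k, n).
    static long nWiseCombinations(int k, int n) {
        return (1L << n) * choose(k, n);
    }

    public static void main(String[] args) {
        int k = 50; // the tax report example with 50 yes/no questions
        System.out.println("2-wise: " + nWiseCombinations(k, 2));
        System.out.println("2*k*(k-1): " + 2L * k * (k - 1));
    }
}
```

When n equals k the formula reduces to 2^k, the full combination count, which is a quick sanity check on the expression.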

N-wise coverage: Example
• How many test cases for 100% 2-wise coverage of our sales machine example?
  • (coke, icy, receipt), covers 3 new 2-wise combinations
  • (sprite, icy, no-receipt), covers 3 new …
  • (juice, icy, receipt), covers 2 new …
  • (water, icy, receipt), covers 2 new …
  • (coke, normal, no-receipt), covers 3 new …
  • (sprite, normal, receipt), covers 3 new …
  • (juice, normal, no-receipt), covers 2 new …
  • (water, normal, no-receipt), covers 2 new …
• 8 test cases cover all 20 2-wise combinations
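The claim that these 8 test cases cover all 20 pairs can be checked mechanically. The sketch below is one possible way to do it; the class name `PairwiseCheck` and the string encoding of a pair are choices made for this example only.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class PairwiseCheck {
    // Value domains of the three sales machine inputs.
    static final List<List<String>> DOMAINS = List.of(
            List.of("coke", "sprite", "juice", "water"),
            List.of("icy", "normal"),
            List.of("receipt", "no-receipt"));

    // The eight test cases listed on the slide.
    static final List<List<String>> SUITE = List.of(
            List.of("coke", "icy", "receipt"),
            List.of("sprite", "icy", "no-receipt"),
            List.of("juice", "icy", "receipt"),
            List.of("water", "icy", "receipt"),
            List.of("coke", "normal", "no-receipt"),
            List.of("sprite", "normal", "receipt"),
            List.of("juice", "normal", "no-receipt"),
            List.of("water", "normal", "no-receipt"));

    // Encodes "input i has value v and input j has value w" as a string key.
    static String pair(int i, String v, int j, String w) {
        return i + "=" + v + "|" + j + "=" + w;
    }

    // Every value pair taken from two different inputs.
    static Set<String> allPairs() {
        Set<String> pairs = new HashSet<>();
        for (int i = 0; i < DOMAINS.size(); i++) {
            for (int j = i + 1; j < DOMAINS.size(); j++) {
                for (String v : DOMAINS.get(i)) {
                    for (String w : DOMAINS.get(j)) {
                        pairs.add(pair(i, v, j, w));
                    }
                }
            }
        }
        return pairs;
    }

    // Every value pair that appears together in at least one test case.
    static Set<String> coveredPairs(List<List<String>> suite) {
        Set<String> covered = new HashSet<>();
        for (List<String> test : suite) {
            for (int i = 0; i < test.size(); i++) {
                for (int j = i + 1; j < test.size(); j++) {
                    covered.add(pair(i, test.get(i), j, test.get(j)));
                }
            }
        }
        return covered;
    }

    public static void main(String[] args) {
        System.out.println("pairs to cover: " + allPairs().size());
        System.out.println("all covered: " + coveredPairs(SUITE).containsAll(allPairs()));
    }
}
```

Eight is also a lower bound here, since the 4 beverages times 2 temperatures already require 8 distinct tests.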

Combination Coverage in Practice
• 2-wise combination coverage is very widely used
  • Pair-wise testing
  • All-pairs testing
• Mostly used in configuration testing
  • Example: configuration of gcc
    • A lot of variables
    • Several options for each variable
  • For command line tools: add or remove an option

Input model
• What happens if an input has infinitely many possible values?
  • Integer
  • Float
  • Character
  • String
• Note: all these are actually finite, but the possible value set is so large that it is treated as infinite
• Idea: map infinite values to finite value baskets (ranges)

Input model
• Equivalence class partitioning
  • Partition the possible value set of an input into several value ranges
  • Transform numeric variables (integer, float, double, character) into enumerated variables
• Example:
  • int exam_score => {less than 0}, {0, 59}, {60, 69}, {70, 79}, {80, 89}, {90, 100}, {above 100}
  • char c => {a, z}, {A, Z}, {0, 9}, {other}
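The exam score partition can be sketched as a classifier that maps each integer to exactly one class. The enum and its constant names are assumptions for illustration; the range boundaries follow the example above, with the lowest class read as "less than 0" and the highest as "above 100" so that the classes neither overlap nor leave gaps.

```java
final class ScorePartition {
    // One enumerated value per equivalence class of exam_score.
    enum ScoreClass {
        BELOW_0, FROM_0_TO_59, FROM_60_TO_69, FROM_70_TO_79, FROM_80_TO_89, FROM_90_TO_100, ABOVE_100
    }

    // Maps any int to exactly one class, so the classes partition the input domain.
    static ScoreClass classify(int score) {
        if (score < 0) return ScoreClass.BELOW_0;
        if (score <= 59) return ScoreClass.FROM_0_TO_59;
        if (score <= 69) return ScoreClass.FROM_60_TO_69;
        if (score <= 79) return ScoreClass.FROM_70_TO_79;
        if (score <= 89) return ScoreClass.FROM_80_TO_89;
        if (score <= 100) return ScoreClass.FROM_90_TO_100;
        return ScoreClass.ABOVE_100;
    }

    public static void main(String[] args) {
        // One representative input per class is enough to cover the partition.
        int[] representatives = {-5, 30, 65, 75, 85, 95, 120};
        for (int score : representatives) {
            System.out.println(score + " -> " + classify(score));
        }
    }
}
```

In practice the boundary values of each range (for example 59 and 60) are natural extra test inputs, since off-by-one mistakes tend to sit there.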

Input model
• Feature extraction
  • For string and structure inputs
  • Split the possible value set by a certain feature
  • Example: String passwd => {contains space}, {no space}
• It is possible to extract multiple features from one input
  • Example: String name
    => {capitalized first letter}, {not}
    => {contains space}, {not}
    => {length > 10}, {2-10}, {1}, {0}
• One test case may cover multiple features

Input model
• Feature extraction: structure input
• A Word Binary Tree (data at all nodes are strings)
  • Depth: integer -> partition {0, 1, 1+}
  • Number of leaves: integer -> partition {0, 1, <10, 10+}
  • Root: null / not
  • A node with only left child / not
  • A node with only right child / not
  • Null value data on any node / not
  • Root value: string -> further feature extraction
  • Value on the leftmost leaf: string -> further feature extraction
  • …

Input model
• Infeasible feature combination?
  • Example: String name
    => {capitalized first letter}, {not}
    => {contains space}, {not}
    => {length > 10}, {2-10}, {1}, {0}
  • Length = 0 ^ contains space
  • Length = 0 ^ capitalized first letter
  • Length = 1 ^ contains space ^ capitalized first letter
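The three features of `name` and the infeasible combinations listed above can be sketched as follows. Everything here (class name, enum, helper methods) is hypothetical; the feasibility rule simply encodes the three cases from the slide.

```java
final class NameFeatures {
    enum LengthClass { ZERO, ONE, TWO_TO_TEN, OVER_TEN }

    // Feature 1: the first character exists and is an upper-case letter.
    static boolean capitalizedFirst(String s) {
        return !s.isEmpty() && Character.isUpperCase(s.charAt(0));
    }

    // Feature 2: the string contains at least one space character.
    static boolean containsSpace(String s) {
        return s.indexOf(' ') >= 0;
    }

    // Feature 3: length mapped to one of four ranges.
    static LengthClass lengthClass(String s) {
        int n = s.length();
        if (n == 0) return LengthClass.ZERO;
        if (n == 1) return LengthClass.ONE;
        if (n <= 10) return LengthClass.TWO_TO_TEN;
        return LengthClass.OVER_TEN;
    }

    // Whether some string can exhibit this combination of feature values.
    static boolean feasible(boolean capitalized, boolean space, LengthClass length) {
        if (length == LengthClass.ZERO) {
            return !capitalized && !space;      // an empty string has no characters at all
        }
        if (length == LengthClass.ONE) {
            return !(capitalized && space);     // one character cannot be both a capital and a space
        }
        return true;
    }

    // Number of feasible combinations out of the 2 * 2 * 4 = 16 candidates.
    static int countFeasible() {
        int count = 0;
        for (boolean capitalized : new boolean[] {true, false}) {
            for (boolean space : new boolean[] {true, false}) {
                for (LengthClass length : LengthClass.values()) {
                    if (feasible(capitalized, space, length)) {
                        count++;
                    }
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String name = "Ann Lee";
        System.out.println(capitalizedFirst(name) + " " + containsSpace(name) + " " + lengthClass(name));
        System.out.println("feasible combinations: " + countFeasible());
    }
}
```

Filtering out infeasible combinations before generating N-wise test requirements avoids asking for test cases that cannot exist.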

Input combination coverage
• Summary:
  • Try to cover the combinations of possible values of inputs
  • Exponential combinations:
    • N-wise coverage
    • 2-wise coverage is most popular, all-pairs testing
  • Infinite possible values:
    • Input partition
    • Input feature extraction
  • Coverage is usually 100% once adopted
    • It is easy to achieve, compared with code coverage
    • Models are not easy to write

Specification Coverage
• A type of input coverage
• Covers the written formal specification in the requirement document
• Example
  • When a number smaller than 0 is fed in, the system should report an error => test case: -1
• Sometimes can be a sequence of inputs
  • When you input the correct user name, a passwd prompt is shown; after you input the correct passwd, the user profile will be shown, … => test case: xiaoyin, xxxxx, …

Specification Coverage
• Widely used in industry
• Advantages
  • Targets the specification
  • No need for writing oracles
  • Usually can achieve 100% coverage
• Disadvantages
  • Very hard to automate
    • Can only be automated with formal specifications
  • No guarantee to be complete
  • Quality highly depends on the specification

Test coverage
• So far, covering inputs and code
• The final goal of testing
  • Find all bugs in the software
• So there should be a bug coverage
  • The coverage that best represents the adequacy of a test suite
  • 50% bug coverage = half done!
  • 100% bug coverage = done!

But it is impossible
• Bugs are unknown
  • Otherwise we do not need testing
  • So we have the number of bugs found, but we do not know what to divide it by
• One possible solution
  • Estimation
    • 1-10 bugs in 1 KLOC
    • Depends on the type of software and the stage of development; imprecise
    • When you find many bugs, do you think you have found all of them, or that the code is really of low quality?

Mutation coverage
• How can we know how many bugs there are in the code?
  • If only we planted those bugs!
• Mutation coverage checks the adequacy of a test suite by how many human-planted bugs it can expose

Concepts
• Mutant
  • A software version with planted bugs
  • Usually each mutant contains only one planted bug. Why?
• Mutant kill
  • Given a test suite S and a mutant m, if there is a test case t in S such that execute(original, t) != execute(m, t), we say that S can kill m
  • Basically, a test suite killing a mutant means that the test suite is able to detect the planted bug represented by the mutant
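The kill definition translates almost directly into code. The sketch below models the original program and each mutant as a function of two ints and uses `z = x*y + 1` with three hand-written mutants; the names are hypothetical. The `killedFraction` helper uses the usual ratio of killed mutants to all mutants, which is assumed here rather than taken from the slides.

```java
import java.util.List;
import java.util.function.IntBinaryOperator;

final class MutantKill {
    // Original computation and three hand-written mutants of it.
    static final IntBinaryOperator ORIGINAL = (x, y) -> x * y + 1;
    static final List<IntBinaryOperator> MUTANTS = List.of(
            (x, y) -> x * y - 1,    // arithmetic operator replaced: + becomes -
            (x, y) -> x + y + 1,    // arithmetic operator replaced: * becomes +
            (x, y) -> x * x + 1);   // variable replaced: y becomes x

    // S kills m if some test case t gives execute(original, t) != execute(m, t).
    static boolean kills(int[][] suite, IntBinaryOperator original, IntBinaryOperator mutant) {
        for (int[] t : suite) {
            if (original.applyAsInt(t[0], t[1]) != mutant.applyAsInt(t[0], t[1])) {
                return true;
            }
        }
        return false;
    }

    // Fraction of mutants killed by the suite (assumed definition: killed / all mutants).
    static double killedFraction(int[][] suite, IntBinaryOperator original,
                                 List<IntBinaryOperator> mutants) {
        long killed = mutants.stream().filter(m -> kills(suite, original, m)).count();
        return (double) killed / mutants.size();
    }

    public static void main(String[] args) {
        int[][] weakSuite = {{2, 2}};             // 2*2+1 == 2+2+1 == 2*2+1: two mutants survive
        int[][] strongerSuite = {{2, 2}, {1, 3}}; // (1, 3) separates the remaining two mutants
        System.out.println(killedFraction(weakSuite, ORIGINAL, MUTANTS));
        System.out.println(killedFraction(strongerSuite, ORIGINAL, MUTANTS));
    }
}
```

The weak suite shows why input choice matters: with x = y = 2 the second and third mutants happen to produce the same result as the original.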

Illustration
[Figure: the test cases are run on the original program and on Mutant 1, Mutant 2, ..., Mutant n. The results of the original serve as oracles; a mutant whose results are the same has survived, and a mutant whose results are different is killed.]

Concepts
• Mutation coverage

Mutant generation
• Traditional mutation operators
  • Statement deletion
  • Replace Boolean expression with true/false
  • Replace arithmetic operators (+, -, *, /, …)
  • Replace comparison relations (>=, ==, <=, !=)
  • Replace variables
  • …

Mutation Example: Operator

  Mutant operator                       In original    In mutant
  Statement deletion                    z=x*y+1;       (statement removed)
  Boolean expression to true | false    if (x<y)       if (true)    if (false)
  Replace arithmetic operators          z=x*y+1;       z=x*y-1;     z=x+y+1;
  Replace comparison operators          if (x<y)       if (x<=y)    if (x==y)
  Replace variables                     z=x*y+1;       z=z*y+1;     z=x*x+1;

Mutant generation
• Object-oriented mutation operators
  • Insert/Delete overriding method
  • Add/delete “this”
  • Instantiation as child class
  • Cast to subtype
  • …

Mutation Example: Object-Oriented
• Insert/Delete overriding method

  Original:
    class Shape {
        private String id;
        public void setID(String id) {
            this.id = id;
        }
        public void draw() { ... }
    }
    class Circle extends Shape {
        public void draw() { ... }
    }

  Mutant (an overriding setID with an empty body is inserted into Circle):
    class Shape {
        private String id;
        public void setID(String id) {
            this.id = id;
        }
        public void draw() { ... }
    }
    class Circle extends Shape {
        public void setID(String id) {
        }
        public void draw() { ... }
    }

Problems of mutation testing
• Large amount of time overhead
  • Need to run the test suite over a large number of mutants
  • Causes extra burden for collecting test coverage
• Equivalent mutants
  • A mutant that will not affect the behavior of the software

Time overhead
• For n mutants, requires n times the overhead
• How to reduce time overhead?
  • Reuse execution info
  • Rule out early
    • Mutants that are not covered
    • Mutants that cannot be killed

Reduce Time Overhead

  original:
    int index = read;
    while (…) {
        …;
        index++;
        if (index == 10) {
            break;
        }
    }
    return value > 0;

  m1: same as original, except the last statement is
    return value < 0;
  • Reuse the program states before the return statement

  m2: same as original, except the body of the if is
    if (index == 10) {
        return true;
    }
  • If index reads 20, the mutant is not covered

  m3: same as original, except the last statement is
    return value + 1 > 0;
  • If value is not 0, nothing is changed

Equivalent mutants
• Another main problem in mutation coverage is equivalent mutants
• A mutant is an equivalent mutant if its semantics is identical to that of the original software

  Original:
    int index = 0;
    while (…) {
        …;
        index++;
        if (index == 10) {
            break;
        }
    }

  Mutant (only the condition changes):
        if (index >= 10) {
            break;
        }
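A quick experiment illustrates why no test can kill this mutant. The loop condition and body are elided on the slide, so the sketch below substitutes a hypothetical bound `maxIterations` and returns the number of iterations as the observable result; those choices are assumptions of the example. Because `index` starts at 0 and grows by exactly 1, it reaches 10 before it can exceed 10, so `==` and `>=` always agree.

```java
final class EquivalentMutant {
    // Original: leaves the loop when index == 10.
    static int original(int maxIterations) {
        int index = 0;
        int iterations = 0;
        while (iterations < maxIterations) {   // stand-in for the elided loop condition
            iterations++;
            index++;
            if (index == 10) {
                break;
            }
        }
        return iterations;
    }

    // Mutant: comparison operator replaced, == becomes >=.
    static int mutant(int maxIterations) {
        int index = 0;
        int iterations = 0;
        while (iterations < maxIterations) {
            iterations++;
            index++;
            if (index >= 10) {
                break;
            }
        }
        return iterations;
    }

    // True if some bound in [0, upTo] makes the two versions return different results.
    static boolean distinguishable(int upTo) {
        for (int bound = 0; bound <= upTo; bound++) {
            if (original(bound) != mutant(bound)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(original(100) + " " + mutant(100));
        System.out.println("distinguishable: " + distinguishable(1000));
    }
}
```

Running many inputs like this only gives evidence, not proof; deciding equivalence in general requires reasoning about the program, which is what makes equivalent mutants costly.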

Equivalent mutants
• Another main problem in mutation coverage is equivalent mutants
  • Equivalent mutants prevent mutation coverage from ever reaching 100%
  • So you do not know whether there are too many equivalent mutants, or the test suite is not adequate

Reduce equivalent mutants
• Using compiler optimization
  • Check whether the compiled bytecode is the same as that of the original software
    • Mutating dead code
    • Mutating unused variable
• After the mutated code, write a conditional path, and check whether the path is feasible

  Mutant:
    //result = a + b;
    result = a - b;

  Mutant with the added check:
    //result = a + b;
    result = a - b;
    if (a + b != a - b) {
        not equivalent;
    }

Mutation testing tools
• MILU: http://www0.cs.ucl.ac.uk/staff/Y.Jia/#tools
• MuJava: http://cs.gmu.edu/~offutt/mujava/
• Javalanche: https://github.com/david-schuler/javalanche/

Summary on all coverage measures
• Code coverage
  • Target: code
  • Adequacy: no -> 100% code coverage != no bugs
  • Approximation: dataflow, branch, method/statements
  • Usability: medium (requires code for instrumentation)
  • Preparation: none
  • Overhead: low (instrumentation causes some overhead)

Summary on all coverage measures
• Input combination coverage
  • Target: inputs
  • Adequacy: yes -> 100% input coverage == no bugs
  • Approximation: n-wise coverage, input partition, input feature extraction
  • Usability: high
  • Preparation: hard (requires equivalence class partitioning)
  • Overhead: none

Summary on all coverage measures
• Mutation coverage
  • Target: bugs
  • Adequacy: no -> 100% mutant coverage != no bugs
  • Approximation: mutation is already an approximation
  • Usability: medium (requires code changes for mutants)
  • Preparation: none
  • Overhead: very high (execution on instrumented mutated versions)