Bhanu Pratap Gupta Devang Vira S Sudarshan XData
Bhanu Pratap Gupta Devang Vira S. Sudarshan X-Data: Test Data Generation for Killing SQL Mutants Dept. of Computer Science and Engineering, IIT Bombay
Testing SQL Queries: A Challenge �Complex SQL queries hard to get right �Question: How to check if an SQL query is correct? Formal verification is not applicable since we do not have a separate specification and an implementation State of the art solution: Generate test databases and check if the query gives the intended result 2
Generating Test Data: Prior Work Automated Test Data generation Based on database constraints, and SQL query ▪ Agenda [Chays et al. , STVR 04], a tool which generates test cases for database applications which additionally uses user fed heuristics Ensuring query result is not empty ▪ Reverse Query Processing [Binning et al. , ICDE 07] takes desired query output and generates relation instances ▪ Handle a subset of Select/Project/Join/Group. By queries � None of the above guarantee anything about detecting errors in SQL queries � Question: How do you model SQL errors? � Answer: Query Mutation � 3
Mutation Testing � Mutant: Variation of the given query Mutations model common programming errors, like ▪ Join used instead of outerjoin (or vice versa) ▪ Join/selection condition errors ▪ < vs. <=, missing or extra condition ▪ Wrong aggregate (min vs. max) Mutant may be the intended query 4
Mutation Testing of SQL Queries Traditional use of mutation testing has been to check coverage of dataset Generate mutants of the original program by modifying the program in a controlled manner A dataset kills a mutant if query and the mutant give different results on the dataset A dataset is considered complete if it can kill all non-equivalent mutants of the given query � Prior work: Tuya and Suarez-Cabal [IST 07], Chan et al. [QSIC 05] defined a class of SQL query mutations Shortcoming: do not address test data generation � Our goal: generated dataset for testing query Test dataset and query result on the dataset are shown to human, who verifies that the query result is what is expected given this dataset Note that we do not need to actually generate and execute mutants � 5
Our Contributions � Address the problem of test data generation for killing non-equivalent mutants Equivalent Mutants: r(A, B) s(B, C) and r(A, B) s(B, C) where r. B is a foreign key to s, and is not null will always produce the same resultset � Define class of: Join/outerjoin mutations Selection predicate mutations � Algorithm for test data generation that kills all non- equivalent mutants in above class Under some simplifying assumptions (given in the paper) With the guarantee that generated datasets are small and realistic, to aid in human verification of results 6
Join Type Mutation � Join type mutations: An occurrence of a join operator ( , , , ) is replaced by one of the other join operators � Defining join mutations in SQL is complicated by the absence of a particular join order SELECT * FROM a, b, c WHERE (a. x = b. x) and (b. x = c. x) � We consider all relational algebra expressions (trees) equivalent (under inner join reordering) to the given SQL query � We consider join type mutations to single join nodes in each tree above 7
Killing Join Mutants: For Given Tree � Case I: Mutation at root node, with no foreign key constraints Schema: r(A), s(B) � To kill this mutant: ensure that for an r tuple there is no matching s tuple � Generated test case: r(A)={(1)}; s(B)={} � Basic idea: (a) run query on given database, (b) from result extract matching tuples for r and s (c) delete s tuple to ensure no matching tuple for r 8
Killing Join Mutants: For Given Tree � Case II: Extra join above mutated node Schema: r(A, B), s(C, D), t(E) � To kill this mutant we must ensure that for an r tuple there is no matching s tuple, but there is a matching t tuple � Generated test case: r(A, B)={(1, 2)}; s(C, D)={}; t(E)={(2)} 9
Killing Join Mutants: Across All Trees � Given join expression on relations r 1, r 2, …, rn Create dataset where all relations have a set of matching tuples For each relation ri, generate a dataset where rest of relations match, but ri is empty ▪ Unless making ri empty makes join graph disconnected � Above procedure kills all join type mutations of given inner join tree Outer joins complicate picture when attributes are projected out ▪ May have to make more than one ri empty at a time Foreign keys may prevent making some ri empty 10
Killing Join Mutants: Handling Foreign Keys � Case III: Mutation at root node with foreign key constraints and selection on right side Schema: r(A), s(B, C) Foreign key: r. A →s. B � � To kill this mutant we must create an s tuple which matches with the r tuple on the foreign key reference, but which has s. C ≠ 4 Generated test case: r(A)={(2)}; s(B, C)={(2, 5)} Notion of valid nullable pattern defined in paper specifies which relations can be made null/non-matching, given foreign key constraints and join graph 11
Implementation �Implemented using Java and Postgre. SQL Creates datasets by extracting and modifying tuples from given database Currently handles join type mutation and selection predicate mutation �For creating a merged dataset ▪ Tuples having same values for join attributes must be blocked from being inserted again �Handling selection predicate mutation ▪ Eg. to distinguish r. A < 3 and r. A <= 3 we generate tuples with r. A = 2 and 3 12
Ongoing and Future Work �Ongoing work : Synthetic data generation taking database and query constraints into account which is non trivial ▪ Idea (from RQP [Binning et al ICDE 07]): Use a model checker to generate data ▪ Under implementation using CVC 3 Extend the technique to handle aggregations and sub-queries �Future work: data generation for application code with multiple queries 13
Questions Thank You
Query Containment � Problem: is Q equivalent to a mutant Q‘ can be reduced to query containment and vice versa in polynomial time � The Chase algorithm can be used to generate datasets to show that Q and Q' are not equivalent (for SPJ queries and several extensions) such a dataset would kill the mutant Q‘ limited work on outerjoin containment data generation � However we don't want to enumerate each mutant and generate separate datasets too expensive 15
Exclusion Pattern �Under the following conditions we can generate merged datasets: Tuples having same values for join attributes must be blocked from being inserted again The query must not contain any equality selection on an unique key The result of the query must contain one or more attributes which together form an unique key for any relation Also attributes from the result forming an unique key must be guaranteed to be non-null in the result 16
Example Query � Consider the three relations : Student(name, deptcode, progcode), Department(deptcode, deptname) Program(progcode, progname) � And a query: SELECT rollno, name, deptname, progname FROM student s INNER JOIN department d ON s. deptcode=d. deptcode INNER JOIN program p ON s. progcode=p. progcode 17
Query Trees Equivalent to the Query Tree 1 Query Tree 2 Query Tree 3 �Generate mutants by mutating join operator of a single node for all above trees 18
Generated Test Data : considering no foreign keys Program Department Student Progcode Progname Deptcode Deptname Rollno 0 B. Tech CS Computer 501 1 M. Tech CH Chemical 2 Ph. D ME Mechanical Name progcode deptcode Devang 1 CS 401 Abhijeet 0 CE 701 Sandeep 5 CH 101 Aditya 4 MA Generated data shows : � A student (Devang) with valid program and department � A student (Abhijeet) with invalid department � A student (Sandeep) with invalid program � A student (Aditya) with invalid program and department � A program (Ph. D) with no student � A department (Mechanical) with no student 19
Generated Test Data : considering foreign keys Program Department Student Progcode Progname Deptcode Deptname Rollno Name progcode deptcode 0 B. Tech CS Computer 501 Devang 1 CS 1 M. Tech EE Electrical Foreign Keys are: Student. progcode → Program. progcode Student. deptcode → Department. deptcode Generated data shows : � A student (Devang) with valid program and department � A program (B. Tech) with no student � A department (Electrical) with no student 20
Results of Generated Test Data on Mutants Case of no foreign keys 21
- Slides: 21