Is Spreadsheet Ambiguity Harmful Detecting and Repairing Spreadsheet
Is Spreadsheet Ambiguity Harmful? Detecting and Repairing Spreadsheet Smells due to Ambiguous Computation Wensheng Dou 1, Shing-Chi Cheung 2, Jun Wei 1 1 Institute of Software, Chinese Academy of Sciences 2 The Hong Kong University of Science and Technology
Motivating example o The spreadsheet contains incorrect formulas o Update on the incorrect formulas could cause faulty values in the spreadsheet 4→ 6 Should be 18 … a real example extracted from EUSES spreadsheet corp 2
Problems Screen shot of the spreadsheet before and after the change No warning is issued by Excel o Q 1: Which cells contain incorrect formulas? o Q 2: Which cells’ values are incorrect? 3
Key challenge - No oracle! o It is hard to identify which cells contain incorrect formulas or values o Require human judgments or specifications 4
Methodology Total Fruit = Apple + Orange Total Price = Total Fruit * Price Cell array o Cells are often grouped in a row or column with the same intended computation o We call this kind of group as a cell array 5
Methodology o The intended computation is ambiguous when not all the cells in a cell array follow the same formula pattern o The cell array suffers from ambiguous computation smells 6
Three smell types 18 o Ambiguous computation smells n Missing formula smells n Inconsistent formula smells n Conformance errors 7
How to get the intended computation? 8
Finding candidates from existing formulas = D i*Ei 9
Gaining confidence = D i*Ei Q: Is it likely the intended computation? A: Yes if it computes the values of the majority of cells 5 4 20 = D 6*E 6 10
Conformance error detection = D i*Ei Assumption: The values of cells are more likely correct than not 6 3 12 ≠ D 7*E 7 Likely an error 11
What if we find multiple formula patterns? o o = Bi , when Ci = 0 = Bi – Ci = Bi + C i = Ci , when Bi = 0 12
Synthesizing intended formula pattern o Adapt component-based program synthesis [1][2] to find the intended formula pattern n Constraints: Existing formula patterns, values o Key challenge n Cells with faulty formulas make program synthesis fail n We cannot distinguish faulty formulas from correct ones o Example n n = Bi , when Ci = 0 = B i – Ci = Bi + Ci = Ci , when Bi = 0 Which one should we use? [1] S. Jha, S. Gulwani, S. A. Seshia, and A. Tiwari. Oracle-guided component-based program synthesis. In ACM/IEEE 32 nd International Conference on Software Engineering (ICSE), pages 215– 224. 2010. [2] S. Gulwani, S. Jha, A. Tiwari, and R. Venkatesan, Synthesis of loop-free programs. In ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 62– 73. 2011. 13
Classify formulas into compatible groups o A compatible group always leads to a possible synthesized formula pattern n Group 1 o = Bi , when Ci = 0 o = Bi + Ci o = Ci , when Bi = 0 = B i+C i n Group 2 o = Bi , when Ci = 0 o = B i - Ci = Bi-C i 14
Tool implementation o Am. Check n Apache POI library – Manipulate spreadsheets n Annotate the smells in the resulted spreadsheets 15
Evaluation o RQ 1: How common are ambiguous computation smells in real-life spreadsheets? o RQ 2: Can Am. Check detect and repair ambiguous computation smells precisely? o RQ 3: Do end users find Am. Check useful for improving the quality of their spreadsheets? o RQ 4: Are ambiguous computation smells harmful? Experiment 1 Subject: EUSES Method: Manually validate by ourselves Experiment 2 Subject: 10 real-life spreadsheets Method: Interview with users 16
How common? (RQ 1) o 44. 7% of the spreadsheets with cell arrays suffer from ambiguous computation smells Category cs 101 Spreadsheets with cell arrays Spreadsheet with smelly cell arrays (CA) (SCA) 7 3 SCA / CA 42. 9% database 103 56 54. 4% filby 0 0 n. a. financial 245 126 51. 4% forms 3 10 4 40. 0% grades 201 88 43. 8% homework 163 54 33. 1% inventory 173 75 43. 4% jackson 0 0 n. a. modeling 88 38 43. 2% personal 3 0 0% Total 993 444 44. 7% 17
Is Am. Check precise? (RQ 2) o Coverage gives the percentage of cells that can be computed by the intended formula pattern n For coverage threshold of 80%, experimental precision is 73. 7% o Am. Check fixes 316 out of 319 true smells o Excel only detects 12 out of 319 true smells Coverage Sampled smells True smells Fixed smells Detected by Excel 100% 100 95 95 2 [90%, 100%) 100 73 73 7 [80%, 90%) 100 53 52 3 [70%, 80%) 100 46 46 0 [60%, 70%) 100 38 36 0 [50%, 60%) 100 9 9 0 [0%, 50%) 100 5 5 0 Total 700 319 316 12
Experiment 2 : Set up o Ten real-life spreadsheets prepared by professional finance officers for research project budget n Are the smells common? n Do they contain conformance errors? o Interview three officers who have participated in maintaining these spreadsheets n Are the smells indeed problems? n What are the causes of the smells? 19
Overview result o Ambiguous computation smells are common in financial spreadsheets, too. n 50 smelly cell arrays are confirmed n 20 conformance errors are confirmed ID Cell arrays Smelly arrays (Confirmed) 1 2 3 4 5 6 7 8 9 10 12 24 16 32 32 32 10 32 50 29 0 (0) 8 (8) 20 (20) 3 (3) 1 (0) 3 (3) 5 (3) 12 (10) Errors (Confirmed ) 0 (0) 4 (4) 8 (8) 0 (0) 1 (0) 0 (0) 1 (1) 9 (7) Findings Officers happily accepted our fixes even for cells with correct values. (Useful) 20
Causes of missing formula smells o Carelessly ignore necessary computation n Copy data from other cells, and miss to check the computations n Fix “division by zero” error by setting a cell’s value to 0 n Put down values instead of formulas to make things work quickly 3 ->4 =Bi * Ci / 10000 4 ->3 22 ->23 Make the final result correct 21
Causes of inconsistent formula smells o Carelessly copy formulas or ignore auto-fill feature n Copy formulas from other cells, without noticing errors n Manually write formulas, rather than auto-fill feature Where is B 3? 22
Summary Ad-hoc modification introduces computation smells The cells in a cell array have the same computational semantics o Evaluate on EUSES & reallife spreadsheets o Ambiguous computation smells are common and harmful Evaluation Ambiguous computation smell detection and
THANK YOU!
- Slides: 24