Spreadsheet Visualisation to Improve Enduser Understanding Daniel Ballinger
Spreadsheet Visualisation to Improve End-user Understanding Daniel Ballinger, Robert Biddle and James Noble School of Mathematical and Computing Sciences Email: {db, robert, kjx}@mcs. vuw. ac. nz http: //www. mcs. vuw. ac. nz/~db/honours. html
Motivation Spreadsheets are a common form of end-user programming. Unfamiliar spreadsheets can contain daunting amounts of information in the layout and intercell dependencies. Visualisation can be used to aid in end-user understanding of spreadsheets. Working outside the spreadsheet application allows for greater flexibility in visualisation. We focused on Microsoft Excel due to its large market share.
Excel’s Current Visualisation Support – Range Finder Invoked by clicking in a cell and then in the formula bar. Components are coloured in the bar and sheet. Allows for visual manipulation. Mainly only useful for spatially close cells. Range Finder
Excel’s Current Visualisation Support – Formula Auditing Tools Invoked using Formula Auditing Toolbar. Trace dependents or precedents. Arrows always point to referenced cell. Users may navigate spatially disjoint cells. (semantic navigation) Complicated spreadsheets can create a tangle of arrows.
Related Work Takeo Igarashi – Spreadsheets augment “a visible tabular layout with invisible formulas”. – Created visualisations to help reveal the hidden dataflow graphs and superficial tabular layouts of spreadsheets. Markus Clermont – Most end-users are not trained programmers. – Many spreadsheets exist beyond being simple scratch pads. Raymond Panko – Studies of empirical data into spreadsheet errors. – Found error rates can be disturbingly high. – Errors attributed to over confidence and lack of formal checking. Margaret Burnett – The importance of scalability in visualisations. – Forms/3 and an embedded testing methodology.
Spreadsheet Application Toolkit Query Find and store spreadsheets from the Internet. URLs Extract low level structures. E. g. Cell values and formulas. Analyse spreadsheet XLS files structures. Either individual or corpus. Toolkit Files Conveying the findings through visualisation. Gobbler Google Fetcher Web Servers Extractor BIFF Reader Algorithms Analyser Processed Metrics Data Visualisation Tools Image
Visualisations Spreadsheet layout Clustering Data Dependency Flow Data Dependency Direction Graph Structure Fisheye view Formula Inspection Corpus Analysis
Spreadsheet layout – Real-estate Utilisation 2 D Understanding layout is an important first step in learning about a new spreadsheet. Actual values and formulas are only shown as occupied cells. The visualisation layout mimics that of Excel, with columns along the top of the x-axis and rows running down the y-axis. Cells with a higher occupancy level are coloured towards the red end of the colour spectrum.
Spreadsheet layout – Real-estate Utilisation 3 D Occupancy data is projected into 3 D to create a surface map. Discrete to continuous data transformation helps smooth the effects of spikes. Coloured to give a Topographical terrain effect. Full benefit is seen with user interaction.
BIRCH Clustering BIRCH clustering partitions records into clusters that are similar according to two or more attributes. Current visualisations use the Euclidean distance between cells as the similarity metric.
Data Dependency Flow Visualising just average unit vectors for each cell can reduce the visual clutter. 3 D can be used to separate vectors that occur at different sheet levels. Note the curvature back towards the origin for this workbook.
Data Dependency Direction Concentrate purely on the directions of data flow relative to cells. Angles are sorted into 36 buckets then feed to Excel to create the graph. After the four main axis the next significant measure occurs between 300 and 360º. Radar Graph for Outgoing Dependencies
Graph Structure Disregarding spatial bounds allows some structures to become clearer. Source Data Spring view
Fisheye view - Focus + Context Addresses formula dependencies that span large distances or are many cells deep. Trees are warped over a hyperbolic lens to achieve both focus in the centre and context. An artificial red root node is introduced to connect disjoint trees.
Formula Inspection – Data Flow Visualising formula components and flow direction. Fully trace worksheets in one view. Basic Referencing Components
Formula Inspection - Dependency Types Row Absolute Excel allows for Fully Absolute combinations of relative and absolute positioning. Column Absolute Relative Understanding the referencing type is important when replicating formula and identifying regular patterns.
Corpus Analysis of 259 Workbooks Spatial Centre Demonstrations of visualisations created from a corpus. With this sample corpus the average worksheet centre is more column centric. Function utilisation suggests that the logical functions, such as IF, actually outnumber simpler math functions like SUM. Number of non-empty Worksheets: 227 Number of empty Worksheets: 195 Average Row: 1. 348 Average Column 18. 098 Max Row: 1384 Max Col: 82 Total Occupied cells: 55491 Orphans : 51570 Root Cells : 2105 Leafs: 1031 Nodes in Cyclic References: 29 Local Formula: 108 Family Trees: 509 Max Tree Depth: 22 Max Tree Breadth: 150 Function Utilisation
Summary Spreadsheets are significant examples of end-user programming Visualisation can assist end-users in better understanding the structure of spreadsheets – In particular, the “hidden structures” created by formula Reviewed literature to investigate the implications of the hidden structures. Developed a toolkit to externally access the spreadsheet structure and generate visualisations. Created several sample visualisations to help improve end-user understanding.
Current and Future Work Detailed user studies, including usability evaluations Domain specific visualisations Spreadsheet corpus analysis to find large patterns Visualisation scalability to larger more complex spreadsheets http: //www. mcs. vuw. ac. nz/~db/honours. html
- Slides: 19