Research Report Design and Evaluation of Incremental Data

  • Slides: 35
Download presentation
Research Report Design and Evaluation of Incremental Data Structures and Algorithms for Dynamic Query

Research Report Design and Evaluation of Incremental Data Structures and Algorithms for Dynamic Query Interfaces 8 May 2001 info vis - spr 2001

Institution & Authors University of Maryland l Department of Computer Science l Human-Computer Interaction

Institution & Authors University of Maryland l Department of Computer Science l Human-Computer Interaction Laboratory l http: //www. cs. umd. edu/projects/hcil Authors Egemen Tanin l Richard Beigel l Ben Shneiderman l 8 May 2001 info vis - spr 2001 2

Abstract Dynamic query interfaces (DQIs) are a recently developed database access mechanism that provides

Abstract Dynamic query interfaces (DQIs) are a recently developed database access mechanism that provides continuous real-time feedback to the user during query formulation. Previous work shows that DQIs are an elegant and powerful interface to small databases. Unfortunately, when applied to large databases, previous DQI algorithms slow to a crawl. We present a new incremental approach to DQI algorithms and display updates that work well with large databases, both in theory and in practice. 8 May 2001 info vis - spr 2001 3

Outline & Keywords Outline 1. 2. 3. 4. 5. 6. 7. Keywords Dynamic Querying

Outline & Keywords Outline 1. 2. 3. 4. 5. 6. 7. Keywords Dynamic Querying The Incremental Approach Data Structures & Algorithms Theoretical Complexity Experiments Conclusions Future Work 8 May 2001 l l l l info vis - spr 2001 Data Structure Algorithm Database User Interface Information Visualization Direct Manipulation Dynamic Query 4

Dynamic Querying Unlike textual query languages such as SQL, dynamic query interfaces (DQIs) are

Dynamic Querying Unlike textual query languages such as SQL, dynamic query interfaces (DQIs) are graphical Continuous feedback to user as the query is being formulated Tightly coupled: as the hit set varies all widgets are updated to show the hit set’s bounding rectangle “Details on demand” 8 May 2001 Query Input: Widgets l l Range sliders Alphanumeric sliders Toggles Checkboxes Query Output: Graphical Display l l l info vis - spr 2001 Starfield Bars Charts 5

8 May 2001 info vis - spr 2001 6

8 May 2001 info vis - spr 2001 6

8 May 2001 info vis - spr 2001 7

8 May 2001 info vis - spr 2001 7

The Incremental Approach In DQIs l l Queries formed incrementally Intermediate result visualization l

The Incremental Approach In DQIs l l Queries formed incrementally Intermediate result visualization l l l 8 May 2001 Sliders Conjunctions Constraints Efficient algorithms and supporting data structures required Incremental query formulation paradigm info vis - spr 2001 8

Incremental Approach Innovations for Efficiency Active Subset l Reprocessing Of limited size, stored in

Incremental Approach Innovations for Efficiency Active Subset l Reprocessing Of limited size, stored in main memory l Time, not space is constraint in DQI algorithms Augment active subset with data structures that facilitate continuous querying l 8 May 2001 Auxiliary data structures are only reconstructed when user clicks on a widget l Auxiliary data structures l l 1 second (or less) delay for recomputation Incremental Display l Response time needed of ~0. 1 seconds for continuous operations info vis - spr 2001 Slight changes in query tend to cause slight changes in output l Compute and display differences for continuous updates 9

Data Structures & Algorithms i Control Structure Query Previewer l l DQI algorithms to

Data Structures & Algorithms i Control Structure Query Previewer l l DQI algorithms to be used in tandem with a query previewer Allows user to browse a huge database and select a manageably small subset to scan 8 May 2001 l l l info vis - spr 2001 Selected subset passed from query previewer to DQI Bounding rectangle determines extremes for each attribute When user action extends beyond subset, control is passed back 10

8 May 2001 info vis - spr 2001 11

8 May 2001 info vis - spr 2001 11

Data Structures & Algorithms ii Major Operations l l l Setup Selection Querying Setup

Data Structures & Algorithms ii Major Operations l l l Setup Selection Querying Setup l l 8 May 2001 info vis - spr 2001 when query previewer passes control to DQI Initial display is generated from active subset Copies and scales each attribute to the range [1, p] where p is number of pixels in attribute’s range slider Happens infrequently, more time allotted for this task 12

Data Structures & Algorithms iii Selection l l When user clicks on a range

Data Structures & Algorithms iii Selection l l When user clicks on a range slider Algorithm calculates auxiliary data structures l l Querying Depends on currently selected attribute and current ranges for other attributes 8 May 2001 l Occurs continuously as the user drags the mouse to update a slider l Each changed mouse position is a single query ~ 0. 1 second response required l DQI computes maximum hit set during selection DQI computes information needed for redisplay during selection info vis - spr 2001 13 1 second response required or users get annoyed l l l Can spend memory to optimize, if needed

Data Structures & Algorithms iv Compute Max Hit Set l l Determined by extreme

Data Structures & Algorithms iv Compute Max Hit Set l l Determined by extreme values for attribute and current ranges for other attributes Partitions max hit set into p buckets, one for each user-specificable value for the current attribute Store each bucket and all left-to-right partial sums of these sizes Linear-time counting sort 8 May 2001 info vis - spr 2001 14

Data Structures & Algorithms v Compute Redisplay Info l l l Facilitate computation of

Data Structures & Algorithms v Compute Redisplay Info l l l Facilitate computation of histogram and tight coupling of range sliders Two dimensional array (size p 2) maintained Allows determination of ranges for all other sliders in constant time Scan buckets for old value vs. new value Display update in time linear with number of changes 8 May 2001 info vis - spr 2001 15

Theoretical Complexity i r = # records in active subset a = # attributes

Theoretical Complexity i r = # records in active subset a = # attributes b = # bytes needed to store value of single attribute p = length in pixels of each range slider f = area in pixels of the starfield u = average # pixels to be updated in starfield per query (non-trivial dependencies) m = # records in max hit set 8 May 2001 Active Subset (r a b) bytes Rescaled Active Subset O(r a) bytes Bucket Partition O(r a) bytes Data structures for tight coupling O(a p) bytes Data structures range histograms O(a p 2) bytes Starfield f bytes info vis - spr 2001 16

Theoretical Complexity ii r = # records in active subset a = # attributes

Theoretical Complexity ii r = # records in active subset a = # attributes b = # bytes needed to store value of single attribute p = length in pixels of each range slider f = area in pixels of the starfield u = average # pixels to be updated in starfield per query (non-trivial dependencies) m = # records in max hit set 8 May 2001 Setup Time O (r a b) info vis - spr 2001 17

Theoretical Complexity iii r = # records in active subset a = # attributes

Theoretical Complexity iii r = # records in active subset a = # attributes b = # bytes needed to store value of single attribute p = length in pixels of each range slider f = area in pixels of the starfield u = average # pixels to be updated in starfield per query (non-trivial dependencies) m = # records in max hit set 1. 2. 3. 4. 5. 1. 8 May 2001 Selection Time Determine max hit set O (r a) Sort max hit set O (m) Compute auxiliary data structures for tight coupling O (a p + m a) Compute auxiliary data structures for histograms O (a p 2 + m a) Total Selection Time O (a (r + m + p 2 )) = O (a (r + p 2 )) info vis - spr 2001 18

Theoretical Complexity iii r = # records in active subset a = # attributes

Theoretical Complexity iii r = # records in active subset a = # attributes b = # bytes needed to store value of single attribute p = length in pixels of each range slider f = area in pixels of the starfield u = average # pixels to be updated in starfield per query (non-trivial dependencies) m = # records in max hit set 8 May 2001 1. Querying Time Tight coupling O (a ) 2. Computing Histograms O (a p) 3. Starfield Update O (u ) Total Querying Time 1. O (a p + u ) info vis - spr 2001 19

Experiments i Preliminary experiments show that the incremental approach can deal with active subset

Experiments i Preliminary experiments show that the incremental approach can deal with active subset of 100, 000 records with 10 attributes each Film Finder could handle database 10, 000 records with 10 attributes 8 May 2001 Improvement by one order magnitude 1. 2. 3. Experimental Environment Experiments and Results Experimental Complexity info vis - spr 2001 20

Experiments ii Experimental Environment Sample DQI with range sliders l l l Starfield display,

Experiments ii Experimental Environment Sample DQI with range sliders l l l Starfield display, preview bar, range sliders Variable sizes for display, point size and sliders SUN SPARC Station 5 l l l 32 MB ram UNIX os Motif and C Batch Processing 8 May 2001 Measurements l l Setup, Selection, Querying, File read, Data structure setup, Sub-selection, Subsetup Repeated without graphics for ‘pure’ query time Varied the following: l Total attributes, total records, starfield size, point sizes on starfield, range slider sizes and jump sizes Worst Case Analysis info vis - spr 2001 21

Experiments iii Experiments and Results 1. 7200 runs 1. 2, 4, 6, 8 or

Experiments iii Experiments and Results 1. 7200 runs 1. 2, 4, 6, 8 or 10 3600 with starfield display and preview bar disabled 3600 with starfield display and preview bar 1 st 3600: Pure query no more than 20 milliseconds (average 10 milliseconds) faster 1. Time to update internal data structures negligible compared to starfield update times 8 May 2001 Attributes (a) 2. Starfield Size (f) 4002, 5002 or 6002 pixels 3. Point Size (d) 12, 32, 52 or 72 pixels 4. Range Slider Size (p) 150, 200, 250 pixels 5. Dataset Size (r) 10, 000, 25, 000, 50, 000, 75, 000 or 100, 000 records 6. Jump Size/range slider size (j/p) 1/50, 1/25, 1/10, or 1/5 info vis - spr 2001 22

Experiments iv Experimental Complexity analysis based on 2 nd 3600 queries with starfield display

Experiments iv Experimental Complexity analysis based on 2 nd 3600 queries with starfield display and preview bar enabled Ideally, experimental results will equate with theoretical prediction l Predicted run-time to hopefully equal experimental run-time 8 May 2001 Multiple linear regression used to generate best fit line from data l Error compensation l l Experimental error Algorithmic error Setup Selection Querying info vis - spr 2001 23

Experiments v 8 May 2001 info vis - spr 2001 24

Experiments v 8 May 2001 info vis - spr 2001 24

Experiments vi Evaluation 1. X 2 test to assess correlation between experiments and theoretical

Experiments vi Evaluation 1. X 2 test to assess correlation between experiments and theoretical terms l 2. Actual values for Setup, Selection, Query highly correlated with estimated values Predictive test to see if future outcomes can be estimated based on past experiment l l l Setup deviation 9. 5% Selection deviation: 3. 97% Querying deviation: 16. 63% 8 May 2001 Discussion 1. Incremental approach achieved better querying and display times than previous implementations using standard data structures 2. Consumed less memory 1. 3. 4. Data structures created when needed Preview bar, histogram, tight coupling info without making additional queries or spending additional processor time Memory is secondary problem compared to starfield updates which is significant for large r info vis - spr 2001 25

Conclusions & Future Work Conclusions Future Work The new incremental approach for queries and

Conclusions & Future Work Conclusions Future Work The new incremental approach for queries and display updates introduces a better way of dealing with large databases. Experiments show this approach is faster than previous approaches and can deal with an order of magnitude of larger datasets. The querying time is dominated by the starfield update time. The incremental approach enables faster display because only the difference between consecutive queries is updated in the data structures and on the starfield display. 8 May 2001 l l info vis - spr 2001 Make another order of magnitude increase in size of datasets that DQIs can handle Implement other widgets Try spatial data structures like K -D tree Combine DQIs with query previewer to produce a new state of the art in interactive dynamic database access 26

8 May 2001 info vis - spr 2001 27

8 May 2001 info vis - spr 2001 27

Data Visualization Sliders AT&T Bell Laboratories Stephen G. Eick 8 May 2001 info vis

Data Visualization Sliders AT&T Bell Laboratories Stephen G. Eick 8 May 2001 info vis - spr 2001

Abstract Computer sliders are a generic user input mechanism for specifying a numeric value

Abstract Computer sliders are a generic user input mechanism for specifying a numeric value from a range. For data visualization, the effectiveness of sliders may be increased by using the space inside the slider as l l l An interactive color scale, A barplot for discrete data, and A density plot for continuous data. The idea is to show the selected values in relation to the data and its distribution. Furthermore, the selection mechanism may be generalized using a painting metaphor to specify arbitrarily, disconnected intervals while maintaining an intuitive user-interface. 8 May 2001 info vis - spr 2001 29

Outline & Keywords Outline l l l Introduction Data Visualization Sliders Summary Keywords l

Outline & Keywords Outline l l l Introduction Data Visualization Sliders Summary Keywords l l l 8 May 2001 info vis - spr 2001 High Interaction Thresholding Information Visualization Selection Dynamic Graphics 30

Introduction Sliders are a generalpurpose user input mechanism to specify an input from a

Introduction Sliders are a generalpurpose user input mechanism to specify an input from a well-defined range Sliders are easy to use, intuitive, and provide a sensitive mechanism for specifying values. Sliders have a threshold bar positioned within a scale that the user manipulates with a mouse to select a value. 8 May 2001 There are many slider or slider-like applications Common idea is filtering. The pruning of visual clutter from data-rich displays by adjusting sliders is particularly effective in information visualization, and even more so when done dynamically. info vis - spr 2001 31

Data Visualization Sliders Eick improves sliders - use of internal slider space l l

Data Visualization Sliders Eick improves sliders - use of internal slider space l l For color scale For data values l l l ‘tick marks’ Density plot or bar length ‘painting metaphor’ 8 May 2001 l l l Distribution of data l Pop-up menu options l l l info vis - spr 2001 Toggle slider views Color rescale for increased color fidelity Range zooming for increased scale sensitivity Animation Labeling to print statistic values Interactive partition adjustment 32

Data Visualization Sliders A: combines visualization color scale and slider with interaction techniques B:

Data Visualization Sliders A: combines visualization color scale and slider with interaction techniques B: extends A by showing smooth distribution C: maps info to one color coded bar D: generalizes C to encode info in bar’s length 8 May 2001 info vis - spr 2001 33

Summary Generalizes the generic functionality of sliders along several orthogonal directions User can specify

Summary Generalizes the generic functionality of sliders along several orthogonal directions User can specify disconnected intervals while preserving intuitive slider interface Internal space as a color scale Interactively rebinding colors to active bars Adjusting color divisions 8 May 2001 Presenting distribution of data Showing individual data values Move between representations under user control Linking sliders to data they control suggests many natural and obvious extensions info vis - spr 2001 34

the end 8 May 2001 info vis - spr 2001 35

the end 8 May 2001 info vis - spr 2001 35