
Hierarchy Navigation Framework: Supporting Scalable Interactive Exploration over Large Databases
Nishant Mehta, Elke A. Rundensteiner and Matt Ward
Computer Science Department, Worcester Polytechnic Institute
IDEAS '05
Thank you to NSF for several IDM grants for the XMDV project.

XmdvTool: Multivariate Data Visualization
- Example: Cars data set

MPG   Cyl.  HP    Wt.
18    8     130   3504
17    8     132   3700
. . .
40    2     100   2500

- Dataset with 4096 points in XmdvTool 6.0, parallel coordinate display
Hierarchical Displays [Fua: 99]
Cars data set: the base data points C, D, G, H, I and J are grouped into the clusters B = {C, D}, F = {G, H, I}, E = {F, J} and the root A = {B, E}.

Node  MPG    Cyl.  HP      Wt.
C     6      8     130     3504
D     4      8     132     3000
G     10     6     110     2800
H     12     2     70      2100
I     15     2     80      2200
J     15     2     100     2500
B     5      8     131     3252
F     12.3   3.33  86.66   2366.6
E     13     3     90      2400
A     10.33  4.66  103.66  2684

Structure-based brush components: b = level of detail, d = focus area, e = focus extents.
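The cluster rows in the table match the attribute-wise means of the base points beneath each cluster (B, for instance, averages C and D). A minimal sketch that recomputes them, assuming mean aggregation; the table is consistent with that, but the slide does not name the aggregate, and the helper names are hypothetical.

```python
from fractions import Fraction

# Base data points from the slide: node -> (MPG, Cyl., HP, Wt.)
BASE = {
    "C": (6, 8, 130, 3504),
    "D": (4, 8, 132, 3000),
    "G": (10, 6, 110, 2800),
    "H": (12, 2, 70, 2100),
    "I": (15, 2, 80, 2200),
    "J": (15, 2, 100, 2500),
}
# Cluster tree from the slides: parent -> children, left to right.
CHILDREN = {"A": ["B", "E"], "B": ["C", "D"], "E": ["F", "J"], "F": ["G", "H", "I"]}


def base_points(node):
    """Base points under `node`; a base point is its own only member."""
    if node not in CHILDREN:
        return [node]
    return [leaf for child in CHILDREN[node] for leaf in base_points(child)]


def cluster_mean(node):
    """Attribute-wise mean over the base points under `node` (assumed aggregate)."""
    rows = [BASE[p] for p in base_points(node)]
    return tuple(Fraction(sum(column), len(rows)) for column in zip(*rows))


for name in ["B", "F", "E", "A"]:
    print(name, [round(float(v), 2) for v in cluster_mean(name)])
```

For F this prints 12.33, 3.33, 86.67 and 2366.67 (the slide truncates rather than rounds), and for E it prints 13.0, 3.0, 90.0 and 2400.0.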

Hierarchical Displays

Problems: Hierarchical Display
Achieved:
- Screen-space solution to the clutter problem
But the data handling problem remains:
- The cluster tree is larger than the initial data set
- The cluster tree may not fit into main memory
- Structure-based brush semantics involve recursive searches over the cluster tree

Goal
- Overall goal: scale hierarchical displays to support navigation over large hierarchies
- Subgoals:
  - Support navigation over large-scale persistent data: store hierarchies on disk and map navigation operations to efficient queries
  - Meet interactive response requirements

Overview of Approach
- Support navigation operations over large-scale persistent data: hierarchy encoding, spatial indexing
- Meet interactive response requirements: caching, prefetching

Hierarchy Encoding
Problem: structure-based brush
- Selection semantics involve recursive search
- Recursive search over secondary storage is slow
Solution: hierarchy encoding
- Push recursive processing into a precomputation step
- Precompute a label for each node in the hierarchy
- Map the recursive search to an equivalent non-recursive one
(Figure: hierarchical data passes through labeling into the database.)
Structure-Based Brush Semantics [Fua: 99]
Node selection is based on 2 steps:
- Horizontal selection
- Vertical selection
(Figure: the example cluster tree A through J annotated with each node's level of detail (lod); the brush picks a subtree through its focus extents (e1, e2) and a level of detail, 0.4 in the example.)

Horizontal Selection
- Aim: select the subtree that the user is interested in viewing
- Approach:
  - The brush focus extents (e1, e2) select a set of base points
  - Propagate the selection: select parent(n) if n is selected
(Figure: selected clusters and selected leaves for (e1, e2) = (2/6, 11/12), lod = 0.4.)
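The two-step approach above can be sketched as a recursive baseline. This assumes each base point occupies an equal-width slot on [0, 1] in left-to-right order (as the 1/6 spacing in the example suggests) and that a slot must overlap (e1, e2) by a positive length, so D, whose slot ends exactly at e1 = 2/6, is not picked. The function names are hypothetical.

```python
from fractions import Fraction

# Cluster tree from the slides: parent -> children, left to right.
CHILDREN = {"A": ["B", "E"], "B": ["C", "D"], "E": ["F", "J"], "F": ["G", "H", "I"]}
PARENT = {child: parent for parent, kids in CHILDREN.items() for child in kids}


def leaf_order(node="A"):
    """Base points (leaves) in left-to-right display order."""
    if node not in CHILDREN:
        return [node]
    return [leaf for child in CHILDREN[node] for leaf in leaf_order(child)]


def horizontal_selection(e1, e2):
    """Select base points under the focus extents, then propagate upward."""
    leaves = leaf_order()
    width = Fraction(1, len(leaves))  # assumption: equal-width slot per base point
    selected = set()
    for i, leaf in enumerate(leaves):
        lo, hi = i * width, (i + 1) * width
        if lo < e2 and hi > e1:  # slot overlaps the brush by a positive length
            node = leaf
            # Propagation rule: select parent(n) if n is selected.
            while node is not None and node not in selected:
                selected.add(node)
                node = PARENT.get(node)
    return selected


print(sorted(horizontal_selection(Fraction(2, 6), Fraction(11, 12))))
```

For the example brush this selects the leaves G, H, I, J and the clusters F, E and A.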

Non-Recursive Horizontal Selection
Offline:
- Precompute an interval (hmin, hmax) for each node
- The interval of a parent includes the intervals of its children
Online:
- Search for the nodes whose intervals intersect the brush interval (e1, e2)
Example intervals: A (0, 1); B (0, 2/6); C (0, 1/6); D (1/6, 2/6); E (2/6, 1); F (2/6, 5/6); G (2/6, 3/6); H (3/6, 4/6); I (4/6, 5/6); J (5/6, 1). Brush: (e1, e2) = (2/6, 11/12), lod = 0.4.
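The offline labeling and the online search can be sketched as follows, assuming equal-width leaf slots assigned by a depth-first, left-to-right traversal, which reproduces the example intervals. Like the recursive sketch, the online test requires a positive-length overlap (strict inequalities); the function names are hypothetical.

```python
from fractions import Fraction

# Cluster tree from the slides: parent -> children, left to right.
CHILDREN = {"A": ["B", "E"], "B": ["C", "D"], "E": ["F", "J"], "F": ["G", "H", "I"]}


def count_leaves(node):
    """Number of base points under `node`."""
    if node not in CHILDREN:
        return 1
    return sum(count_leaves(child) for child in CHILDREN[node])


def assign_intervals(root="A"):
    """Offline step: label every node with (hmin, hmax)."""
    width = Fraction(1, count_leaves(root))  # assumption: equal-width leaf slots
    intervals = {}

    def visit(node, start):
        if node not in CHILDREN:
            intervals[node] = (start, start + width)
            return start + width
        end = start
        for child in CHILDREN[node]:
            end = visit(child, end)
        intervals[node] = (start, end)  # a parent's interval spans its children
        return end

    visit(root, Fraction(0))
    return intervals


def horizontal_hits(intervals, e1, e2):
    """Online step: a single non-recursive pass over the precomputed labels."""
    return {n for n, (hmin, hmax) in intervals.items() if hmin < e2 and hmax > e1}


labels = assign_intervals()
print(labels["F"])
print(sorted(horizontal_hits(labels, Fraction(2, 6), Fraction(11, 12))))
```

The online pass returns the same seven nodes as the recursive traversal, without following any parent or child links.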

Vertical Selection
- Aim: select points at the desired lod (the lod handle of the SBB)
- Approach: explore each branch starting at the root, stopping at the first node n with lod(n) <= lod(brush)
Example: SBB (e1, e2) = (2/6, 11/12), lod = 0.4; node lods are A = 0.6, E = 0.5, F = 0.3, B = 0.2, and 0 for base points.
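A minimal sketch of this top-down walk, assuming the horizontal step has already produced the candidate set and that base points have lod 0 (so every branch terminates). The function names are hypothetical.

```python
# Cluster tree and lod values from the slides (base points have lod 0).
CHILDREN = {"A": ["B", "E"], "B": ["C", "D"], "E": ["F", "J"], "F": ["G", "H", "I"]}
LOD = {"A": 0.6, "B": 0.2, "E": 0.5, "F": 0.3}


def lod(node):
    return LOD.get(node, 0.0)


def vertical_selection(node, brush_lod, candidates):
    """Recursive baseline: in each branch, stop at the first node with lod(n) <= brush_lod."""
    if node not in candidates:  # pruned by horizontal selection
        return []
    if lod(node) <= brush_lod:
        return [node]
    return [hit for child in CHILDREN.get(node, [])
            for hit in vertical_selection(child, brush_lod, candidates)]


# Output of horizontal selection for (e1, e2) = (2/6, 11/12)
horizontal = {"A", "E", "F", "G", "H", "I", "J"}
print(vertical_selection("A", 0.4, horizontal))
```

For lod = 0.4 the walk descends past A (0.6) and E (0.5) and stops at F and J.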

Non-Recursive Vertical Selection
- Node n satisfies the vertical selection criterion iff lod(n) <= lod(brush) < lod(parent(n))
- Each node n has vertical extents (vmin, vmax) = (lod(n), lod(parent(n))), so the criterion becomes vmin <= lod(brush) < vmax
Example extents: A (0.6, 1.0, the top of the map); B (0.2, 0.6); C, D (0, 0.2); E (0.5, 0.6); F (0.3, 0.5); G, H, I (0, 0.3); J (0, 0.5). SBB: (e1, e2) = (2/6, 11/12), lod(brush) = 0.4.
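The extents can be precomputed in one pass. A minimal sketch, assuming the root's vmax is 1.0 (the top of the 2D map as drawn); names are hypothetical. Note that the vertical test alone also matches B, whose extents (0.2, 0.6) contain 0.4; combining it with the horizontal test (using positive-length overlap) leaves F and J.

```python
# Cluster tree and lod values from the slides (base points have lod 0).
CHILDREN = {"A": ["B", "E"], "B": ["C", "D"], "E": ["F", "J"], "F": ["G", "H", "I"]}
LOD = {"A": 0.6, "B": 0.2, "E": 0.5, "F": 0.3}
PARENT = {child: parent for parent, kids in CHILDREN.items() for child in kids}
NODES = sorted(set(CHILDREN) | set(PARENT))
ROOT_VMAX = 1.0  # assumption: the root extends to the top of the 2D map


def lod(node):
    return LOD.get(node, 0.0)


def vertical_extents():
    """Precompute (vmin, vmax) = (lod(n), lod(parent(n))) for every node."""
    return {n: (lod(n), lod(PARENT[n]) if n in PARENT else ROOT_VMAX) for n in NODES}


extents = vertical_extents()
print(extents["F"])
print([n for n, (vmin, vmax) in extents.items() if vmin <= 0.4 < vmax])
```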

Non-Recursive Selection
Selects all nodes n that satisfy both:
- hmin <= e2 and hmax >= e1
- vmin <= lod(brush) < vmax
(Figure: each node labeled with its horizontal interval and vertical extents, as listed on the two preceding slides; SBB (e1, e2) = (2/6, 11/12), lod = 0.4.)
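Because both tests use only per-node labels, the whole selection becomes one flat query. A minimal sketch in which an in-memory SQLite database stands in for the Oracle backend; the table and column names are hypothetical. It uses strict horizontal inequalities (hmin < e2 and hmax > e1): with the slide's non-strict form, node B, whose interval (0, 2/6) only touches e1 = 2/6 and whose extents (0.2, 0.6) contain 0.4, would also be returned.

```python
import sqlite3

# Precomputed labels from the slides: node, hmin, hmax, vmin, vmax.
LABELS = [
    ("A", 0 / 6, 6 / 6, 0.6, 1.0),
    ("B", 0 / 6, 2 / 6, 0.2, 0.6),
    ("C", 0 / 6, 1 / 6, 0.0, 0.2),
    ("D", 1 / 6, 2 / 6, 0.0, 0.2),
    ("E", 2 / 6, 6 / 6, 0.5, 0.6),
    ("F", 2 / 6, 5 / 6, 0.3, 0.5),
    ("G", 2 / 6, 3 / 6, 0.0, 0.3),
    ("H", 3 / 6, 4 / 6, 0.0, 0.3),
    ("I", 4 / 6, 5 / 6, 0.0, 0.3),
    ("J", 5 / 6, 6 / 6, 0.0, 0.5),
]

# One non-recursive range query replaces the recursive tree walk.
SBB_QUERY = """
    SELECT node FROM hierarchy
    WHERE hmin < :e2 AND hmax > :e1
      AND vmin <= :lod AND :lod < vmax
    ORDER BY node
"""


def select_nodes(conn, e1, e2, lod):
    rows = conn.execute(SBB_QUERY, {"e1": e1, "e2": e2, "lod": lod})
    return [node for (node,) in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hierarchy (node TEXT PRIMARY KEY, hmin REAL, hmax REAL, vmin REAL, vmax REAL)")
conn.executemany("INSERT INTO hierarchy VALUES (?, ?, ?, ?, ?)", LABELS)
print(select_nodes(conn, 2 / 6, 11 / 12, 0.4))
```

The query returns F and J, the same answer as the two recursive steps combined. Sweeping the whole range at lod 0 returns exactly the base points.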

2D Hierarchy Map
Each node n is drawn as the rectangle spanning its horizontal interval (hmin, hmax) and its vertical extents (vmin, vmax). The horizontal axis runs from 0 to 1 (ticks at 1/6 through 5/6) and the vertical axis is lod, from 0 to 1.0. The SBB (e1, e2) = (2/6, 11/12), lod = 0.4 appears as a horizontal segment from e1 to e2 at height 0.4.

Properties of 2D Hierarchy Map
- Progressive tree structure
- Space filling
- Non-overlapping
(Figure: the 2D hierarchy map of the example tree, nodes A through J.)
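The space-filling and non-overlapping properties can be checked directly on the example. A minimal sketch using exact fractions, assuming the root rectangle reaches lod 1 as drawn; the helper names are hypothetical. Rectangles that only share an edge are not counted as overlapping.

```python
from fractions import Fraction
from itertools import combinations


def rect(hmin, hmax, vmin, vmax):
    """Build an exact rectangle from strings such as "2/6" or "0.3"."""
    return tuple(Fraction(x) for x in (hmin, hmax, vmin, vmax))


# 2D hierarchy map from the slides: node -> (hmin, hmax, vmin, vmax).
RECTS = {
    "A": rect("0", "1", "0.6", "1"),
    "B": rect("0", "2/6", "0.2", "0.6"),
    "C": rect("0", "1/6", "0", "0.2"),
    "D": rect("1/6", "2/6", "0", "0.2"),
    "E": rect("2/6", "1", "0.5", "0.6"),
    "F": rect("2/6", "5/6", "0.3", "0.5"),
    "G": rect("2/6", "3/6", "0", "0.3"),
    "H": rect("3/6", "4/6", "0", "0.3"),
    "I": rect("4/6", "5/6", "0", "0.3"),
    "J": rect("5/6", "1", "0", "0.5"),
}


def area(r):
    return (r[1] - r[0]) * (r[3] - r[2])


def overlaps(r, s):
    """True if two rectangles share interior area (touching edges do not count)."""
    return min(r[1], s[1]) > max(r[0], s[0]) and min(r[3], s[3]) > max(r[2], s[2])


total_area = sum(area(r) for r in RECTS.values())
any_overlap = any(overlaps(r, s) for r, s in combinations(RECTS.values(), 2))
print(total_area, any_overlap)
```

It prints `1 False`: the ten rectangles tile the unit square exactly once.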

Navigation Operations in the 2D Hierarchy Map
(Figure: the brush drawn over the 2D hierarchy map; the node rectangles it crosses are selected.)

Spatial Index
- Q searches for the nodes intersecting the structure-based brush
- Q is therefore a spatial range query over spatial objects: the node rectangles of the 2D hierarchy map
- A spatial index (R-tree) can make these searches faster
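To show how the brush maps onto an R-tree range query, here is a minimal in-memory sketch of a static R-tree bulk-loaded with Sort-Tile-Recursive (STR) packing. It is not the Spatial Index Library's or Oracle Spatial's implementation; FANOUT and all function names are hypothetical. Internal entries are pruned with their minimum bounding rectangles (a conservative test), and the exact selection predicate is applied only to data entries.

```python
import math

FANOUT = 3  # hypothetical page capacity, kept tiny so the tree has several levels

# 2D hierarchy map rectangles: node -> (hmin, hmax, vmin, vmax).
RECTS = {
    "A": (0 / 6, 6 / 6, 0.6, 1.0), "B": (0 / 6, 2 / 6, 0.2, 0.6),
    "C": (0 / 6, 1 / 6, 0.0, 0.2), "D": (1 / 6, 2 / 6, 0.0, 0.2),
    "E": (2 / 6, 6 / 6, 0.5, 0.6), "F": (2 / 6, 5 / 6, 0.3, 0.5),
    "G": (2 / 6, 3 / 6, 0.0, 0.3), "H": (3 / 6, 4 / 6, 0.0, 0.3),
    "I": (4 / 6, 5 / 6, 0.0, 0.3), "J": (5 / 6, 6 / 6, 0.0, 0.5),
}


def mbr(rects):
    """Minimum bounding rectangle of a list of rectangles."""
    return (min(r[0] for r in rects), max(r[1] for r in rects),
            min(r[2] for r in rects), max(r[3] for r in rects))


def str_pack(entries):
    """Pack one level: slice by horizontal center, then tile each slice by vertical center."""
    n_pages = math.ceil(len(entries) / FANOUT)
    slice_size = math.ceil(math.sqrt(n_pages)) * FANOUT
    by_x = sorted(entries, key=lambda e: e[0][0] + e[0][1])
    pages = []
    for i in range(0, len(by_x), slice_size):
        strip = sorted(by_x[i:i + slice_size], key=lambda e: e[0][2] + e[0][3])
        for j in range(0, len(strip), FANOUT):
            group = strip[j:j + FANOUT]
            pages.append((mbr([r for r, _ in group]), group))
    return pages


def build_rtree(rects):
    """Bulk-load: data entries are (rect, node); internal entries are (mbr, children)."""
    level = [(r, node) for node, r in rects.items()]
    while len(level) > 1:
        level = str_pack(level)
    return level[0]


def query(entry, e1, e2, lod, out):
    """Range query for the SBB segment from e1 to e2 at height lod."""
    (hmin, hmax, vmin, vmax), payload = entry
    if isinstance(payload, str):  # data entry: exact selection predicate
        if hmin < e2 and hmax > e1 and vmin <= lod < vmax:
            out.append(payload)
    elif hmin < e2 and hmax > e1 and vmin <= lod <= vmax:  # subtree may hold a hit
        for child in payload:
            query(child, e1, e2, lod, out)
    return out


root = build_rtree(RECTS)
print(sorted(query(root, 2 / 6, 11 / 12, 0.4, [])))
```

The index returns F and J, matching the flat scan, while skipping subtrees whose bounding rectangles miss the brush.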

Next: Caching and Prefetching
User Trace Characteristics [Doshi: 2003]
- Locality of exploration
- Contiguous queries have similar answers
- Presence of idle time
- Predictable user movements (user inertia)
Caching exploits the first two characteristics; prefetching exploits the last two.
(Figure: a brush moving over the 2D hierarchy map.)

Cache Design
- Purpose: minimize system latency
- Design issues:
  - Cache organization
  - Cache lookup policy
  - Cache replacement policy
  - Computation of remainder queries

Cache Organization
- A contiguous chunk of main memory that stores recently fetched nodes
- Each node has a descriptor: its horizontal and vertical extents
(Figure: the 2D hierarchy map in the database beside the 2D hierarchy map of the cache contents, in which only some regions are occupied, e.g. by nodes A, E, F, G and H.)

Cache Lookup
- Aim: find the nodes in the cache that lie in the current brush
- Lookup methods: sequential scan, or a main-memory spatial index
- Main-memory index:
  - Advantage: faster cache lookup
  - Disadvantage: frequent index updates
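A minimal sketch of the sequential-scan option, assuming each cached descriptor stores a node's horizontal and vertical extents and reuses the database selection predicate (with positive-length horizontal overlap). The class, function names and cache contents are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptor:
    """Per-node cache descriptor: horizontal and vertical extents."""
    node: str
    hmin: float
    hmax: float
    vmin: float
    vmax: float

    def in_brush(self, e1, e2, lod):
        # Same predicate as the database query (positive-length horizontal overlap).
        return self.hmin < e2 and self.hmax > e1 and self.vmin <= lod < self.vmax


def lookup(cache, e1, e2, lod):
    """Sequential-scan cache lookup: test every cached descriptor against the brush."""
    return [d.node for d in cache if d.in_brush(e1, e2, lod)]


# Hypothetical cache contents, with extents taken from the example hierarchy.
cache = [
    Descriptor("A", 0 / 6, 6 / 6, 0.6, 1.0),
    Descriptor("E", 2 / 6, 6 / 6, 0.5, 0.6),
    Descriptor("F", 2 / 6, 5 / 6, 0.3, 0.5),
    Descriptor("G", 2 / 6, 3 / 6, 0.0, 0.3),
    Descriptor("H", 3 / 6, 4 / 6, 0.0, 0.3),
]
print(lookup(cache, 2 / 6, 11 / 12, 0.4))
```

Only F is found; J, which the example brush also selects, is not cached and must come from the database. The scan costs one predicate test per cached node, which is the trade-off against maintaining a main-memory index under frequent updates.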

Cache Replacement Policy
- Aim: make room for new nodes by replacing the node with the least probability of being referenced
- Approach: exploit general user trace characteristics
  - Contiguous queries have similar answers (temporal locality): LRU
  - Locality of exploration (spatial locality): Distance

Distance Replacement Policy
- Idea: replace the objects furthest away (in 2D space) from the current brush
- Realization:
  - Maintain a brush store
  - Select the victim brush with the maximum distance from the current brush
  - Replace the individual cached nodes in the victim brush
- Distance: length of the line segment that joins the centers of the 2 brushes
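A minimal sketch of the distance policy, assuming a brush's center is ((e1 + e2) / 2, lod) in the 2D hierarchy map. Two further assumptions are not in the slides: when the furthest brush does not free enough room, the next-furthest one is used, and nodes the current brush still needs are never evicted. The class, functions and brush store contents are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Brush:
    """A past or current brush and the cached nodes fetched for it."""
    e1: float
    e2: float
    lod: float
    nodes: set = field(default_factory=set)

    def center(self):
        return ((self.e1 + self.e2) / 2, self.lod)


def distance(a, b):
    """Length of the line segment joining the centers of two brushes."""
    (ax, ay), (bx, by) = a.center(), b.center()
    return math.hypot(ax - bx, ay - by)


def choose_victims(brush_store, current, needed):
    """Pick `needed` cached nodes from the brushes furthest from `current`."""
    victims = []
    for brush in sorted(brush_store, key=lambda b: distance(b, current), reverse=True):
        for node in sorted(brush.nodes - current.nodes):  # keep nodes still in use
            if len(victims) == needed:
                return victims
            if node not in victims:
                victims.append(node)
    return victims


current = Brush(2 / 6, 11 / 12, 0.4, {"F", "J"})
store = [
    Brush(0.0, 2 / 6, 0.1, {"C", "D"}),  # far left, fine lod
    Brush(2 / 6, 1.0, 0.55, {"E"}),      # close to the current brush
    Brush(0.0, 1.0, 0.8, {"A"}),         # overview at a coarse lod
]
print(choose_victims(store, current, 2))
```

The brush over C and D is furthest from the current brush (about 0.55 versus 0.42 and 0.16), so its nodes are evicted first.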

Distance Replacement Policy: Example
(Figure: a brush store holding past brushes b1 to b4, the current brush, and the corresponding database and cache contents.)

Computation of Remainder Queries
For each user request, the cache may contain:
- All of the nodes requested
- A subset of the nodes requested
- None of the nodes requested
(Figure: cache contents with the current brush and its remainder brush.)

Computation of Remainder Queries (continued)
- The focus extents (e1, e2) of the brush define an interval
- The horizontal extents of cached nodes also form intervals
- The remainder query consists of a set of remainder brushes
  - Remainder brush: the part of the brush interval not occupied by cached nodes
(Figure: cache contents, the current brush from e1 to e2, and the remainder brush.)
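Computing remainder brushes amounts to subtracting the cached nodes' horizontal intervals from (e1, e2). A minimal sketch, assuming the input is the horizontal extents of cached nodes that already answer the brush at its lod (for example, the cache lookup result). The function name is hypothetical.

```python
from fractions import Fraction


def remainder_brushes(e1, e2, cached_extents):
    """Parts of the brush interval (e1, e2) not covered by cached nodes.

    `cached_extents` holds (hmin, hmax) of the cached nodes that already
    answer the brush at its lod.
    """
    gaps = []
    cursor = e1
    for hmin, hmax in sorted(cached_extents):
        if cursor >= e2:
            break
        if hmin > cursor:  # uncovered stretch before this cached node
            gaps.append((cursor, min(hmin, e2)))
        cursor = max(cursor, hmax)
    if cursor < e2:  # uncovered stretch after the last cached node
        gaps.append((cursor, e2))
    return gaps


sixth = Fraction(1, 6)
print(remainder_brushes(2 * sixth, Fraction(11, 12), [(2 * sixth, 5 * sixth)]))
```

With F cached, the single remainder brush is (5/6, 11/12); issued at the same lod 0.4, it fetches only J from the database. An empty cache yields the whole brush, and a fully covered brush yields no remainder query.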
Prefetcher [Doshi: 03]
- Aim:
  - Predict future user requests and prefetch them into the cache
  - Increase hit ratio or minimize latency
- Motivation:
  - Presence of idle time
  - Predictable user movements
- Prefetching working model (figure): GUI front end, user requests, cache manager, user log, prediction model, prefetcher

Directional Prefetcher
- Prediction model:
  - Uses the recent history of user requests
  - Prefetches in the direction of the last user movement
(Figure: direction strategy, showing the brush extent e2 at time t+1 and the prefetched region beyond e2.)
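One simple reading of "prefetch in the direction of the last user movement" is to assume the next brush repeats the last displacement of e1, e2 and lod, clamped to [0, 1]. The sketch below follows that assumption. It is not necessarily the exact strategy of [Doshi: 03], and the function names are hypothetical.

```python
from fractions import Fraction


def clamp(x, lo=Fraction(0), hi=Fraction(1)):
    """Keep a brush coordinate inside the 2D hierarchy map."""
    return max(lo, min(hi, x))


def predict_next(prev, curr):
    """Extrapolate (e1, e2, lod) one step along the last user movement."""
    return tuple(clamp(c + (c - p)) for p, c in zip(prev, curr))


prev = (Fraction(0), Fraction(7, 12), Fraction(2, 5))
curr = (Fraction(1, 6), Fraction(9, 12), Fraction(2, 5))
print(predict_next(prev, curr))
```

Here the brush moved right by 1/6, so the predicted brush is (1/3, 11/12) at lod 2/5. During idle time, the prefetcher could treat it like any other request and fetch only its remainder against the cache.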

System Architecture
- Offline process: hierarchical data is labeled and loaded into the database, which carries a spatial index
- Front end: user GUI exchanging requests and answers with the backend controller; flat data
- Backend controller:
  - Cache manager: cache lookup (sequential scan or cache index), replacement policy (LRU or distance), and cache memory holding the cached nodes
  - Delta calculator: issues delta queries to the database
  - Prefetch controller: starts and stops the direction prefetcher, which issues prefetch requests

System Implementation
- Implemented as a backend to XmdvTool 6.0
- Language: C++
- Database: Oracle with the Oracle Spatial extension
- Libraries:
  - Spatial Index Library (UC Riverside)
  - OTL (Oracle .. Template Library)
  - ZThread

Evaluation
- Goal: effectiveness of the proposed techniques in isolation and in combination
- Workloads (real datasets):
  - D1: out5d, size = 20,000, dimensions = 5
  - D2: uvw, flow simulation data, size = 200,000, dimensions = 6
- Input:
  - A set of 4 half-hour real user traces collected in [Doshi: 2003 apr] for dataset D1
  - A set of 4 half-hour synthetic user traces for dataset D2
- User trace: a sequence of user requests, each given as (position of SBB, time)

Evaluation Metrics
- Latency for a user trace
- Latency reduction ratio (lrr)
- Notation: Li = latency for request i; Ti = number of nodes in request i
- Base configuration: no index at the database

Experimental Results: Brief Summary
- Spatial index on the database used alone: lrr 33% for data set D1; lrr 72% for data set D2
- Cache: lrr 58% for data set D1 (cache size = 10%); lrr 94% for data set D2 (cache size = 2%)
- Comparison of replacement policies: the distance replacement policy performs as well as or better than LRU; for data set D2, hit ratio increases by 7% and lrr by 2%
- Main-memory index: we need spatial index structures that support high update rates (e.g. LR-Tree [Bozanis: 2003])
- Prefetcher and cache: lrr 63% for data set D1; lrr 96% for data set D2

Related Work
- Visualization-database integrated systems: ADR [Kurc: 2001], Tioga [Stonebaker: 1993], USD [Johnson: 1992]
- Caching: semantic caching [dar: 1996] or predicate caching [keller: 1996]
- Hierarchy encoding: nested interval method [Celko: 2004], Dietz's numbering scheme [dietz: 1982], Dewey order encoding [tatxmlorder: 2002]

Conclusions
- Hierarchy encoding technique:
  - Maps tree structures to 2-dimensional spaces
  - Maps visual exploration operations to spatial range queries
- Designed a cache to reduce response time:
  - Replacement policy: distance or LRU
  - Cache lookup: sequential scan or spatial index
- Integrated a direction-based prefetcher
- Implemented in the freeware XmdvTool
- Conducted a performance study
References
[Doshi: 2003] P. Doshi et al. Prefetching for Visual Data Exploration.
[Doshi: 2003 apr] P. Doshi et al. A strategy selection framework for adaptive prefetching in data visualization.
[Bozanis: 2003] P. Bozanis et al. LR-Tree: a logarithmic decomposable spatial index method.
[Celko: 2004] J. Celko. Joe Celko's Trees and Hierarchies in SQL for Smarties.
[Teuhola: 1996] J. Teuhola. Path signatures to speed up recursion in relational databases.
[Stonebaker: 1993] M. Stonebraker et al. Providing data management support for scientific visualization applications.
[dar: 1996] S. Dar et al. Semantic Data Caching and Replacement.
[keller: 1996] A. M. Keller et al. A predicate-based caching scheme for client-server database architectures.
[Kurc: 2001] T. Kurc et al. Exploration and visualization of large datasets with the active data repository.
[Johnson: 1992] M. Goldner et al. USD: a database management system for scientific research.
[Fua: 1999] Y. H. Fua et al. Navigating hierarchies with structure-based brushes.
[dietz: 1982] P. F. Dietz. Maintaining order in a linked list.
[tatxmlorder: 2002] I. Tatarinov et al. Storing and Querying Ordered XML Using a Relational Database System.
[Stroe: 2000] I. Stroe. Scalable Visual Hierarchy Exploration.