Efficient Processing of XML Path Queries Using the Disk-based F&B Index

![A Running Example: Q1: /a/b, Q2: /a/b[d], Q3: /a/b[c][d]](https://slidetodoc.com/presentation_image_h2/6b5adb9873393b10a3e471b6499007c3/image-6.jpg)

![Q.P. by RangeFetch: H(1, c) = [3, 6], keyed by (chunkID, tagName)](https://slidetodoc.com/presentation_image_h2/6b5adb9873393b10a3e471b6499007c3/image-13.jpg)

Efficient Processing of XML Path Queries Using the Disk-based F&B Index
Wei Wang, University of New South Wales, Australia
With Hongzhi Wang (HIT), Hongjun Lu (HKUST), Haifeng Jiang (IBM), Xuemin Lin (UNSW), Jianzhong Li (HIT)
VLDB 2005
XML Query Processing
- XML
  - Modeled as a labeled tree
- Query by structural constraint
  - Simple path queries, e.g., //Customer//Name
  - Branching/twig queries, e.g., //Customer[//Zipcode]//Name

1/12/2022 VLDB 2005
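The two query classes can be illustrated on a tiny document. The sketch below is only an illustration: the sample XML, the element values, and the variable names are assumptions, and the descendant steps are written with `Element.iter` rather than a full XPath engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document (not from the slides).
doc = ET.fromstring(
    "<Orders>"
    "<Customer><Name>Ann</Name><Address><Zipcode>2052</Zipcode></Address></Customer>"
    "<Customer><Name>Bob</Name></Customer>"
    "</Orders>"
)

# Simple path query //Customer//Name: every Name below any Customer.
simple = [name.text for cust in doc.iter("Customer") for name in cust.iter("Name")]

# Twig query //Customer[//Zipcode]//Name, read here as "Customers that have
# a Zipcode descendant": the branch acts as an existence filter.
twig = [
    name.text
    for cust in doc.iter("Customer")
    if any(True for _ in cust.iter("Zipcode"))
    for name in cust.iter("Name")
]

print(simple)
print(twig)
```

The simple path returns both names, while the twig query keeps only the customer that also satisfies the Zipcode branch.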
Index or Join?
Example query Q1: /a/b
- Index-based approaches
  - DataGuide, 1-index
  - F&B Index
  - and a few approximate indexes
- Join-based approaches
  - Structural join
  - Twig join

Join-based approaches appear to be more actively researched!
[Figure: example trees with nodes labeled a and b]
Outline
- Introduction
- Disk-based F&B Index
- Experiment
- Conclusions
XML Structural Indexes
- "Exact" indexes
  - 1-index
    - Based on backward bisimilarity
    - Covers all simple path queries
  - F&B Index
    - Based on backward and forward bisimilarity
    - Covers all branching queries (optimally)
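The difference between the two indexes can be sketched as partition refinement on a small tree. This is a minimal illustration, not the construction used in the talk: the tree, the node names, and the functions `partition` and `extents` are all assumptions made for the example.

```python
def partition(nodes, label, parent, children, use_forward):
    """Refine the label partition until it stops changing.

    use_forward=False: backward bisimilarity only (1-index on a tree).
    use_forward=True:  backward and forward bisimilarity (F&B index).
    """
    block = {n: label[n] for n in nodes}
    while True:
        ids, new_block = {}, {}
        for n in nodes:
            up = block[parent[n]] if parent[n] is not None else None
            down = frozenset(block[c] for c in children[n]) if use_forward else None
            # Nodes stay together only if they agree on their own block,
            # their parent's block and (optionally) their children's blocks.
            new_block[n] = ids.setdefault((block[n], up, down), len(ids))
        if len(ids) == len(set(block.values())):
            return new_block
        block = new_block


def extents(block):
    """Group data nodes by index node and return the extents, sorted."""
    groups = {}
    for n, b in block.items():
        groups.setdefault(b, []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical tree: a has children b1, b2, b3; b1 and b3 have (c, d); b2 has only d.
parent = {"a": None, "b1": "a", "b2": "a", "b3": "a",
          "c1": "b1", "d1": "b1", "d2": "b2", "c3": "b3", "d3": "b3"}
nodes = list(parent)
label = {n: n[0] for n in nodes}
children = {n: [c for c in nodes if parent[c] == n] for n in nodes}

one_index = extents(partition(nodes, label, parent, children, use_forward=False))
fb_index = extents(partition(nodes, label, parent, children, use_forward=True))
print(one_index)
print(fb_index)
```

On this tree the 1-index keeps all three b nodes in one extent, which is enough for /a/b, while the F&B partition separates b2 from b1 and b3, so a branching query such as /a/b[c][d] can be answered from the index alone.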
A Running Example
- Q1: /a/b
- Q2: /a/b[d]
- Q3: /a/b[c][d]

extent {b, b, b}
Problems with F&B Index?
- Lack of scalability
  - Usually large in practice
  - No immediate solution when it cannot be accommodated in memory
    - Unbalanced, all-leaf-nodes tree
    - Naïve solutions (e.g., B+-tree, pre-order clustering in Lore, subtree clustering in Natix) do not work well
- Lack of efficiency
  - Non-deterministic searching
  - //-axis requires traversing the whole subtrees
  - Much more costly when the index is not in memory
Outline
- Introduction
- Disk-based F&B Index
- Experiment
- Conclusions
Disk-based F&B Index
- Overcome the memory limit by storing the F&B index on disk
- Naïve method does not work well

Example query Q1: /a/b
Basic Idea
- Moral: clustering is important
  1. Cluster by tag: tape
  2. Cluster by parent: segment & block
  3. Cluster by 1-index ID: chunk
- Benefits:
  - Optimized tree traversals
  - Enable other intelligent algorithms
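The three clustering levels can be sketched as nested grouping. The slide does not spell out the exact on-disk layout, so the nesting order (tape, then chunk, then segment), the tuple layout, and the function name `cluster` are assumptions made purely for illustration.

```python
from collections import defaultdict


def cluster(index_nodes):
    """Group index nodes into tapes, chunks and segments.

    index_nodes: iterable of (node_id, tag, parent_id, one_index_id).
    Assumed layout: tapes[tag][one_index_id][parent_id] -> list of node IDs.
    """
    tapes = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for node_id, tag, parent_id, one_index_id in index_nodes:
        tapes[tag][one_index_id][parent_id].append(node_id)
    return tapes


# Hypothetical index nodes: (node_id, tag, parent_id, one_index_id).
index_nodes = [
    (0, "a", None, 0),
    (1, "b", 0, 1),
    (2, "b", 0, 1),
    (3, "c", 1, 2),
    (4, "d", 1, 3),
    (5, "d", 2, 3),
]
tapes = cluster(index_nodes)
print(sorted(tapes))          # one tape per tag
print(dict(tapes["d"][3]))    # segments of the d tape within chunk 3, keyed by parent
```

With such a layout, all children of one parent with a given tag sit together, which is what makes a tree traversal touch few blocks.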
Q1: /a/b
Q.P. by Tree Traversal
- Dim 1: DFS/BFS
- Dim 2: Path/Branching path
- Dim 3: / or //

Example queries: Q5: /a/b/c, Q2: /a/b[d], Q4: /a//c
Problem: still have to traverse the entire subtrees to process //
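A depth-first evaluation of a simple path shows why // is the expensive case. The following is a sketch under assumed structures: index nodes are plain dicts with `tag`, `name`, and `children`, and `eval_path` is a hypothetical helper, not the talk's algorithm.

```python
def eval_path(root, steps):
    """Evaluate a simple path given as [(axis, tag), ...] with axis '/' or '//'."""
    current = [{"tag": None, "children": [root]}]  # virtual node above the root
    for axis, tag in steps:
        matched, seen = [], set()
        for node in current:
            stack = list(reversed(node["children"]))
            while stack:
                n = stack.pop()
                if n["tag"] == tag and id(n) not in seen:
                    seen.add(id(n))
                    matched.append(n)
                if axis == "//":
                    # Descendant axis: the whole subtree has to be visited.
                    stack.extend(reversed(n["children"]))
        current = matched
    return current


def node(tag, name, *children):
    return {"tag": tag, "name": name, "children": list(children)}


# Hypothetical index tree: a has children b and d, each with one c child.
root = node("a", "a1", node("b", "b1", node("c", "c1")), node("d", "d1", node("c", "c2")))

q5 = [n["name"] for n in eval_path(root, [("/", "a"), ("/", "b"), ("/", "c")])]
q4 = [n["name"] for n in eval_path(root, [("/", "a"), ("//", "c")])]
print(q5, q4)
```

For /a/b/c only the children of each matched node are inspected, whereas /a//c walks every node below a.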
Q.P. by RangeFetch
- H(1, c) = [3, 6], where the key is (chunkID, tagName)

Example query Q4: /a//c
Restriction: can only answer /p//q, where p is a simple path.
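The lookup on the slide can be read as a hash table from (chunkID, tagName) to a contiguous range on a tape. The sketch below assumes that reading, treats the range as inclusive positions, and uses made-up tape contents; `range_fetch` is a hypothetical name.

```python
# Assumed structures: one tape per tag, and a table H from (chunk ID, tag)
# to an inclusive [first, last] range of positions on that tape.
tapes = {"c": ["c0", "c1", "c2", "c3", "c4", "c5", "c6", "c7"]}
H = {(1, "c"): (3, 6)}


def range_fetch(tapes, table, chunk_id, tag):
    """Return the tape entries covered by table[(chunk_id, tag)], or []."""
    bounds = table.get((chunk_id, tag))
    if bounds is None:
        return []
    first, last = bounds
    return tapes[tag][first:last + 1]


answer = range_fetch(tapes, H, 1, "c")
print(answer)
```

Under these assumptions a query such as /a//c needs one table lookup and one sequential read, with no subtree traversal.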
More Data Structures
- 3 more tapes:
  - Extents Tape: add region code for each d-node in the extents
    - Use physical (start, end) codes
    - Sort d-nodes according to (start, end)
  - Add Doc Tape
  - Add Value Tape
Example
SegSJ
- Key observation:
  - Structural relationship between two segments can be inferred from the relationship between the first d-nodes in their extents.
  - Example: b1: (10, 78), (210, 297), …; d1: (19, 25), (54, 66), …
- SegSJ(/p//q)
  - R(s, e): A = /p
  - S(s, e): D = //q
  - Take the (s, e) of the first d-node in each segment
  - Structural join R and S
    - Using a partition-based or sorting-based SJ algorithm
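The join step can be sketched as a sort-based structural join over (start, end) region codes. This is a generic stack-based join written for illustration, assuming regions come from one tree (so any two regions are either nested or disjoint); it is not claimed to be the exact SJ algorithm used in the talk, and `structural_join` is a hypothetical name.

```python
def structural_join(ancestors, descendants):
    """Return (a, d) pairs where region a contains region d.

    Both inputs are lists of (start, end) codes from the same tree.
    """
    result = []
    stack = []  # chain of nested ancestor regions that are still open
    a_sorted = sorted(ancestors)
    i = 0
    for d in sorted(descendants):
        # Open every ancestor region that starts before d.
        while i < len(a_sorted) and a_sorted[i][0] < d[0]:
            a = a_sorted[i]
            i += 1
            while stack and stack[-1][1] < a[0]:
                stack.pop()
            stack.append(a)
        # Close ancestor regions that ended before d starts.
        while stack and stack[-1][1] < d[0]:
            stack.pop()
        result.extend((a, d) for a in stack if d[1] < a[1])
    return result


# Region codes from the slide's example, plus one extra hypothetical d-region.
R = [(10, 78), (210, 297)]
S = [(19, 25), (54, 66), (220, 230)]
pairs = structural_join(R, S)
print(pairs)
```

In SegSJ the inputs would be the codes of the first d-node of each segment rather than of every d-node, which keeps R and S small.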
Outline
- Introduction
- Disk-based F&B Index
- Experiment
- Conclusions
Experiments
- Setup
  - Datasets: DBLP/XMark/TreeBank
  - 8 representative queries
    - Dim 1: PC/AD
    - Dim 2: Path/Twig
    - Dim 3: Large/Small
  - Compared algorithms: DFS, BFS, RangeFetch, SegSJ
  - Compared systems: NoK, TwigStack, Kaushik's algorithm in [SIGMOD 04]
  - Metric: time/PIO/LIO
[Figure: Varying Buffer Size (PC-Path)]
[Figure: Varying Buffer Size (PC-Twig)]
[Figure: Varying Buffer Size (AD-Path)]
[Figure: Varying Buffer Size (AD-Twig)]
[Figure: Buffer Hit Ratio]
[Figure: Scalability]
[Figure: Comparing with Other Systems]
Outline
- Introduction
- Disk-based F&B Index
- Experiment
- Conclusions
Conclusions
- Disk-based F&B Index
  - Store and cluster the index on the disk
  - More efficient and intelligent query processing algorithms
- Demonstrated good scalability and query efficiency
- Expecting new query processing algorithms based on index probing (in addition to join-based approaches)
Q&A
Thank You!
Related Work
- Indexes
  - Exact: DataGuide, 1-index, F&B Index
  - Approx: Approx. DataGuide, A(k)-index, D(k)-index, M*(k)-index
- Join-based approaches
- Hybrid approach: "mixed-mode" in [VLDB 03] Niagara
  - [VLDB 03] combines tree traversals + joins
  - [SIGMOD 04] uses 1-index to accelerate joins
- Clustering
  - Lore: pre-order
  - Natix: subtree