Cooperative Query Answering for Semistructured Data
By Michael Barg and Raymond K. Wong
Speakers: Chuan Lin & Xi Zhang
Outline
• Motivations
• Overview
• Basic Concepts
• Cooperative Query Processing
• Experiment
Motivations
• XML data
  – same semantic content
  – very different structures
Example: same semantics, different structures
• User Query: “insurance claims” related to “smoking” for “woman”
• [Figure: two document trees, a Court Transcript and an Insurance Record, that hold the same terms in different structures. Node labels appearing in the figure: plaintiff, woman, insurance claim, insurer, smoking]
Motivations
• No exact query result
  – User Query: “phone number” of “Bob”
  – User Query: Who is the new “sales manager”
• [Figure: Data tree with node labels personnel, sales manager, Joe, salesman, assistant salesman, sales manager, Bob, phone number]
Overview
• Goal:
  – Return approximate answers for XML queries
  – “approximate”: semantically and structurally similar
• Solution:
  – Return a set of results ranked by an overall score
  – The score indicates how well the subgraph containing the result satisfies the query criteria
Basic Concepts: Query Tree
• Query: /restaurant[.//Soho]/phone_number
• Query Tree: [Figure: restaurant linked to soho and to phone_number, the result term (r); each edge end is marked h (head) or t (tail)]
• For each edge:
  – “head”: the end closer to the nearest result term
  – “tail”: the other end
  – In case of a tie, “head” is the end closer to the root
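The head/tail rule can be sketched in a few lines of Python. This is only an illustration of the rule as worded on the slide, not the authors' implementation: the function name `label_edges`, the edge-list representation, and the use of breadth-first search are all assumptions made for the example.

```python
from collections import deque


def label_edges(edges, root, result_terms):
    """Return {(u, v): (head, tail)} for each undirected query-tree edge."""
    # Build an adjacency list for the (undirected) query tree.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)

    def bfs(sources):
        # Multi-source BFS: hop count from each node to its nearest source.
        dist = {s: 0 for s in sources}
        queue = deque(sources)
        while queue:
            node = queue.popleft()
            for nxt in adj.get(node, []):
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        return dist

    to_result = bfs(list(result_terms))  # distance to the nearest result term
    to_root = bfs([root])                # depth, used only to break ties

    labels = {}
    for u, v in edges:
        # Head: closer to a result term; on a tie, closer to the root.
        head, tail = sorted((u, v), key=lambda n: (to_result[n], to_root[n]))
        labels[(u, v)] = (head, tail)
    return labels


# Query tree for /restaurant[.//Soho]/phone_number
edges = [("restaurant", "soho"), ("restaurant", "phone_number")]
labels = label_edges(edges, root="restaurant", result_terms={"phone_number"})
```

For this query tree, the edge to soho gets restaurant as head and soho as tail, while the edge to phone_number gets the result term itself as head.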
Basic Concepts: Converging Order
• Order of edges considered in query processing
• Converge on a result term
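One possible reading of converging order is that edges farthest from the result term are handled first, so that processing ends at the result term. The sketch below follows that reading; the slide does not spell out the ordering rule, so the sort key, the function name `converging_order`, and the edge-list representation are assumptions.

```python
from collections import deque


def converging_order(edges, result_term):
    """Order query-tree edges so that processing ends at the result term."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)

    # BFS hop count from the result term to every node.
    dist = {result_term: 0}
    queue = deque([result_term])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)

    # Assumption: an edge's distance is that of its farther endpoint,
    # and the farthest edges are considered first.
    return sorted(edges, key=lambda e: max(dist[e[0]], dist[e[1]]), reverse=True)


edges = [("restaurant", "phone_number"), ("restaurant", "soho")]
order = converging_order(edges, result_term="phone_number")
```

Under this assumption the edge (restaurant, soho) is considered before (restaurant, phone_number).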
Basic Concepts: Similarity
• Semantically similar topologies
• [Figure: five topologies, (a) to (e), relating restaurant and soho through nodes such as shopping_center, eating_places and address]
Basic Concepts: Similarity (cont.)
• Deviation Proximity (DP)
  – Measures how far one structure deviates from a desired structure
  – Given:
    • r_a: data node with value a
    • r_b: data node with value b
    • Q(a, b): query tree edge
  – DP: the distance from the actual position of r_b to the nearest position, r′_b, that satisfies the topological relationship specified by Q(a, b)
    • Topological relationship: parent-child, ancestor-descendant
Deviation Proximity
• Q(restaurant, soho) requires a parent-child relationship
• [Figure: topologies built from shopping_center, restaurant, eating_places, address and soho, each marked with the nearest satisfying position soho′. DP(restaurant, soho) values shown in the figure: 0, 1, 2, 3, 3]
Deviation Proximity
• Q(restaurant, soho) requires an ancestor-descendant relationship
• [Figure: topologies built from shopping_center, restaurant, eating_places, address and soho, each marked with the nearest satisfying position soho′. DP(restaurant, soho) values shown in the figure: 0, 0, 2, 3, 3]
Cooperative Query Processing
• Input: a Query Tree QT, an XML Document Tree DT
• Output: ordered list of <r_result_term, score>
• Cooperative Query Processing
  – Structural proximity calculation
  – Progressive Score
Cooperative Query Processing (cont.)
• Progressively match edges in QT with DT
  – Consider edges in converging order
  – For each edge QT(a, b), where a is head and b is tail, get a list of <r_a, score>
    • r_a is a node in DT with value a
    • score is the progressive score of r_a w.r.t. the nearest r_b
    • use graph encoding to calculate the structural proximity of r_a and r_b
Structural Proximity Calculation
• Encodings and Compressed Arrays
  – Compact
  – Preserve relationship to a larger graph
  – Facilitate distance calculations
• Proximity Searching
Encodings and Compressed Arrays
• Basic Concepts:
  – Common Node
  – Terminal Node
  – Annotated Node
• Path representation
  – Representing Single Path
  – Representing Multiple Paths
  – Representing Multiple Elements
• Compressed Arrays
  – Each encoding is a path/multi-path for a node/a set of nodes
Encodings and Compressed Arrays
Representing Single Path
• 1.1.1 y1
• 1.2.1.1 y2
Representing Multiple Paths
• 1.3 B.B.2.1.1 C.3 C.C.2 y3
Representing Multiple Elements
• 1 A.A.1.1 y1.2.1.1 y2.3 B.B.2.1.1 C.3 C.C.2 y3
Compressed Arrays
Drawback of Encoding
• 1 A.A.1 B.B.1 D.2 E.?.2 C.C.1 F.2 G
Proximity Searching
• Multi-Element Comparison
  – Input:
    • A compressed array, caN, containing the multi-element encoding of the Near Set
    • A compressed array, caF, containing the multi-path encoding or path encoding of all paths from the root to the specified element of the Find Set, E_F
  – Output:
    • dist, the shortest path from E_F to the closest element in the Near Set
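The distance this comparison returns can be illustrated with a much simpler representation than compressed arrays. The sketch below assumes a plain tree in which every element has exactly one root path, stored as a tuple of child positions; it does not reproduce the multi-path or multi-element encodings from the slides, and the names `tree_distance` and `min_dist` and the sample paths are hypothetical.

```python
def tree_distance(path_a, path_b):
    """Edges between two tree nodes, each given by its path from the root."""
    # Length of the shared prefix = depth of the lowest common ancestor.
    common = 0
    for x, y in zip(path_a, path_b):
        if x != y:
            break
        common += 1
    # Climb from a to the common ancestor, then descend to b.
    return (len(path_a) - common) + (len(path_b) - common)


def min_dist(find_path, near_paths):
    """Shortest distance from the Find Set element to any Near Set element."""
    return min(tree_distance(find_path, p) for p in near_paths)


# Illustrative root paths (assumed values, not taken from the slides' figures).
find_element = (1, 2, 1, 1)
near_set = [(1, 1, 1), (1, 2, 3)]
dist = min_dist(find_element, near_set)
```

Here the element at (1, 2, 3) shares the prefix (1, 2) with the Find Set element, so it lies 3 edges away, closer than (1, 1, 1) at 5 edges.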
Proximity Searching
• [Figure: proximity search example showing MinDist = 5, MinDist = 4, MinDist = 2]
Progressive Score
• Accumulative Deviation Proximity (DP)
  – Calculated from structural proximity
• Boolean operator at Query Tree branches
  – [Figure: two query-tree branches, each with node a above children b and c, one per formula]
  – prog(a) = prog(b) + prog(c)
  – prog(a) = min(prog(b), prog(c))
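The two branch formulas can be combined into one recursive function. The slide does not state which Boolean operator goes with which formula; the sketch below assumes the sum belongs to an AND branch and the minimum to an OR branch. The tuple-based tree representation and the leaf scores are also illustrative assumptions.

```python
def prog(node):
    """Progressive score of a query-tree node.

    A node is either a number (a leaf whose score is already known)
    or a tuple (op, children) with op in {"and", "or"}.
    """
    if isinstance(node, (int, float)):
        return node
    op, children = node
    scores = [prog(child) for child in children]
    if op == "and":
        # Assumed AND: every branch must match, so deviations add up.
        return sum(scores)
    if op == "or":
        # Assumed OR: the best (smallest) deviation is enough.
        return min(scores)
    raise ValueError(f"unknown operator: {op!r}")


and_branch = ("and", [1, 2])            # prog(a) = prog(b) + prog(c)
or_branch = ("or", [1, 2])              # prog(a) = min(prog(b), prog(c))
nested = ("and", [0, ("or", [3, 1])])   # branches can nest
```

With these sample leaf scores, the AND branch scores 3, the OR branch scores 1, and the nested tree scores 1.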
Experiment
• Query: //restaurant/soho
• Query Result: <soho, 2> <soho, 3> <soho, 4>
• XML: [Figure: source XML document, not recovered]
Thank you!
Questions & Answers