Flexible Search on XML Data (ΕΥΕΛΙΚΤΗ ΑΝΑΖΗΤΗΣΗ ΣΕ ΔΕΔΟΜΕΝΑ XML)

Outline: Partial queries, Query processing, Query evaluation, Query containment, Experiments, Conclusion

Difficulties on Querying XML Data

[Figure: example XML trees of hotel data with node labels Hotels, City, Island, Location, Athens, Creta, Center, Poros, Chania, Heraklio]

Search problem
  • Name: Xiaoying Wu
  • Place: Athens Center, Heraklio
  • Purpose: Sightseeing: Parthenon (438 BC), Phaistos' Disk (1700 BC)
  • Problem: structural difference

Search problem
  • Name: Theodore Dalamagas
  • Place: Islands
  • Purpose: Sea sports (windsurf, jet ski)
  • Problem: structural inconsistency

Search problem
  • Name: Dimitri Theodoratos
  • Place: Heraklio
  • Purpose: HDMS Conference (HDMS 2008)
  • Problem: unknown structure

Search problem
  • Name: Stefanos Souldatos
  • Place: Any island (1400 islands)
  • Purpose: Escape from Ph.D!
  • Problem: multiple sources (theHotel.gr, hotels.gr, holidays.gr)

Difficulties on Querying XML Data

  • Can we use existing query languages (XPath, XQuery) to express our queries?
  • Can we use existing techniques to evaluate our queries?

Partial Queries in XPath

[Figure: five queries placed on a scale from 0% to 100% specified structure. Queries 1 to 3 are path queries over Hotels, City, Athens; queries 4 and 5 are tree-pattern queries over Hotels, City, Athens, Island.]

1. //Hotels[descendant-or-self::*[ancestor-or-self::City][ancestor-or-self::Athens]]
2. //Hotels[/City[descendant-or-self::*[ancestor-or-self::Athens]]]
3. //Hotels[/City//Athens]
4. //Hotels[/City[descendant-or-self::*[ancestor-or-self::Athens]]][//City[descendant-or-self::*[ancestor-or-self::Island]]]
5. //Hotels[/City//Athens][/City//Island]
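As an illustration of what query 1 asks for, the sketch below restates it in plain Python: a Hotels element matches if some node at or below it has both City and Athens among its ancestors-or-self, in any order. The function names and the three sample documents are hypothetical and are not taken from the slides; only the standard-library `xml.etree.ElementTree` module is used.

```python
import xml.etree.ElementTree as ET

def ancestor_or_self_tags(elem, parent_of):
    """Collect the tags of elem and all of its ancestors."""
    tags = set()
    while elem is not None:
        tags.add(elem.tag)
        elem = parent_of.get(elem)
    return tags

def hotels_on_same_path(root, required=("City", "Athens")):
    """Return Hotels elements that share a root-to-leaf path with every required tag."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    matches = []
    for hotels in root.iter("Hotels"):
        for node in hotels.iter():  # descendant-or-self of Hotels
            if set(required) <= ancestor_or_self_tags(node, parent_of):
                matches.append(hotels)
                break
    return matches

# Hypothetical documents: the same labels arranged in different orders.
DOC_A = "<Hotels><City><Athens><Center/></Athens></City></Hotels>"
DOC_B = "<Athens><City><Hotels><Center/></Hotels></City></Athens>"
DOC_C = "<Hotels><City/><Athens/></Hotels>"  # City and Athens are siblings
```

The first two documents match even though the labels appear in opposite orders, which is the point of a query with no structure specified; the third does not, because City and Athens are not on one path.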

Partial Queries

[Figure: example partial query over the nodes r, a, c, c, b, a, d]

Legend:
  • r: root node (optional)
  • a: query node labelled by "a"
  • the two edge styles denote the child relationship and the descendant relationship

Conclusions (up to now)

  • Need for queries with partial structure
  • We introduce partial queries
  • Partial queries can be expressed in XPath

Query Processing

[Figure: QUERY PROCESSING takes a partial path query (nodes r, a, c, c, b, a, d) and produces a partial path query in canonical form, which is the input to QUERY EVALUATION]

Query Processing

1. Full form
2. Satisfiability
3. Redundant nodes
4. Canonical form

Step 1, full form: the relationships derived by the inference rules below are added to the query.

[Figure: the example query over r, a, c, c, b, a, d, shown over several frames with applications of IR1, IR4, IR6 and IR8 highlighted]

INFERENCE RULES
(IR1) |- r//ai
(IR2) x/y |- x//y
(IR3) x//y, y//z |- x//z
(IR4) x/ai, x//bj |- ai//bj
(IR5) ai/x, bj//x |- bj//ai
(IR6) x/y, y/w, x//z, z//w |- x/z
(IR7) x/y, x//z, w//y |- x/z
(IR8) x/y, y/w, x/z |- z/w
(IR9) x//y, y//w, x/z |- z//w
(IR10) x/y, w/z |- x/z
(IR11) x//y, w//z |- x//z
(IR12) x/y, y/w, z/w |- x/z
(IR13) x//y, y//w, z/w |- x//z

x, y, z, w: query nodes; ai, bj: query nodes labelled by a, b

Query Processing, step 2: Satisfiability

A query is unsatisfiable if its full form contains a trivial cycle.

[Figure: the full form of the example query, and a trivial cycle between two nodes x and y]
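The satisfiability test can be sketched in a few lines if only two of the thirteen inference rules are considered. The code below closes a query under IR2 and IR3 alone and then looks for a pair of nodes that are descendants of each other. It is a simplified illustration, not the full-form computation of the slides, and the edge-set representation and function names are assumptions.

```python
from itertools import product

def descendant_closure(child_edges, descendant_edges):
    """Close a query under IR2 (x/y |- x//y) and IR3 (x//y, y//z |- x//z) only."""
    closure = set(descendant_edges) | set(child_edges)  # IR2
    changed = True
    while changed:
        changed = False
        # IR3: chain two descendant relationships that share a middle node.
        for (x, y), (y2, z) in product(list(closure), repeat=2):
            if y == y2 and (x, z) not in closure:
                closure.add((x, z))
                changed = True
    return closure

def has_trivial_cycle(closure):
    """True if some x//y is accompanied by y//x (or by x//x)."""
    return any((y, x) in closure for (x, y) in closure)

# Hypothetical queries: the first is a plain chain, the second loops back.
satisfiable = descendant_closure({("r", "a")}, {("a", "b"), ("b", "c")})
unsatisfiable = descendant_closure(set(), {("a", "b"), ("b", "c"), ("c", "a")})
```

A complete implementation would apply all of IR1 to IR13 before testing for cycles; the fixed-point loop stays the same, only the rule set grows.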

Query Processing, step 3: Redundant nodes

A node y is redundant if one of the following patterns occurs:

[Figure: patterns a), b) and c), drawn over nodes x, y and z, next to the example query]

Query Processing, step 4: Canonical form

canonical form of a satisfiable query = full form, minus the relationships derived by IR2 and IR3, minus the redundant nodes

[Figure: canonical form of the example query over r, c, a, a, b, d]
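One possible reading of "minus IR2 and IR3" is a transitive reduction: drop every descendant edge that a child edge already implies (IR2) and every descendant edge that a longer chain already implies (IR3). The sketch below follows that reading for an acyclic query; it does not remove redundant nodes, and its names and representation are assumptions rather than the authors' procedure.

```python
def strip_derivable(child_edges, descendant_edges):
    """Drop descendant edges that IR2 or IR3 could re-derive (acyclic queries only)."""
    child_edges = set(child_edges)
    # IR2: x//y is derivable whenever x/y is present.
    desc = set(descendant_edges) - child_edges
    all_edges = child_edges | desc

    def reachable_without(source, target, skipped):
        """Is target reachable from source without using the edge `skipped`?"""
        stack, seen = [source], set()
        while stack:
            node = stack.pop()
            for edge in all_edges:
                if edge[0] == node and edge != skipped and edge[1] not in seen:
                    if edge[1] == target:
                        return True
                    seen.add(edge[1])
                    stack.append(edge[1])
        return False

    # IR3: x//z is derivable when another chain already leads from x to z.
    kept = {e for e in desc if not reachable_without(e[0], e[1], e)}
    return child_edges, kept

# Hypothetical full form: r/a, plus r//a, a//b and the derived r//b.
children, descendants = strip_derivable(
    {("r", "a")}, {("r", "a"), ("a", "b"), ("r", "b")}
)
```

Here only a//b survives as a descendant edge, since r//a follows from r/a and r//b follows from the chain through a.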

Canonical Form

  • Partial path query: directed acyclic graph with a same-path constraint [Figure: query over r, d, b, c, e]
  • Partial tree-pattern query: directed acyclic graph with same-path constraints [Figure: query over r, b, c, d, e]
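The same-path constraint can be made concrete with the region (interval) numbering commonly used by structural-join algorithms: each element gets a (start, end) pair from a depth-first traversal, and a set of nodes lies on one root-to-leaf path exactly when their intervals nest inside one another. The sketch below is an illustration under that assumption; the slides do not state which encoding is used, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

def region_encode(root):
    """Assign each element a (start, end) interval by depth-first numbering."""
    regions = {}
    counter = 0

    def visit(elem):
        nonlocal counter
        start = counter
        counter += 1
        for child in elem:
            visit(child)
        regions[elem] = (start, counter)
        counter += 1

    visit(root)
    return regions

def on_same_path(nodes, regions):
    """True if the nodes lie on one root-to-leaf path, i.e. their intervals nest."""
    ordered = sorted((regions[n] for n in nodes), key=lambda r: r[0])
    return all(outer[1] >= inner[1] for outer, inner in zip(ordered, ordered[1:]))

# Hypothetical document: c sits below b, while e is a sibling branch.
doc = ET.fromstring("<r><b><d><c/></d></b><e/></r>")
regions = region_encode(doc)
b, c, e = doc.find("b"), doc.find("b/d/c"), doc.find("e")
```

With this document, r, b and c satisfy the constraint, while c and e do not because they sit on different branches.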

Conclusions (up to now)

  • Need for queries with partial structure
  • We introduce partial queries
  • Partial queries can be expressed in XPath
  • We can process any partial query dag

Evaluation Algorithms

Partial Path Queries [Figure: query over r, d, b, c, e]
  • PQGen: produce path queries
  • PathJoin: decompose into paths
  • PartialMJ: decompose into spanning tree paths
  • PartialPathStack: novel holistic algorithm

Partial Tree-Pattern Queries [Figure: query over r, b, c, d, e]
  • TPQGen: produce tree-pattern queries (TPQs)
  • PPJoin (PartialPathJoin): decompose into partial paths (PPs)
  • PartialTreeStack: novel holistic algorithm

Partial Path Queries: PQGen

Producing all possible path queries…

[Figure: the partial path query over r, d, b, c, e and the path queries generated from it]

1. Produce all possible path queries
2. Evaluate paths using existing algorithms
3. Keep all results
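Step 1 of PQGen can be pictured as enumerating every total order of the query nodes that respects the edges of the query dag, since each such order corresponds to one candidate path query. The sketch below does this for descendant edges only; it ignores child edges (which would force adjacency) and any merging of nodes, and both the function name and the example dag are hypothetical because the exact edges of the slide's query are not recoverable from the text.

```python
def linear_extensions(nodes, edges):
    """Yield every total order of `nodes` that respects the precedence `edges`."""
    preds = {n: {a for (a, b) in edges if b == n} for n in nodes}

    def extend(prefix, remaining):
        if not remaining:
            yield tuple(prefix)
            return
        for n in sorted(remaining):
            # A node may be placed once all of its predecessors are placed.
            if preds[n] <= set(prefix):
                yield from extend(prefix + [n], remaining - {n})

    yield from extend([], set(nodes))

# Hypothetical partial path query: r above b and d, c below b and d, e below d.
NODES = ["r", "b", "d", "c", "e"]
EDGES = [("r", "b"), ("r", "d"), ("b", "c"), ("d", "c"), ("d", "e")]
path_queries = list(linear_extensions(NODES, EDGES))
```

For this small dag the enumeration produces five orders, each read as a path query such as r//b//d//c//e. The count grows quickly with the number of unordered nodes, which is the "many queries to evaluate" problem listed for PQGen in the comparison slide.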

Partial Path Queries: PathJoin

Decomposing into root-to-leaf paths…

[Figure: the partial path query over r, d, b, c, e and the root-to-leaf paths it is decomposed into]

1. Decompose into root-to-leaf paths
2. Evaluate paths using existing algorithms
3. Apply the join conditions (identity, path)
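The decomposition step can be sketched as a walk over the query dag that emits one path per root-to-leaf route. The dag below is a hypothetical example, not the query drawn on the slide, and the function name is an assumption.

```python
def root_to_leaf_paths(root, edges):
    """Yield every path from `root` to a node without outgoing edges."""
    children = {}
    for a, b in edges:
        children.setdefault(a, []).append(b)

    def walk(node, prefix):
        path = prefix + [node]
        if node not in children:  # leaf of the query dag
            yield tuple(path)
            return
        for child in children[node]:
            yield from walk(child, path)

    yield from walk(root, [])

# Hypothetical query dag; c is reachable through both b and d.
EDGES = [("r", "b"), ("r", "d"), ("b", "c"), ("d", "c"), ("d", "e")]
paths = list(root_to_leaf_paths("r", EDGES))
```

Nodes such as r, d and c appear in more than one path, so the per-path results have to be joined on the identity of those shared nodes, which corresponds to the identity condition in step 3.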

Partial Path Queries: PartialMJ

Using a spanning tree…

[Figure: the partial path query over r, d, b, c, e, a spanning tree of it, and the root-to-leaf paths of that tree]

1. Create a spanning tree of the query
2. Decompose into root-to-leaf paths
3. Evaluate paths using an extension of PathStack
4. Apply the join conditions (identity, structural, path)
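A minimal sketch of step 1, assuming a breadth-first choice of tree edges (the slides do not say how the spanning tree is picked): edges that reach a node for the first time go into the tree, and the remaining edges are set aside to be checked later as join conditions. The names and the example dag are hypothetical.

```python
from collections import deque

def spanning_tree(root, edges):
    """Split the edges of a query dag into BFS tree edges and leftover edges."""
    children = {}
    for a, b in edges:
        children.setdefault(a, []).append(b)

    tree_edges, seen, queue = [], {root}, deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                tree_edges.append((node, child))
                queue.append(child)

    # Edges left out of the tree still constrain the answer.
    non_tree_edges = [e for e in edges if e not in tree_edges]
    return tree_edges, non_tree_edges

# Hypothetical query dag; c has two incoming edges, so one is left over.
EDGES = [("r", "b"), ("r", "d"), ("b", "c"), ("d", "c"), ("d", "e")]
tree_edges, non_tree_edges = spanning_tree("r", EDGES)
```

Because every node now has a single parent, the tree decomposes into paths that overlap only on shared prefixes, while the leftover edge d to c has to be enforced when the path results are joined.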

Partial Path Queries: PartialPathStack

[Figure: step-by-step run of PathStack and PartialPathStack over the same XML tree (nodes r, b1, d1, c1, e1, d2, c2, e2). Each algorithm keeps one stack per query node (Sr, Sb, Sd, Sc, Se); PathStack works on a path query with a single leaf node, PartialPathStack on the partial path query with several leaf nodes. The frames push r, b1, d1, c1, e1, d2, c2 and e2 in turn. The final frame lists two result sets: {r a1 b1 d1 c1 e1, r a1 b1 d1 c1 e2} and {r a1 b1 d1 c1 e1, r a1 b1 d1 c2 e1, r a1 b1 d1 c1 e2}.]

Partial Path Queries: PartialPathStack

  • PathStack [Bruno et al, 2002]: optimal for path queries, O(input + output)
  • PartialPathStack [Souldatos et al, 2007]: optimal for partial path queries, O(input*indegree + output*outdegree)

Partial Path Queries: Comparison

[Table: the algorithms PQGen (path queries), PathJoin (decomposition to paths), PartialMJ (spanning tree) and PartialPathStack, compared on three problems: many queries to evaluate, path overlaps, intermediate results]

Partial Tree-Pattern Queries: TPQGen

Producing all possible tree-pattern queries…

[Figure: the partial tree-pattern query over r, b, c, d, e and the tree-pattern queries generated from it]

1. Produce all possible tree-pattern queries
2. Evaluate queries using existing algorithms
3. Keep all results

Partial Tree-Pattern Queries: PartialPathJoin

Decomposing into partial paths…

[Figure: the partial tree-pattern query over r, b, c, d, e and the partial paths it is decomposed into]

1. Decompose into partial paths
2. Evaluate partial paths using PartialPathStack
3. Apply the join conditions (identity)

Partial Tree-Pattern Queries: PartialTreeStack

[Figure: step-by-step run of TwigStack and PartialTreeStack over the same XML tree (nodes r, b1, d1, c1, e1, d2, c2, e2). Each algorithm keeps one stack per query node (Sr, Sb, Sd, Sc, Se) and emits partial solutions per leaf: r b1 d1 c1, r b1 d1 c2, r b1 d2 c2 and r b1 d1 e1, r b1 d1 e2, r b1 d2 e2 for TwigStack; r d1 b1 c1, r d1 b1 c2, r d2 b1 c2 and r d1 e1, r d1 e2, r d2 e2 for PartialTreeStack. The final frame lists the merged results r b1 d1 c1 e1, r b1 d1 c1 e2, r b1 d1 c2 e1, r b1 d1 c2 e2, r b1 d2 c2 e2.]

Partial Tree-Pattern Queries: PartialTreeStack

  • TwigStack: O(input + output), optimal for tree-pattern queries
  • PartialTreeStack: O(input*|Q|*|PP| + output*N), optimal for "small" partial tree-pattern queries

|Q| = nodes + edges, |PP| = number of PPs, N = nodes

Partial Tree-Pattern Queries: Comparison

[Table: the algorithms TPQGen (TPQs), PartialPathJoin (decomposition to PPs) and PartialTreeStack, compared on three problems: many queries to evaluate, path overlaps, intermediate results]

Conclusions (up to now)

  • Need for queries with partial structure
  • We introduce partial queries
  • Partial queries can be expressed in XPath
  • We can process any partial query dag
  • We proposed algorithms for their evaluation

Absolute Query Containment

Q1 ⊆ Q2: each result of Q1 is a result of Q2.

Containment is checked through a homomorphism from Q2 to the full form of Q1.

[Figure: example queries Q1 and Q2 over the nodes r, a, b, c, with the homomorphism built up over several frames]

=> Checking absolute query containment is very fast (homomorphism)
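A brute-force version of the homomorphism test can be sketched as follows: every node of Q2 is mapped to a node of Q1 with the same label, and each child or descendant edge of Q2 must land on a corresponding edge in the full form of Q1. The dictionaries, edge sets and function name are hypothetical, and the full form of Q1 is assumed to have been computed beforehand.

```python
from itertools import product

def find_homomorphism(q2_labels, q2_child, q2_desc, q1_labels, q1_child, q1_desc):
    """Return a label- and edge-preserving map from Q2 nodes to Q1 nodes, or None."""
    q2_nodes = list(q2_labels)
    # Each Q2 node may only map to Q1 nodes carrying the same label.
    candidates = [
        [n for n in q1_labels if q1_labels[n] == q2_labels[m]] for m in q2_nodes
    ]
    for image in product(*candidates):
        h = dict(zip(q2_nodes, image))
        child_ok = all((h[x], h[y]) in q1_child for x, y in q2_child)
        desc_ok = all((h[x], h[y]) in q1_desc for x, y in q2_desc)
        if child_ok and desc_ok:
            return h
    return None

# Hypothetical Q1 = r/a/b, given with its full-form descendant edges.
Q1_LABELS = {"r": "r", "a": "a", "b": "b"}
Q1_CHILD = {("r", "a"), ("a", "b")}
Q1_DESC = {("r", "a"), ("a", "b"), ("r", "b")}

# Hypothetical Q2 = r//b.
Q2_LABELS = {"r": "r", "b": "b"}
Q2_CHILD = set()
Q2_DESC = {("r", "b")}

forward = find_homomorphism(Q2_LABELS, Q2_CHILD, Q2_DESC, Q1_LABELS, Q1_CHILD, Q1_DESC)
backward = find_homomorphism(Q1_LABELS, Q1_CHILD, Q1_DESC, Q2_LABELS, Q2_CHILD, Q2_DESC)
```

In this example a homomorphism from Q2 into Q1 exists, which is consistent with r/a/b being contained in r//b, while none exists in the other direction because Q2 has no node labelled a. The exhaustive search is only meant to show the condition being tested, not an efficient way to find the mapping.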

Relative Query Containment

Some important stuff first:

1. Dimension graphs: summarize the structure of an XML tree

[Figure: an XML tree and its dimension graph]
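The slides do not define how a dimension graph is built, so the sketch below only illustrates the general idea of a structural summary under a strong simplifying assumption: one summary node per element label and one edge per parent-label to child-label pair that occurs in the tree. The function name and sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

def label_summary(root):
    """Collect the distinct (parent label, child label) pairs of an XML tree."""
    edges = set()
    for parent in root.iter():
        for child in parent:
            edges.add((parent.tag, child.tag))
    return edges

# Hypothetical document with a repeated City element.
DOC = (
    "<Hotels>"
    "<City><Athens/></City>"
    "<City><Heraklio/></City>"
    "<Island><Poros/></Island>"
    "</Hotels>"
)
summary = label_summary(ET.fromstring(DOC))
```

The two City elements collapse into a single summary node, so the summary is typically much smaller than the tree it describes.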

Relative Query Containment

Some important stuff first:

2. Dimension trees: equivalent to a query in a specific dimension graph

[Figure: Q1 combined with the dimension graph gives the dimension tree DT1.1; Q2 combined with the dimension graph gives the dimension trees DT2.1 and DT2.2]

Relative Query Containment

Q1 ⊆G Q2: each result of Q1 in G is a result of Q2 in G.

Containment is checked through a homomorphism from the dimension trees of Q2 to the dimension trees of Q1.

[Figure: queries Q1 and Q2, the dimension graph G, and the dimension trees DT1.1, DT2.1 and DT2.2]

=> Checking relative query containment can be very slow (#dimension trees)

Heuristic for Relative Containment

1. Extract info from the dimension graph
2. Add it to Q1
3. Check Q1 ⊆ Q2

[Figure: the three steps applied to the example queries and the dimension graph G over several frames; the final check succeeds (OK)]

Queries Used in the Experiments

[Figure: four query shapes over the nodes r, a, b, c, d, e, f, labelled Q1/Q5, Q2/Q6, Q3/Q7 and Q4/Q8]

Query Evaluation

Execution time on Treebank… (2.5 million nodes)

[Figure: execution-time charts over three frames, with the annotations "path queries" and "too many results"]

Execution time on synthetic data… (2.5 million nodes, IBM AlphaWorks XML generator)

[Figure: execution-time chart]

Query Evaluation

Execution time varying the size of the XML tree…

[Figure: three charts, for Q2, Q3 and Q7, each comparing PartialMJ with PartialPathStack]

Query Containment

Execution time varying the graph size…

[Figure: time (sec) against the number of graph paths for relative containment, the on-the-fly heuristic and the precomputed heuristics; heuristic accuracy annotated as > 98%, > 90%, > 78%, > 60%]

Execution time varying the query size…

[Figure: time (sec) against the number of nodes per query path for relative containment, the on-the-fly heuristic and the precomputed heuristics; heuristic accuracy annotated as > 98%, > 79%, > 32%]

Conclusions

  • Need for queries with partial structure
  • We introduce partial queries
  • Partial queries can be expressed in XPath
  • We can process any partial query dag
  • We proposed algorithms for their evaluation
  • We showed that our algorithms for evaluation and containment outperform other techniques

Contribution

  • Evaluation (partial path queries, partial tree-pattern queries): CIKM '07, WWW '08, EDBT '09??
  • Containment: SSDBM '06, VLDB Journal '08
  • Heuristics for containment: CIKM '06, CIKM '08

Publications

QUERY EVALUATION
  • Stefanos Souldatos, Xiaoying Wu, Dimitri Theodoratos, Theodore Dalamagas, Timos Sellis. Evaluation of Partial Path Queries on XML Data. 16th CIKM Conference, Lisboa, Portugal, 2007.
  • Xiaoying Wu, Stefanos Souldatos, Dimitri Theodoratos, Theodore Dalamagas, Timos Sellis. Efficient Evaluation of Generalized Path Pattern Queries on XML Data. 17th WWW Conference, Beijing, China, 2008.

QUERY CONTAINMENT
  • Dimitri Theodoratos, Theodore Dalamagas, Pawel Placek, Stefanos Souldatos, Timos Sellis. Containment of Partially Specified Tree-Pattern Queries. 18th SSDBM Conference, Vienna, Austria, 2006.
  • Dimitri Theodoratos, Pawel Placek, Theodore Dalamagas, Stefanos Souldatos, Timos Sellis. Containment of Partially Specified Tree-Pattern Queries in the Presence of Dimension Graphs. VLDB Journal, 2008.

HEURISTICS FOR CONTAINMENT
  • Dimitri Theodoratos, Stefanos Souldatos, Theodore Dalamagas, Pawel Placek, Timos Sellis. Heuristic Containment Check of Partial Tree-Pattern Queries in the Presence of Index Graphs. 15th CIKM Conference, Arlington, USA, 2006.
  • Pawel Placek, Dimitri Theodoratos, Stefanos Souldatos, Theodore Dalamagas, Timos Sellis. Heuristic Approaches for Checking Containment of Generalized Tree-Pattern Queries. 17th CIKM Conference, Napa Valley, California, USA, 2008.

WEB SEARCH PERSONALIZATION
  • Stefanos Souldatos, Theodore Dalamagas, Timos Sellis. Sailing the Web with Captain Nemo: a Personalized Metasearch Engine. Learning in Web Search Workshop, 22nd ICML Conference, Bonn, Germany, 2005.
  • Stefanos Souldatos, Theodore Dalamagas, Timos Sellis. Captain Nemo: A Metasearch Engine with Personalized Hierarchical Search Space. Informatica Journal, 2006.
  • Stefanos Souldatos, Theodore Dalamagas, Timos Sellis. Sailing the Web with Captain Nemo: a Personalized Metasearch Engine. Internet Search Engines (book), ICFAI University (Institute of Chartered Financial Analysts of India), 2007. Reprint of the publication in the Learning in Web Search Workshop.

Questions?