XML & Tree Automata
Sarai Sheinvald
Automata Theory for XML Researchers / Frank Neven [SIGMOD ’02]
Automata for XML – a Survey / Thomas Schwentick [JCSS ’07]

Agenda
Introduction: XML, Tree automata, Schemas, XPath
Solving problems with tree automata: XPath queries, XPath containment

XML
Extensible Markup Language, for tagging the content of text files.
  <?xml version="1.0" encoding="UTF-8"?>
  <muppet creator="Henson">
    <name> Kermit </name>
    <animal> Frog </animal>
    <friends>
      <name> Ms. Piggy </name>
    </friends>
  </muppet>

XML file to tree - Example
  <muppet creator="Henson">
    <name> Kermit </name>
    <animal> Frog </animal>
    <friends>
      <name> Ms. Piggy </name>
    </friends>
  </muppet>
Corresponding tree:
  Muppet
    creator
      Henson
    name
      Kermit
    animal
      Frog
    friends
      name
        Ms. Piggy
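
The mapping from an XML file to a labelled tree can be sketched in Python with the standard library parser. This is only an illustration: the `(label, children)` tuple representation and the helper name `to_tree` are assumptions made here, and attributes and text are turned into child nodes to mirror the tree drawn on the slide.

```python
import xml.etree.ElementTree as ET

MUPPET_XML = """<muppet creator="Henson">
  <name> Kermit </name>
  <animal> Frog </animal>
  <friends>
    <name> Ms. Piggy </name>
  </friends>
</muppet>"""


def to_tree(element):
    """Return (label, children); attributes and text become child nodes."""
    # Each attribute becomes a node whose single child is the attribute value.
    children = [(name, [(value, [])]) for name, value in element.attrib.items()]
    # Non-blank text becomes a leaf.
    text = (element.text or "").strip()
    if text:
        children.append((text, []))
    # Sub-elements are converted recursively, in document order.
    children.extend(to_tree(child) for child in element)
    return (element.tag, children)


tree = to_tree(ET.fromstring(MUPPET_XML))
```

Here `tree` is a nested tuple rooted at `muppet` with the four children `creator`, `name`, `animal` and `friends`.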

Automata on Binary Trees
Reminder: nondeterministic finite automaton (NFA)
  Σ = { a, b }
  Q = { q0, q1 }
  Ia = { q0 }, Ib = { q1 }
  F = { q0 }
  δ(q0, a) = { q0 }    δ(q0, b) = { q1 }
  δ(q1, a) = { q0 }    δ(q1, b) = { q1 }
We assume a different initial state for every letter.
aaba ∈ L(A)
(Figure: an accepting run, with each letter of the word annotated by a state; the last state, q0, is accepting.)
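
A minimal simulation of this NFA, assuming that the first letter of the word picks the state from Ia or Ib and that δ is applied to the remaining letters (this reading of the per-letter initial states is an assumption, chosen to match the leaf states of the tree automata that follow). The names `INITIAL`, `DELTA`, `FINAL` and `accepts` are illustrative.

```python
# Transition relation of the slide's NFA: (state, letter) -> set of states.
DELTA = {
    ("q0", "a"): {"q0"},
    ("q0", "b"): {"q1"},
    ("q1", "a"): {"q0"},
    ("q1", "b"): {"q1"},
}
INITIAL = {"a": {"q0"}, "b": {"q1"}}  # Ia and Ib
FINAL = {"q0"}


def accepts(word):
    """Track the set of reachable states while reading the word."""
    if not word:
        return False  # assumption: the empty word has no run
    current = set(INITIAL[word[0]])
    for letter in word[1:]:
        current = {t for s in current for t in DELTA.get((s, letter), set())}
    return bool(current & FINAL)
```

With these definitions `accepts("aaba")` holds, while `accepts("ab")` does not, since the automaton ends in q1 after a final b.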

Automata on Binary Trees – Bottom up
  Σ = { 0, 1, ∧, ∨ }
  Q = { q0, q1 }
  I0 = { q0 }, I1 = { q1 }
  F = { q1 }
  δ: Q × Q × Σ → Q
  δ(q0, q0, ∧) = q0    δ(q0, q0, ∨) = q0
  δ(q0, q1, ∧) = q0    δ(q0, q1, ∨) = q1
  δ(q1, q0, ∧) = q0    δ(q1, q0, ∨) = q1
  δ(q1, q1, ∧) = q1    δ(q1, q1, ∨) = q1
(Figure: an accepting run on a tree of ∧ and ∨ nodes over leaves labelled 0 and 1; states are assigned from the leaves up, and the root gets q1.)
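
A bottom-up run can be sketched as a recursive evaluation. The sketch assumes that the two operator symbols of Σ are ∧ and ∨ (written `"and"` and `"or"` here), so that q1 reads as "true" and q0 as "false"; the tuple encoding of trees and all function names are illustrative.

```python
INITIAL = {"0": "q0", "1": "q1"}  # I0 and I1: the state assigned to each leaf
FINAL = {"q1"}


def delta(left, right, symbol):
    """Deterministic transition: combine the two child states at an inner node."""
    if symbol == "and":
        return "q1" if left == "q1" and right == "q1" else "q0"
    if symbol == "or":
        return "q1" if "q1" in (left, right) else "q0"
    raise ValueError(f"unknown symbol: {symbol}")


def run(tree):
    """A tree is a leaf "0" / "1" or a triple (symbol, left, right)."""
    if isinstance(tree, str):
        return INITIAL[tree]
    symbol, left, right = tree
    return delta(run(left), run(right), symbol)


def accepts(tree):
    return run(tree) in FINAL
```

For example, `accepts(("and", ("or", "0", "1"), "1"))` holds and `accepts(("and", "0", "1"))` does not.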

Automata on Binary Trees – Top Down
  Σ = { 0, 1, ∧, ∨ }
  Q = { q0, q1 }
  I0 = { q0 }, I1 = { q1 }
  F = { q1 }
  δ: Q × Σ → 2^(Q × Q)
  δ(q0, ∧) = {(q1, q0), (q0, q1), (q0, q0)}    δ(q1, ∧) = {(q1, q1)}
  δ(q0, ∨) = {(q0, q0)}                        δ(q1, ∨) = {(q1, q0), (q0, q1), (q1, q1)}
(Figure: an accepting run on the same tree, with states assigned from the root down.)
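
A nondeterministic top-down run can be checked by guessing a pair of child states at every inner node. This sketch keeps the slide's names, so the root must start in a state of F and a leaf labelled 0 or 1 must be reached in the state of I0 or I1; the operator symbols are assumed to be ∧ and ∨, written `"and"` and `"or"`, and the helper names are illustrative.

```python
DELTA = {
    ("q0", "and"): {("q1", "q0"), ("q0", "q1"), ("q0", "q0")},
    ("q1", "and"): {("q1", "q1")},
    ("q0", "or"): {("q0", "q0")},
    ("q1", "or"): {("q1", "q0"), ("q0", "q1"), ("q1", "q1")},
}
LEAF_STATE = {"0": "q0", "1": "q1"}  # I0 and I1
ROOT_STATES = {"q1"}                 # F


def has_run(tree, state):
    """Is there a run on `tree` that labels its root with `state`?"""
    if isinstance(tree, str):  # leaf "0" or "1"
        return LEAF_STATE[tree] == state
    symbol, left, right = tree
    return any(
        has_run(left, l) and has_run(right, r)
        for l, r in DELTA.get((state, symbol), set())
    )


def accepts(tree):
    return any(has_run(tree, q) for q in ROOT_STATES)
```

On the examples used for the bottom-up automaton the answers agree: `("and", ("or", "0", "1"), "1")` is accepted and `("and", "0", "1")` is rejected.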

Decision Problems and Complexities
  Membership: t ∈ L(A)?
  Nonemptiness: L(A) ≠ ∅?
  Containment: L(A1) ⊆ L(A2)?

  Model \ Problem      Membership   Nonemptiness   Containment
  Det. top-down        LOGSPACE     PTIME          PTIME
  Nondet. bottom-up    LOGCFL       PTIME          EXPTIME
  Nondet. top-down     LOGCFL       PTIME          EXPTIME
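
The polynomial bound for nonemptiness can be illustrated with the usual fixpoint computation of reachable states for a bottom-up automaton on binary trees. This is a sketch under assumptions: the dictionary format for transitions and the names `reachable_states`, `is_nonempty` and `boolean_transitions` are chosen here for illustration, and the example automaton is the Boolean-evaluation automaton with operators written `"and"` and `"or"`.

```python
from itertools import product


def reachable_states(leaf_states, transitions):
    """States that label the root of at least one tree.

    transitions maps (left_state, right_state, symbol) to a set of states.
    """
    reached = set(leaf_states)
    changed = True
    while changed:  # at most |Q| rounds, each polynomial in the input
        changed = False
        for (left, right, _symbol), targets in transitions.items():
            if left in reached and right in reached and not targets <= reached:
                reached |= targets
                changed = True
    return reached


def is_nonempty(leaf_states, transitions, final):
    return bool(reachable_states(leaf_states, transitions) & set(final))


def boolean_transitions(states=("q0", "q1")):
    """Transitions of the Boolean-evaluation automaton, as an example input."""
    table = {}
    for left, right in product(states, repeat=2):
        table[(left, right, "and")] = {"q1" if left == right == "q1" else "q0"}
        table[(left, right, "or")] = {"q1" if "q1" in (left, right) else "q0"}
    return table
```

With both leaf states available the language is nonempty; if only q0 can label a leaf, q1 is never reached and the language is empty.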

XML trees have no bound on the number of children. What do we do?
Option 1: Unranked Trees to Binary Trees
First-child / next-sibling encoding
(Figure: an unranked tree over the labels a and b next to its binary encoding, where # marks a missing first child or next sibling; "encode" maps the unranked tree to the binary tree and "decode" maps it back.)
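
The first-child / next-sibling encoding can be sketched in a few lines. The representations are assumptions made for this example: an unranked tree is `(label, [children])`, a binary tree is `(label, first_child, next_sibling)`, and `"#"` stands for a missing child or sibling.

```python
def encode(forest):
    """Encode a list of sibling trees as one binary tree."""
    if not forest:
        return "#"
    (label, children), *siblings = forest
    # Left subtree: the first child and its siblings; right subtree: next sibling.
    return (label, encode(children), encode(siblings))


def decode(binary):
    """Inverse of encode: rebuild the list of sibling trees."""
    if binary == "#":
        return []
    label, first_child, next_sibling = binary
    return [(label, decode(first_child))] + decode(next_sibling)
```

A single tree `t` is encoded as `encode([t])`, and `decode(encode([t]))` returns `[t]` again.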

Option 2: Automata on Unranked Trees
In the binary case: δ: Q × Σ → 2^(Q × Q)
We want to define δ for any number of children: δ: Q × Σ → 2^(⋃_{n=0}^{∞} Q^n)
We use regular languages to denote the transitions: δ: Q × Σ → 2^(Q*), where δ(q, a) is a regular language over Q

Automata on Unranked Trees - example
  Σ = { 0, 1, ∧, ∨ }
  Q = { q0, q1 }
  F = { q1 }
  δ(q0, 0) = δ(q1, 1) = ε
  δ(q0, 1) = δ(q1, 0) = ∅
  δ(q0, ∧) = (q0+q1)* q0 (q0+q1)*    δ(q1, ∧) = q1*
  δ(q1, ∨) = (q0+q1)* q1 (q0+q1)*    δ(q0, ∨) = q0*
If δ(q, a) ∋ ε, then q can be initial: no need for I's.
(Figure: an accepting run on an unranked tree of ∧ and ∨ nodes over leaves labelled 0 and 1, with q1 at the root.)
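
One way to experiment with such an automaton is to write each δ(q, a) as a Python regular expression over one-character codes for the states. Everything below is a sketch under assumptions: q0 and q1 are coded as the characters `0` and `1`, the operators are assumed to be ∧ and ∨ (written `"and"` and `"or"`), missing entries of `DELTA` stand for the empty language, and a tree is `(label, [children])`.

```python
import re
from itertools import product

CODE = {"q0": "0", "q1": "1"}  # one character per state
DELTA = {
    ("q0", "0"): "",                  # ε: only the empty word of child states
    ("q1", "1"): "",
    ("q0", "and"): "[01]*0[01]*",     # (q0+q1)* q0 (q0+q1)*
    ("q1", "and"): "1*",              # q1*
    ("q1", "or"): "[01]*1[01]*",      # (q0+q1)* q1 (q0+q1)*
    ("q0", "or"): "0*",               # q0*
}
FINAL = {"q1"}


def states(tree):
    """All states q for which some run labels the root of `tree` with q."""
    label, children = tree
    child_sets = [states(child) for child in children]
    found = set()
    for (state, symbol), pattern in DELTA.items():
        if symbol != label:
            continue
        # Try every way of picking one state per child.
        for word in product(*child_sets):
            if re.fullmatch(pattern, "".join(CODE[q] for q in word)):
                found.add(state)
                break
    return found


def accepts(tree):
    return bool(states(tree) & FINAL)
```

The brute-force product over child states is exponential in general and is used here only to keep the example short.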

Schemas
Document Type Definition (DTD)
  <!DOCTYPE Musicians [
    <!ELEMENT Musicians (Musician*)>
    <!ELEMENT Musician (name, vita, album*)>
    <!ELEMENT vita (born, married*, died?)>
    <!ELEMENT married (when, whom)>
    <!ELEMENT died (when, where)>
    <!ELEMENT album (title, when)>
  ]>
Example tree t, with t ⊨ d:
  Musician
    name: John Lennon
    vita
      born
        when: 1940
        where: Liverpool
      married
        when: 1962
        whom: Cynthia
      married
        when: 1969
        whom: Yoko
      died
        when: 1980
        where: NY
    Album
      title: imagine
      when: 1971

Decision Problems for Schemas
Validation
  Given: tree t, schema d
  Question: t ⊨ d?
Schema containment: d1 ⊆ d2
  Given: schemas d1, d2
  Question: does t ⊨ d1 imply t ⊨ d2 for every t?
Build a deterministic top-down automaton A_d such that L(A_d) = { t | t ⊨ d }:
  <!ELEMENT a (b, c*, d)>  ⇒  δ(q_a, a) = q_b (q_c)* q_d
t ⊨ d ⇔ t ∈ L(A_d): in LOGSPACE
d1 ⊆ d2 ⇔ L(A_d1) ⊆ L(A_d2): in PTIME
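
Validation against a DTD can be sketched by turning each content model into a regular expression over the sequence of child labels, in the spirit of δ(q_a, a) = q_b (q_c)* q_d. The encoding is an assumption made for this example: each child label is followed by one space, trees are `(label, [children])`, and elements without a declaration (such as name or born) are simply not checked.

```python
import re

# Content models of the Musicians DTD as regexes over "label " sequences.
DTD = {
    "Musicians": r"(Musician )*",
    "Musician": r"name vita (album )*",
    "vita": r"born (married )*(died )?",
    "married": r"when whom ",
    "died": r"when where ",
    "album": r"title when ",
}


def valid(tree):
    """Check every declared element against its content model, top down."""
    label, children = tree
    model = DTD.get(label)
    if model is None:
        return True  # assumption: undeclared elements are left unchecked
    word = "".join(child[0] + " " for child in children)
    if re.fullmatch(model, word) is None:
        return False
    return all(valid(child) for child in children)
```

Each node is checked against exactly one regular expression, which reflects why a deterministic top-down automaton suffices for DTDs.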

XPath
The XML Path Language: a "navigation" language for selecting nodes from an XML document.
(Figure label: XPath formula.)

XPath - examples
"/" : direct son (child)
  /musician/vita/married
(Figure: the Musician tree from the Schemas slide.)

XPath - examples
"//" : offspring (descendant)
  /musician//when
(Figure: the Musician tree from the Schemas slide.)

XPath - examples
"[ ]" : filter
  /a/b[/b]//a
(Figure: a tree over the labels a and b.)

XPath - examples
"*" : wild card
  /a/*//b
(Figure: a tree over the labels a, b and c.)

XPath - examples
"|" : union
  /a | //*/a
(Figure: a tree over the labels a, b and c.)
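
The operators can be tried out on the Musician document with Python's standard library, which supports a limited XPath subset. The XML text below is an assumed serialization of the tree from the Schemas slide, paths are written relative to the root element (so `./vita/married` plays the role of /musician/vita/married), the filter and wildcard paths are variations made up for this document, and union is emulated by concatenating two result lists because `ElementTree` has no `|` operator.

```python
import xml.etree.ElementTree as ET

DOC = """<musician>
  <name>John Lennon</name>
  <vita>
    <born><when>1940</when><where>Liverpool</where></born>
    <married><when>1962</when><whom>Cynthia</whom></married>
    <married><when>1969</when><whom>Yoko</whom></married>
    <died><when>1980</when><where>NY</where></died>
  </vita>
  <album><title>imagine</title><when>1971</when></album>
</musician>"""

root = ET.fromstring(DOC)  # the <musician> element

married = root.findall("./vita/married")         # "/"  : /musician/vita/married
whens = root.findall(".//when")                  # "//" : /musician//when
filtered = root.findall("./vita[died]/married")  # "[]" : vita must have a died child
wild = root.findall("./*//when")                 # "*"  : any child, then descendant when
union = root.findall(".//when") + root.findall(".//where")  # "|" emulated

when_values = [node.text for node in whens]
```

Each result is a list of elements in document order; `when_values` collects the text of the selected when nodes.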

XPath Formula → Tree Automaton
XPath formula p ⇒ automaton A_p such that L(A_p) = { trees that match p }
Example: p = /a//b[/b]//a
  Σ = { a, b }
  Q: the formulas that appear as states in δ below
  F = { /a//b[/b]//a }
  δ(/a//b[/b]//a, a) = //b[/b]//a
  δ(//b[/b]//a, a) = //b[/b]//a
  δ(//b[/b]//a, b) = //b[/b]//a ∨ (/b ∧ //a) ∨ /b//a
  δ(/b//a, b) = //a
  δ(//a, a) = true
  δ(//a, b) = //a
  δ(/b, b) = true
Each right-hand side is interpreted as a regular language over Q:
  q: all strings that contain q
  q ∧ q': all strings that contain both q and q'
  true: all strings over Q
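
The example automaton for p = /a//b[/b]//a can be run directly by reading each transition as a Boolean condition on the word of child states. This is a sketch under assumptions: formulas are encoded as `True`, a state name, or a tuple `("or", ...)` / `("and", ...)`, a tree is `(label, [children])`, and every way of giving one state to each child is tried by brute force.

```python
from itertools import product

P = "/a//b[/b]//a"
DELTA = {
    ("/a//b[/b]//a", "a"): "//b[/b]//a",
    ("//b[/b]//a", "a"): "//b[/b]//a",
    ("//b[/b]//a", "b"): ("or", "//b[/b]//a", ("and", "/b", "//a"), "/b//a"),
    ("/b//a", "b"): "//a",
    ("//a", "a"): True,
    ("//a", "b"): "//a",
    ("/b", "b"): True,
}
FINAL = {P}


def holds(formula, word):
    """Does the word of child states belong to the language of the formula?"""
    if formula is True:
        return True                # all strings over Q
    if isinstance(formula, str):
        return formula in word     # all strings that contain the state
    op, *parts = formula
    results = (holds(part, word) for part in parts)
    return any(results) if op == "or" else all(results)


def states(tree):
    """States that some run can assign to the root of `tree`."""
    label, children = tree
    child_sets = [states(child) for child in children]
    found = set()
    for (state, symbol), formula in DELTA.items():
        if symbol == label and any(
            holds(formula, word) for word in product(*child_sets)
        ):
            found.add(state)
    return found


def matches(tree):
    return bool(states(tree) & FINAL)
```

For instance, the tree a(b(b, a)) matches p, while a(b(a)) does not, because the b node has no b child to satisfy the filter.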

Returning the desired set:
Add a select function s: Q × Σ → {0, 1}
  s(//a, a) = 1
  s(q, σ) = 0 in all other cases
Two options for selecting a node:
  • For every run it is selected
  • There exists a run that selects it
(Figure: a run of the automaton for /a//b[/b]//a on a tree over a and b, with nodes annotated by the states /a//b[/b]//a, //b[/b]//a, /b//a, /b and //a.)

XPath Query Containment
Containment: p ⊆ q
  Given: two queries p, q
  Question: Is p(t) ⊆ q(t) for all documents t?
  Examples:
    p = /a/b, q = /a//b
    p = /a[c]/b, q = /a/b
Query containment w.r.t. a DTD: p ⊆_d q
  Given: two queries p, q, DTD d
  Question: Is p(t) ⊆ q(t) for all documents t such that t ⊨ d?
  Example: DTD = Musicians, p = //vita/died/*, q = //when | //where
In the general case, these problems are undecidable.
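
Containment quantifies over all documents, so it cannot be decided by testing. A single document can still refute it, which the following sketch illustrates for the two example pairs. The helper names are illustrative, the queries are written in the limited path syntax of Python's `ElementTree`, and a synthetic `doc` element stands in for the document root so that `./a/b` corresponds to /a/b.

```python
import xml.etree.ElementTree as ET


def selected(doc, path):
    """Identity set of the nodes selected by `path` below the document node."""
    return {id(node) for node in doc.findall(path)}


def contained_on(document_xml, p, q):
    """Check p(t) ⊆ q(t) on one document t only."""
    doc = ET.Element("doc")  # stands in for the XPath document root
    doc.append(ET.fromstring(document_xml))
    return selected(doc, p) <= selected(doc, q)


T1 = "<a><b/><c/><d><b/></d></a>"
T2 = "<a><b/></a>"
```

On `T1`, /a/b is contained in /a//b but not the other way round; `T2` shows that /a/b is not contained in /a[c]/b, since the root has no c child.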

XPath Query Containment
Boolean containment: p ⊆_b q
  Given: two queries p, q
  Question: For every t, if p(t) ≠ ∅ then q(t) ≠ ∅?
With the [] operator, Boolean containment and containment can be solved with the same complexity.
(Figure: two trees with nodes a and a'. First tree: a ∈ p(t), a ∈ q(t)? Second tree, with marker children #: a ∈ p[#](t), a' ∈ q[#](t).)

XPath Query Containment
Theorem: Boolean containment for XPath(/, //) with DTD is in PTIME
Proof:
  XPath(/, //) can only describe vertical paths in a tree:
    p = p1//p2//…//pi…//pk, where pi = l1/l2/…/lm
  Paths that match p can be described by a regular language: p1 Σ* p2 Σ* … pk Σ*
  • Deterministic top-down automaton A_p that checks for a path that matches p (polynomial in p)
  • Deterministic top-down automaton A_q that checks that no path matches q
  • Deterministic top-down automaton A_d that checks that the tree is in the schema d
  p ⊆_b q under d ⇔ L(A_p ∩ A_q ∩ A_d) = ∅
  Can be checked in polynomial time
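
The observation that XPath(/, //) queries describe vertical paths can be made concrete by translating a query into a regular expression over label paths. This is a sketch with assumed conventions: a path is written as `/l1/l2/...`, trees are `(label, [children])`, and the query is matched against every root-to-node path, which corresponds to the trailing Σ* on root-to-leaf paths.

```python
import re


def path_regex(query):
    """Translate a query over / and // into a regex on label paths."""
    parts = []
    for block in query.split("//"):  # the blocks p1, p2, ..., pk
        labels = [label for label in block.split("/") if label]
        parts.append("".join("/" + re.escape(label) for label in labels))
    # Between two blocks, any number of intermediate labels may occur (Σ*).
    return "(?:/[^/]+)*".join(parts)


def node_paths(tree, prefix=""):
    """Yield the label path from the root to every node of the tree."""
    label, children = tree
    path = prefix + "/" + label
    yield path
    for child in children:
        yield from node_paths(child, path)


def is_nonempty(query, tree):
    """Boolean semantics: does the query select at least one node?"""
    pattern = re.compile(path_regex(query))
    return any(pattern.fullmatch(path) for path in node_paths(tree))
```

The regular expression has size linear in the query, in line with the polynomial bound on A_p.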

XPath Query Containment
Theorem: Boolean containment for XPath(/, //, [], *, |) with DTD is in EXPTIME
Proof:
  • Deterministic bottom-up automaton A_p that checks for a path that matches p (exponential in p)
  • Deterministic bottom-up automaton A_q that checks that no path matches q
  • Deterministic bottom-up automaton A_d that checks that the tree is in the schema d
  p ⊆_b q under d ⇔ L(A_p ∩ A_q ∩ A_d) = ∅
  Exponential time
Containment is also in EXPTIME

THANK YOU!