XML & Tree Automata
Sarai Sheinvald
Automata Theory for XML Researchers / Frank Neven [SIGMOD '02]
Automata for XML: A Survey / Thomas Schwentick [JCSS '07]
Agenda
Introduction: XML, tree automata, schemas, XPath
Solving problems with tree automata: XPath queries, XPath containment
XML
Extensible Markup Language: a language for tagging the content of text files.
<?xml version="1.0" encoding="UTF-8"?>
<muppet creator="Henson">
  <name>Kermit</name>
  <animal>Frog</animal>
  <friends>
    <name>Ms. Piggy</name>
  </friends>
</muppet>
XML file to tree: example
The muppet document becomes the following tree; the attribute creator becomes a child node labelled creator, whose child is the value Henson:
muppet
  creator
    Henson
  name
    Kermit
  animal
    Frog
  friends
    name
      Ms. Piggy
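As a hedged illustration of the document-to-tree step, the sketch below parses the muppet document with Python's standard `xml.etree.ElementTree` and prints it in the same shape, turning the attribute creator into a child node as above. The function name `tree_lines` and the indentation format are our own choices.

```python
import xml.etree.ElementTree as ET

# The muppet document; it must start directly with the XML declaration.
DOC = """<?xml version="1.0" encoding="UTF-8"?>
<muppet creator="Henson">
  <name>Kermit</name>
  <animal>Frog</animal>
  <friends>
    <name>Ms. Piggy</name>
  </friends>
</muppet>"""


def tree_lines(elem, depth=0):
    """Render an element as an indented tree: its tag, then each attribute
    (a name node with its value below it), then its text, then its children."""
    pad = "  " * depth
    lines = [pad + elem.tag]
    for name, value in elem.attrib.items():
        lines += [pad + "  " + name, pad + "    " + value]
    if elem.text and elem.text.strip():
        lines.append(pad + "  " + elem.text.strip())
    for child in elem:
        lines += tree_lines(child, depth + 1)
    return lines


# Parse from bytes so that the parser handles the encoding declaration.
root = ET.fromstring(DOC.encode("utf-8"))
lines = tree_lines(root)
print("\n".join(lines))
```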
Automata on Binary Trees
Reminder: nondeterministic finite automaton (NFA)
Σ = {a, b}, Q = {q0, q1}, Ia = {q0}, Ib = {q1}, F = {q0}
δ(q0, a) = {q0}   δ(q0, b) = {q1}
δ(q1, a) = {q0}   δ(q1, b) = {q1}
We assume a different set of initial states for every letter: the run starts in a state of Iσ, where σ is the first letter of the word.
An accepting run on aaba: a/q0, a/q0, b/q1, a/q0. The last state, q0, is in F, so aaba ∈ L(A).
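The letter-dependent initial states can be simulated by tracking the set of reachable states. This is a minimal Python sketch; `nfa_accepts` and the dictionary encoding of δ are our own naming, and we assume only nonempty words are considered.

```python
def nfa_accepts(word, init, delta, final):
    """Run an NFA whose initial states depend on the first letter of the word."""
    if not word:
        return False  # assumption: only nonempty words are considered
    current = set(init.get(word[0], set()))  # states after the first letter
    for letter in word[1:]:
        current = {r for q in current for r in delta.get((q, letter), set())}
    return bool(current & final)


INIT = {"a": {"q0"}, "b": {"q1"}}  # Ia, Ib
DELTA = {("q0", "a"): {"q0"}, ("q0", "b"): {"q1"},
         ("q1", "a"): {"q0"}, ("q1", "b"): {"q1"}}
FINAL = {"q0"}

print(nfa_accepts("aaba", INIT, DELTA, FINAL))  # True
```

With these transitions the state simply records the last letter read, so the automaton accepts exactly the words that end in a.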
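Automata on Binary Trees: Bottom-up
Σ = {0, 1, ∧, ∨}, Q = {q0, q1}, I0 = {q0}, I1 = {q1}, F = {q1}
δ: Q × Q × Σ → Q
δ(q0, q0, ∧) = q0   δ(q0, q0, ∨) = q0
δ(q0, q1, ∧) = q0   δ(q0, q1, ∨) = q1
δ(q1, q0, ∧) = q0   δ(q1, q0, ∨) = q1
δ(q1, q1, ∧) = q1   δ(q1, q1, ∨) = q1
A run labels each leaf with a state from I0 or I1 (according to its label), and each internal node with δ applied to its children's states and its own label. The run is accepting if the root gets a state in F. Since qb records that the subtree evaluates to b, the automaton accepts exactly the Boolean circuits that evaluate to 1.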
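Because this bottom-up automaton is deterministic, a run is a single recursive evaluation. In this Python sketch a leaf is the string "0" or "1" and an internal node is a triple (op, left, right), with "and" and "or" standing in for ∧ and ∨; the encoding and the example circuit are our assumptions.

```python
# Leaves are "0"/"1"; an internal node is (op, left, right) with op "and" (∧) or "or" (∨).
INIT = {"0": "q0", "1": "q1"}  # I0, I1
DELTA = {
    ("q0", "q0", "and"): "q0", ("q0", "q0", "or"): "q0",
    ("q0", "q1", "and"): "q0", ("q0", "q1", "or"): "q1",
    ("q1", "q0", "and"): "q0", ("q1", "q0", "or"): "q1",
    ("q1", "q1", "and"): "q1", ("q1", "q1", "or"): "q1",
}
FINAL = {"q1"}


def run(tree):
    """Deterministic bottom-up run: return the state reached at the root of tree."""
    if isinstance(tree, str):
        return INIT[tree]
    op, left, right = tree
    return DELTA[(run(left), run(right), op)]


def accepts(tree):
    return run(tree) in FINAL


circuit = ("and", ("or", "0", "1"), ("and", "1", "1"))
print(accepts(circuit))  # True
```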
Automata on Binary Trees: Top-down
Σ = {0, 1, ∧, ∨}, Q = {q0, q1}, I0 = {q0}, I1 = {q1}, F = {q1}
δ: Q × Σ → 2^(Q×Q)
δ(q0, ∧) = {(q1, q0), (q0, q1), (q0, q0)}   δ(q0, ∨) = {(q0, q0)}
δ(q1, ∧) = {(q1, q1)}   δ(q1, ∨) = {(q1, q0), (q0, q1), (q1, q1)}
A run starts with a state of F at the root and, at each internal node, guesses a pair of child states from δ. It is accepting if every leaf labelled a ends up in a state of Ia. This automaton accepts the same circuits as the bottom-up one.
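For the nondeterministic top-down automaton, a recursive search can try every child pair that δ offers, starting from the states in F at the root. This is a minimal Python sketch: a leaf is "0" or "1", an internal node is (op, left, right) with "and"/"or" for ∧/∨, and all names are ours.

```python
# Leaves are "0"/"1"; an internal node is (op, left, right) with op "and" (∧) or "or" (∨).
DELTA = {
    ("q0", "and"): {("q1", "q0"), ("q0", "q1"), ("q0", "q0")},
    ("q1", "and"): {("q1", "q1")},
    ("q0", "or"): {("q0", "q0")},
    ("q1", "or"): {("q1", "q0"), ("q0", "q1"), ("q1", "q1")},
}
LEAF = {"0": {"q0"}, "1": {"q1"}}  # I0, I1: states allowed at leaves
ROOT = {"q1"}                       # F: states allowed at the root


def accepts_from(tree, q):
    """Is there a top-down run that starts in state q at the root of tree?"""
    if isinstance(tree, str):
        return q in LEAF[tree]
    op, left, right = tree
    # Nondeterminism: try every pair of child states that delta offers.
    return any(accepts_from(left, ql) and accepts_from(right, qr)
               for ql, qr in DELTA.get((q, op), set()))


def accepts(tree):
    return any(accepts_from(tree, q) for q in ROOT)


print(accepts(("and", ("or", "0", "1"), ("and", "1", "1"))))  # True
```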
Decision Problems and Complexities
Membership: t ∈ L(A)?
Nonemptiness: L(A) ≠ ∅?
Containment: L(A1) ⊆ L(A2)?
Model | Membership | Nonemptiness | Containment
Det. top-down | LOGSPACE | PTIME | PTIME
Nondet. bottom-up | LOGCFL | PTIME | EXPTIME
Nondet. top-down | LOGCFL | PTIME | EXPTIME
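The PTIME nonemptiness entry for bottom-up automata comes from a simple fixpoint: collect every state that can label the root of some tree, then check whether one of them is in F. This Python sketch assumes δ is given as a dictionary from (left state, right state, label) to a set of states and the sets Ia as a dictionary from leaf labels to state sets; names are ours.

```python
def nonempty(leaf_init, delta, final):
    """Nonemptiness for a nondeterministic bottom-up automaton on binary trees.

    leaf_init: leaf label -> set of states (the sets Ia)
    delta:     (left state, right state, label) -> set of states
    Computes, as a least fixpoint, the states that label the root of some tree.
    """
    reachable = set().union(*leaf_init.values())
    changed = True
    while changed:  # at most |Q| + 1 passes, each scanning delta once
        changed = False
        for (ql, qr, _label), targets in delta.items():
            if ql in reachable and qr in reachable and not targets <= reachable:
                reachable |= targets
                changed = True
    return bool(reachable & final)


# δ of the Boolean-circuit automaton: q1 means "evaluates to 1".
CIRCUIT = {}
for left in ("q0", "q1"):
    for right in ("q0", "q1"):
        CIRCUIT[(left, right, "and")] = {"q1" if left == right == "q1" else "q0"}
        CIRCUIT[(left, right, "or")] = {"q1" if "q1" in (left, right) else "q0"}

print(nonempty({"0": {"q0"}, "1": {"q1"}}, CIRCUIT, {"q1"}))          # True
print(nonempty({"0": {"q0"}}, {("q0", "q0", "and"): {"q0"}}, {"q1"}))  # False
```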
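XML trees have no bound on the number of children of a node. What do we do?
Option 1: encode unranked trees as binary trees (first-child/next-sibling). In the encoding, the left child of a node is its first child and the right child is its next sibling; # marks a missing first child or next sibling. Decoding reverses the construction, so automata on binary trees can be used.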
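The first-child/next-sibling encoding and its inverse fit in a few lines. In this sketch an unranked tree is a (label, children) pair, a binary tree is a (label, first_child, next_sibling) triple, and None plays the role of #; these representations are our own choice.

```python
# Unranked tree: (label, children) with children a tuple of trees.
# Binary tree:   (label, first_child, next_sibling), with None playing the role of #.

def encode_forest(forest):
    """First-child/next-sibling encoding of a sequence of sibling trees."""
    if not forest:
        return None  # '#': no first child / no next sibling
    (label, children), rest = forest[0], forest[1:]
    return (label, encode_forest(children), encode_forest(rest))


def encode(tree):
    return encode_forest((tree,))


def decode_forest(node):
    """Inverse of encode_forest: follow next-sibling links, recurse into first children."""
    forest = []
    while node is not None:
        label, first_child, next_sibling = node
        forest.append((label, decode_forest(first_child)))
        node = next_sibling
    return tuple(forest)


def decode(node):
    (tree,) = decode_forest(node)
    return tree


t = ("a", (("b", ()), ("c", ())))
print(encode(t))  # ('a', ('b', None, ('c', None, None)), None)
```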
Option 2: Automata on Unranked Trees
In the binary case: δ: Q × Σ → 2^(Q×Q).
We want to define δ for any number of children: δ: Q × Σ → 2^(⋃_{n≥0} Q^n).
We use regular languages to denote the transitions: δ: Q × Σ → 2^(Q*), where each δ(q, a) is a regular language over Q.
Automata on Unranked Trees: example
Σ = {0, 1, ∧, ∨}, Q = {q0, q1}, F = {q1}
δ(q0, 0) = δ(q1, 1) = ε
δ(q0, 1) = δ(q1, 0) = ∅
δ(q0, ∧) = (q0+q1)* q0 (q0+q1)*   δ(q1, ∧) = q1*
δ(q0, ∨) = q0*   δ(q1, ∨) = (q0+q1)* q1 (q0+q1)*
If ε ∈ δ(q, a), then q can be the state of a leaf labelled a, so the initial sets Ia are no longer needed. An accepting run assigns q1 ∈ F to the root, and for every node labelled a in state q, the states of its children, read left to right, form a word in δ(q, a).
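The unranked automaton can be run bottom-up by computing, for each subtree, the set of states from which it has an accepting run, and matching the children's state words against each δ(q, a) with Python's `re` module. States are encoded as the characters "0" and "1", and ∧, ∨ as "and", "or" (our encoding). Trying every combination of child states is exponential in general; this is only a sketch.

```python
import re
from itertools import product

# States are single characters so that a word of child states is a string:
# "0" stands for q0 and "1" for q1.  Labels "and"/"or" stand for ∧/∨.
DELTA = {
    ("0", "0"): "",               # δ(q0, 0) = ε
    ("1", "1"): "",               # δ(q1, 1) = ε
    ("0", "and"): "[01]*0[01]*",  # δ(q0, ∧) = (q0+q1)* q0 (q0+q1)*
    ("1", "and"): "1*",           # δ(q1, ∧) = q1*
    ("0", "or"): "0*",            # δ(q0, ∨) = q0*
    ("1", "or"): "[01]*1[01]*",   # δ(q1, ∨) = (q0+q1)* q1 (q0+q1)*
}                                 # δ(q0, 1) = δ(q1, 0) = ∅: no entry
FINAL = {"1"}


def states(tree):
    """States q from which tree has an accepting run, computed bottom-up:
    some choice of child states must spell a word in δ(q, label)."""
    label, children = tree
    child_states = [states(c) for c in children]
    result = set()
    for q in "01":
        pattern = DELTA.get((q, label))
        if pattern is None:
            continue
        # Brute force over child-state choices: fine for a sketch, exponential in general.
        if any(re.fullmatch(pattern, "".join(w)) for w in product(*child_states)):
            result.add(q)
    return result


def accepts(tree):
    return bool(states(tree) & FINAL)


t = ("and", [("1", []), ("or", [("0", []), ("1", [])]), ("1", [])])
print(accepts(t))  # True
```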
Schemas
Example tree t:
Musician
  name: John Lennon
  vita
    born: when 1940, where Liverpool
    married: when 1962, whom Cynthia
    married: when 1969, whom Yoko
    died: when 1980, where NY
  Album: title Imagine, when 1971
Document Type Definition (DTD) d, with t ∈ d (t conforms to d):
<!DOCTYPE Musicians [
<!ELEMENT Musicians (Musician*)>
<!ELEMENT Musician (name, vita, album*)>
<!ELEMENT vita (born, married*, died?)>
<!ELEMENT married (when, whom)>
<!ELEMENT died (when, where)>
<!ELEMENT album (title, when)>
]>
Decision Problems for Schemas
Validation: Given a tree t and a schema d. Question: t ∈ d?
Schema containment: Given schemas d1, d2. Question: does t ∈ d1 imply t ∈ d2 for every t (d1 ⊆ d2)?
Build a deterministic top-down automaton Ad such that L(Ad) = { t | t ∈ d }: for every rule <!ELEMENT a (b, c*, d)>, set δ(qa, a) = qb (qc)* qd.
Then t ∈ d ⇔ t ∈ L(Ad), which is decidable in LOGSPACE, and d1 ⊆ d2 ⇔ L(Ad1) ⊆ L(Ad2), which is decidable in PTIME.
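As a sketch of Ad in Python (standard library only), each content model can be compiled into a regular expression over the tags of an element's children, which plays the role of δ(qa, a) with one state per element name. The function names, the "tag followed by a space" word encoding, and the extra rule for born (read off the example tree, since the DTD above has none) are our assumptions; only sequences with ?, * and + are handled.

```python
import re
import xml.etree.ElementTree as ET


def content_regex(model):
    """Turn a content model such as '(b, c*, d)' into a regex over child tags,
    each tag written as 'tag '. Assumption: only sequences of names with an
    optional ?, * or + (no '|' choices, no nested groups)."""
    parts = []
    for item in (s.strip() for s in model.strip("()").split(",")):
        suffix = item[-1] if item[-1] in "?*+" else ""
        name = item[:-1] if suffix else item
        parts.append("(?:" + re.escape(name) + " )" + suffix)
    return re.compile("".join(parts))


RULES = {
    "Musicians": "(Musician*)",
    "Musician": "(name, vita, album*)",
    "vita": "(born, married*, died?)",
    "born": "(when, where)",  # assumption: not in the DTD, read off the example tree
    "married": "(when, whom)",
    "died": "(when, where)",
    "album": "(title, when)",
}
# One state q_a per element name a; DELTA[a] plays the role of δ(q_a, a).
DELTA = {tag: content_regex(model) for tag, model in RULES.items()}


def valid(elem):
    """Deterministic top-down check: the children's tags must spell a word in δ(q_a, a)."""
    word = "".join(child.tag + " " for child in elem)
    regex = DELTA.get(elem.tag)
    if regex is None:
        ok = word == ""  # no rule: treated as text only (a stand-in for #PCDATA)
    else:
        ok = regex.fullmatch(word) is not None
    return ok and all(valid(child) for child in elem)


def validate(doc, root_tag="Musicians"):
    root = ET.fromstring(doc)
    return root.tag == root_tag and valid(root)


DOC = ("<Musicians><Musician><name>John Lennon</name><vita>"
       "<born><when>1940</when><where>Liverpool</where></born>"
       "<married><when>1962</when><whom>Cynthia</whom></married>"
       "<married><when>1969</when><whom>Yoko</whom></married>"
       "<died><when>1980</when><where>NY</where></died></vita>"
       "<album><title>Imagine</title><when>1971</when></album></Musician></Musicians>")

print(validate(DOC))  # True
```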
XPath
The XML Path Language: a “navigation” language for selecting nodes from an XML document.
XPath examples
“/”: child. /musician/vita/married selects the married children of vita in the Lennon tree (married 1962, whom Cynthia; married 1969, whom Yoko).
“//”: descendant. /musician//when selects every when node below musician (1940, 1962, 1969, 1980 and 1971).
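Python's standard `xml.etree.ElementTree` supports a small XPath subset, enough to replay the first two examples. The document below is our hand-written encoding of the Lennon tree with lower-case tags; because `findall` evaluates paths relative to an element, the leading /musician step becomes a check on the root tag.

```python
import xml.etree.ElementTree as ET

# A hand-written encoding of the Lennon tree (lower-case tags, as in the queries).
DOC = """<musician>
  <name>John Lennon</name>
  <vita>
    <born><when>1940</when><where>Liverpool</where></born>
    <married><when>1962</when><whom>Cynthia</whom></married>
    <married><when>1969</when><whom>Yoko</whom></married>
    <died><when>1980</when><where>NY</where></died>
  </vita>
  <album><title>Imagine</title><when>1971</when></album>
</musician>"""

root = ET.fromstring(DOC)
assert root.tag == "musician"  # the leading /musician step

# /musician/vita/married: "/" steps are child steps (ElementPath: "vita/married").
married = root.findall("vita/married")
print([m.find("whom").text for m in married])  # ['Cynthia', 'Yoko']

# /musician//when: "//" reaches descendants at any depth (ElementPath: ".//when").
whens = root.findall(".//when")
print([w.text for w in whens])  # ['1940', '1962', '1969', '1980', '1971']
```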
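“[ ]”: filter. /a/b[/b]//a selects the a descendants of those b children of the root a that themselves have a b child.
“*”: wildcard. /a/*//b selects the b nodes strictly below some child of the root a, whatever that child's label.
“|”: union. /a | //*/a selects the root if it is labelled a, together with every a node that has a parent.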
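The filter and wildcard examples can also be replayed with ElementTree's XPath subset, which supports [tag] predicates, * and //, but not |; the union is therefore taken as a Python set union. The tree and its id attributes are hypothetical, invented only to name nodes, and the slide's filter [/b] ("has a b child") is written [b] in ElementTree syntax.

```python
import xml.etree.ElementTree as ET

# A small hypothetical tree; the ids only serve to name the nodes.
DOC = """<a id="root">
  <b id="B1"><b id="B2"/><a id="A1"/></b>
  <c id="C1"><b id="B3"><a id="A2"/></b></c>
  <a id="A3"><b id="B4"/></a>
</a>"""
root = ET.fromstring(DOC)  # the root is labelled a, so the leading /a step holds


def ids(elems):
    return {e.get("id") for e in elems}


# /a/b[/b]//a: b children of the root that have a b child, then their a descendants.
print(ids(root.findall("b[b]//a")))  # {'A1'}

# /a/*//b: b nodes strictly below any child of the root (here B2, B3 and B4).
print(ids(root.findall("*//b")))

# /a | //*/a: ElementPath has no union, so take a Python set union.
top = [root] if root.tag == "a" else []
with_parent = [child for elem in root.iter() for child in elem if child.tag == "a"]
print(ids(top) | ids(with_parent))  # root, A1, A2 and A3
```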
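XPath formula → tree automaton
For an XPath formula p, build an automaton Ap such that L(Ap) = { trees that match p }.
Example: p = /a//b[/b]//a, Σ = {a, b}, F = {/a//b[/b]//a}; the states are the sub-formulas that appear in δ.
δ(/a//b[/b]//a, a) = //b[/b]//a
δ(//b[/b]//a, a) = //b[/b]//a
δ(//b[/b]//a, b) = //b[/b]//a ∨ (/b ∧ //a) ∨ /b//a
δ(/b//a, b) = //a
δ(/b, b) = true
δ(//a, a) = true
δ(//a, b) = //a
Each right-hand side is interpreted as a regular language over Q: a state q stands for all strings that contain q, q ∧ q' for all strings that contain both q and q', true for all strings over Q, and ∨ for union. Since each child carries only one state in a run, the disjunct /b//a covers the case where the same child is labelled b and has an a below it.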
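One way to check this transition table mechanically is to read each right-hand side as a positive Boolean formula over the children, where a state atom holds if some child has an accepting run from that state. This alternating reading is our simplification of the regular-language semantics above: under it, q ∧ q' may be witnessed by the same child, so the /b//a disjunct becomes redundant, but it is kept to mirror δ. Trees are (label, children) pairs, and all names are ours.

```python
# States of A_p for p = /a//b[/b]//a, written as the remaining sub-formula.
P, S, BA, B, A = "/a//b[/b]//a", "//b[/b]//a", "/b//a", "/b", "//a"


def st(q):
    return ("state", q)


TRUE = ("true",)
DELTA = {
    (P, "a"): st(S),
    (S, "a"): st(S),
    (S, "b"): ("or", st(S), ("and", st(B), st(A)), st(BA)),
    (BA, "b"): st(A),
    (B, "b"): TRUE,
    (A, "a"): TRUE,
    (A, "b"): st(A),
}  # every other (state, label) pair: no transition, i.e. reject
FINAL = {P}


def accepts_from(node, q):
    label, children = node
    formula = DELTA.get((q, label))
    return formula is not None and holds(formula, children)


def holds(formula, children):
    """Interpret a transition formula over the children of a node."""
    kind = formula[0]
    if kind == "true":
        return True
    if kind == "state":  # some child has an accepting run from this state
        return any(accepts_from(c, formula[1]) for c in children)
    if kind == "and":
        return all(holds(f, children) for f in formula[1:])
    if kind == "or":
        return any(holds(f, children) for f in formula[1:])
    raise ValueError(kind)


def matches(tree):
    return any(accepts_from(tree, q) for q in FINAL)


print(matches(("a", [("b", [("b", []), ("a", [])])])))  # True
print(matches(("a", [("b", [("a", [])])])))              # False
```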
Returning the selected nodes
Add a select function s: Q × Σ → {0, 1}. There are two options for deciding whether a node is selected: it is selected in every run, or there exists a run that selects it.
For p = /a//b[/b]//a: s(//a, a) = 1, and s(q, σ) = 0 in all other cases.
XPath Query Containment: p ⊆ q
Given: two queries p, q. Question: is p(t) ⊆ q(t) for all documents t?
Examples: p = /a/b, q = /a//b; p = /a[c]/b, q = /a/b.
Query containment w.r.t. a DTD, p ⊆_d q:
Given: two queries p, q and a DTD d. Question: is p(t) ⊆ q(t) for all documents t such that t ∈ d?
Example: d = the Musicians DTD, p = //vita/died/*, q = //when | //where.
In the general case, these problems are undecidable.
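Containment in the first two examples can be probed, though not decided, by brute force. The sketch below enumerates every tree over {a, b, c} with at most four nodes, hand-codes each query as a Python function returning node paths, and looks for a tree t with p(t) ⊄ q(t). Finding no counterexample up to a size bound is evidence, not a proof; the alphabet, the bound and the query functions are our assumptions.

```python
from functools import lru_cache

LABELS = ("a", "b", "c")  # assumption: a small alphabet is enough for the examples


@lru_cache(maxsize=None)
def trees(n):
    """All ordered unranked trees with exactly n nodes, as (label, children) tuples."""
    return tuple((label, kids) for label in LABELS for kids in forests(n - 1))


@lru_cache(maxsize=None)
def forests(n):
    """All sequences of trees with n nodes in total."""
    if n == 0:
        return ((),)
    return tuple((t,) + rest
                 for k in range(1, n + 1) for t in trees(k) for rest in forests(n - k))


def descendants(tree, path=()):
    """Yield (path, subtree) for every strict descendant; a node is named by its path."""
    for i, child in enumerate(tree[1]):
        yield path + (i,), child
        yield from descendants(child, path + (i,))


def a_child_b(t):        # /a/b
    return {(i,) for i, (l, _) in enumerate(t[1]) if l == "b"} if t[0] == "a" else set()


def a_desc_b(t):         # /a//b
    return {p for p, (l, _) in descendants(t) if l == "b"} if t[0] == "a" else set()


def a_has_c_child_b(t):  # /a[c]/b
    return a_child_b(t) if any(l == "c" for l, _ in t[1]) else set()


def counterexample(p, q, max_nodes=4):
    """A smallest tree t (up to max_nodes) with p(t) not a subset of q(t), or None."""
    for n in range(1, max_nodes + 1):
        for t in trees(n):
            if not p(t) <= q(t):
                return t
    return None


print(counterexample(a_child_b, a_desc_b))         # None
print(counterexample(a_has_c_child_b, a_child_b))  # None
print(counterexample(a_desc_b, a_child_b))         # a tree with a b below depth 1
```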
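XPath Query Containment: Boolean containment p ⊆_b q
Given: two queries p, q. Question: for every t, if p(t) ≠ ∅, then q(t) ≠ ∅?
With the [ ] operator, Boolean containment and containment can be solved with the same complexity. To ask whether a ∈ p(t) implies a ∈ q(t), mark the node a by giving it an extra child labelled with a fresh symbol #; then a ∈ p(t) corresponds to p[#] selecting a node in the marked tree, and likewise for q[#].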
XPath Query Containment
Theorem: Boolean containment for XPath(/, //) with a DTD is in PTIME.
Proof: XPath(/, //) can only describe vertical paths in a tree: p = p1//p2//…//pk, where each pi = l1/l2/…/lm is a sequence of labels. The paths that match p are therefore described by the regular language p1 Σ* p2 Σ* … pk Σ*. A deterministic top-down automaton Ap, polynomial in the size of p, checks for a path that matches p.
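The regular language p1 Σ* p2 Σ* … pk Σ* can be built directly as a regular expression over root-to-leaf label words; thanks to the trailing Σ*, a tree matches p exactly when one of its root-to-leaf words is in the language. This sketch uses a regex rather than the deterministic top-down automaton Ap, writes each label as `label;` so that multi-letter labels stay unambiguous (assuming labels contain no `;`), and expects absolute queries; all names are ours.

```python
import re

SIGMA_STAR = r"(?:[^;]+;)*"  # Σ*: any sequence of labels, each written as "label;"


def path_regex(query):
    """XPath(/, //) query p1//p2//...//pk -> regex for p1 Σ* p2 Σ* ... pk Σ*.
    Assumes an absolute query and labels that contain no ';'."""
    segments = query.split("//")
    segments[0] = segments[0][1:]  # drop the leading "/" ("" if the query starts with "//")
    parts = ["".join(re.escape(label) + ";" for label in seg.split("/") if label)
             for seg in segments]
    return re.compile(SIGMA_STAR.join(parts) + SIGMA_STAR)


def leaf_words(tree, prefix=""):
    """Label words of all root-to-leaf paths; tree = (label, children)."""
    label, children = tree
    word = prefix + label + ";"
    if not children:
        yield word
    for child in children:
        yield from leaf_words(child, word)


def matches(query, tree):
    rx = path_regex(query)
    return any(rx.fullmatch(w) for w in leaf_words(tree))


t = ("a", [("x", [("b", [("c", [])])]), ("d", [])])
print(path_regex("/a//b/c").pattern)  # a;(?:[^;]+;)*b;c;(?:[^;]+;)*
print(matches("/a//b/c", t))          # True
```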
XPath Query Containment
Theorem: Boolean containment for XPath(/, //) with a DTD is in PTIME.
Proof:
• Deterministic top-down automaton Ap that checks for a path that matches p (polynomial in p)
• Deterministic top-down automaton Aq that checks that no path matches q
• Deterministic top-down automaton Ad that checks that the tree conforms to the schema d
p ⊆_b q under d ⇔ L(Ap) ∩ L(Aq) ∩ L(Ad) = ∅, which can be checked in polynomial time.
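The product-and-emptiness step can be sketched for the special case without a DTD, so Ad is left out. There, a counterexample tree can be taken to be a single branch, so it suffices to compare path languages: the sketch runs the determinized path automata of p and q side by side and searches breadth-first for a label word that p accepts and q rejects. Determinization is done on the fly over sets of pattern positions, and the sketch makes no attempt to meet the polynomial bound of the theorem; `bool_contained` and the fresh label "#other" are our own names.

```python
from collections import deque

STAR = None  # a Σ* position in the pattern


def pattern(query):
    """/a//b/c -> ['a', STAR, 'b', 'c', STAR], i.e. p1 Σ* p2 ... pk Σ*."""
    segments = query.split("//")
    segments[0] = segments[0][1:]  # drop the leading "/"
    tokens = []
    for i, seg in enumerate(segments):
        if i > 0:
            tokens.append(STAR)
        tokens.extend(label for label in seg.split("/") if label)
    tokens.append(STAR)
    return tokens


def closure(tokens, positions):
    """Add the positions reachable by letting Σ* tokens match the empty word."""
    result, stack = set(positions), list(positions)
    while stack:
        i = stack.pop()
        if i < len(tokens) and tokens[i] is STAR and i + 1 not in result:
            result.add(i + 1)
            stack.append(i + 1)
    return frozenset(result)


def step(tokens, state, symbol):
    """Deterministic transition on sets of positions (on-the-fly subset construction)."""
    nxt = set()
    for i in state:
        if i < len(tokens):
            if tokens[i] is STAR:
                nxt.add(i)  # Σ* reads any label and stays
            elif tokens[i] == symbol:
                nxt.add(i + 1)
    return closure(tokens, nxt)


def bool_contained(p, q):
    """p ⊆_b q for XPath(/, //) without a DTD: search the product automaton for a
    label word accepted by A_p and rejected by A_q (an emptiness test by BFS)."""
    tp, tq = pattern(p), pattern(q)
    alphabet = {tok for tok in tp + tq if tok is not STAR} | {"#other"}  # "#other": a fresh label
    start = (closure(tp, {0}), closure(tq, {0}))
    seen, queue = {start}, deque([start])
    while queue:
        sp, sq = queue.popleft()
        for symbol in alphabet:
            nxt = (step(tp, sp, symbol), step(tq, sq, symbol))
            if len(tp) in nxt[0] and len(tq) not in nxt[1]:
                return False  # a single-branch tree that matches p but not q
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True


print(bool_contained("/a/b", "/a//b"))  # True
print(bool_contained("/a//b", "/a/b"))  # False
```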
XPath Query Containment
Theorem: Boolean containment for XPath(/, //, [ ], *, |) with a DTD is in EXPTIME.
Proof:
• Deterministic bottom-up automaton Ap that checks that the tree matches p (exponential in p)
• Deterministic bottom-up automaton Aq that checks that the tree does not match q
• Deterministic bottom-up automaton Ad that checks that the tree conforms to the schema d
p ⊆_b q under d ⇔ L(Ap) ∩ L(Aq) ∩ L(Ad) = ∅, which can be checked in exponential time. Containment is also in EXPTIME.
THANK YOU!