Deriving Relation Keys from XML Keys
by Qing Wang, Hongwei Wu, Jianchang Xiao, Aoying Zhou, Junmei Zhou
Reviewed by Chris Ying Zhu, Cong Wang, Max Wang, Blazej Kot
Introduction
• Motivation
  – XML is widely used on the internet for storing semi-structured data.
• Aim of work
  – Map XML key constraints to keys of relations.
• Results
  – A heuristic algorithm is introduced to achieve this aim.
Background
• What an XML database or document looks like

<db>
  <book isbn="7-111-07526-9">
    <title>Networks</title>
    <author>Larry Peterson</author>
    <chapter number="1">
      <name>Foundation</name>
      <section number="1"><name>TCPIP</name></section>
      <section number="2"><name>ATM</name></section>
    </chapter>
    <chapter number="9">
      <name>Applications</name>
    </chapter>
  </book>
  <book isbn="7-111-06710-X">
    <title>Networks</title>
    <author>Shary Zhou</author>
    <chapter number="1">
      <name>Introduction</name>
    </chapter>
    <chapter number="9">
      <name>Applications</name>
    </chapter>
  </book>
</db>
Background
• XML Tree
  – The DOM specification models an XML document as a tree.
Background
• An XML tree is defined as T = (V, lab, ele, att, val, id, root), where
  – V is a set of nodes;
  – lab maps each node to its type of label; a node can be one of
    • Element
    • Attribute
    • Text
  – ele maps each element node to its element and text children;
  – att maps each element node to its attribute children;
  – val maps each attribute node or text node to its text value;
  – id assigns a unique identifier to each node;
  – root is the root of the tree.
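The tuple above can be made concrete with a small sketch. It assumes Python's standard xml.etree.ElementTree parser; the function name build_tree and the use of integers as node identifiers are illustrative choices, not taken from the paper. Text that follows a child element (mixed content) is ignored to keep the sketch short.

```python
import itertools
import xml.etree.ElementTree as ET


def build_tree(xml_text):
    """Build the components of T = (V, lab, ele, att, val, id, root).

    Hypothetical helper: nodes are plain integers, so the integer itself
    plays the role of the unique identifier `id`.
    """
    counter = itertools.count()
    lab, ele, att, val = {}, {}, {}, {}

    def visit(elem):
        node = next(counter)
        lab[node] = ("element", elem.tag)
        att[node] = []   # attribute children of this element
        ele[node] = []   # element and text children of this element
        for name, text in elem.attrib.items():
            a = next(counter)
            lab[a] = ("attribute", name)
            val[a] = text
            att[node].append(a)
        if elem.text and elem.text.strip():
            t = next(counter)
            lab[t] = ("text", None)
            val[t] = elem.text.strip()
            ele[node].append(t)
        for child in elem:
            ele[node].append(visit(child))
        return node

    root = visit(ET.fromstring(xml_text))
    return {"V": set(lab), "lab": lab, "ele": ele,
            "att": att, "val": val, "root": root}


T = build_tree('<book isbn="7-111-07526-9"><title>Networks</title></book>')
print(T["lab"][T["root"]])  # label of the root node
```

Here val is defined only for attribute and text nodes, while ele and att are defined only for element nodes, mirroring the definition on the slide.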
Background
• Simple path expressions (the language PLs): p ::= ε | l.p
  – ε is the empty path.
  – "." is the concatenation operator.
  – A simple path expression cannot contain wildcards.
  – E.g. book.author; [[book.author]] is the set of nodes reached by following book.author.
• Path expressions (the language PL): P ::= ε | l | P.P | _*
  – l stands for any label or node name.
  – _ is a wildcard that matches a single node name or label.
  – _* matches an arbitrary path.
  – E.g. book._*; [[book._*]] is the set of nodes reached by following book._*.
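The meaning of [[P]] can be sketched as a small evaluator. This is a minimal illustration, not the paper's definition: the Node class, the evaluate function, the "text" label for text nodes, and the representation of a path as a list of steps are all assumptions made here. The step "_*" is read as "any path, including the empty one", so it yields a node together with all of its descendants.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # eq=False keeps nodes comparable by identity
class Node:
    label: str
    children: list = field(default_factory=list)
    value: Optional[str] = None


def descendants_or_self(node):
    """Yield node and every node below it, in document order."""
    yield node
    for child in node.children:
        yield from descendants_or_self(child)


def evaluate(start, path):
    """Return the nodes reached from `start` by `path` (a list of steps)."""
    current = [start]
    for step in path:
        reached, seen = [], set()
        for node in current:
            if step == "_*":
                candidates = descendants_or_self(node)
            else:
                candidates = (c for c in node.children if c.label == step)
            for c in candidates:
                if id(c) not in seen:  # avoid returning a node twice
                    seen.add(id(c))
                    reached.append(c)
        current = reached
    return current


db = Node("db", [
    Node("book", [
        Node("title", [Node("text", value="Networks")]),
        Node("author", [Node("text", value="Larry Peterson")]),
    ]),
    Node("book", [
        Node("author", [Node("text", value="Shary Zhou")]),
    ]),
])

authors = evaluate(db, ["book", "author"])   # [[book.author]]
below_books = evaluate(db, ["book", "_*"])   # [[book._*]]
```

The empty path ε corresponds to an empty step list, for which evaluate returns just the start node.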
Background
• XML keys: (Q', (Q, {P1, ..., Pn}))
  – Q' and Q are path expressions in PL.
  – P1, ..., Pn are simple path expressions in PLs.
  – For all i ∈ {1, ..., n}, Q'.Q.Pi is a valid path expression.
  – Q' is the context path, Q is the target path, and P1, ..., Pn are the key paths.
  – E.g. (book._*, (chapter, {number}))
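A rough way to see what such a key demands is to test it against a document. The sketch below assumes the usual reading of an XML key: within each context node, no two distinct target nodes agree on all key paths. It is simplified in two ways that are assumptions of this example: key paths are restricted to attributes, and target nodes missing a key attribute are skipped. The function satisfies_key is hypothetical; only standard xml.etree.ElementTree calls are used.

```python
import xml.etree.ElementTree as ET

DOC = """<db>
  <book isbn="7-111-07526-9">
    <title>Networks</title>
    <chapter number="1"><name>Foundation</name></chapter>
    <chapter number="9"><name>Applications</name></chapter>
  </book>
  <book isbn="7-111-06710-X">
    <title>Networks</title>
    <chapter number="1"><name>Introduction</name></chapter>
    <chapter number="9"><name>Applications</name></chapter>
  </book>
</db>"""


def satisfies_key(contexts, target, key_attrs):
    """Check a key given its context nodes, a target path, and key attributes."""
    for ctx in contexts:
        seen = set()
        for node in ctx.findall(target):
            key = tuple(node.get(a) for a in key_attrs)
            if None in key:
                continue  # simplification: nodes lacking a key value are skipped
            if key in seen:
                return False  # two target nodes agree on every key path
            seen.add(key)
    return True


root = ET.fromstring(DOC)

# Context book._*: every book node and everything below it.
book_contexts = [n for book in root.findall("book") for n in book.iter()]

relative_ok = satisfies_key(book_contexts, "chapter", ["number"])
absolute_ok = satisfies_key([root], "book/chapter", ["number"])
isbn_ok = satisfies_key([root], "book", ["isbn"])
```

On this document the chapter number identifies a chapter only relative to its book: the same check with the document root as the single context fails, because both books have a chapter numbered 1.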
Background
• Transformation language from XML to a relational schema
  – A transformation from XML data to a relational database of schema R is a collection of rules mapping XML data to the relations of the database.
  – Rule path
    • (1) X, defined as Y.P, where Y is a rule path, P is a path expression, and Y.P is a valid path expression; or
    • (2) xr, the root of the XML tree.
  – A rule on a relation R is Rule(R) = {rule(R, l) | l ∈ att(R)}.
  – Given a relational schema R, a transformation from XML data to a relational database of schema R is the tuple (Rule(R1), ..., Rule(Rn)), where R1, ..., Rn are the relations in the relational database.
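As an illustration of what one Rule(R) does, the sketch below populates a single relation from the example document. The rules are hard-coded and hypothetical: the column names (cid, isbn, title, author, cnumber, cname) follow the book-chapter relation used later in the slides, but the paths chosen for each column and the use of a running counter for cid are assumptions of this example, not the paper's rule definitions.

```python
import xml.etree.ElementTree as ET

DOC = """<db>
  <book isbn="7-111-07526-9">
    <title>Networks</title>
    <author>Larry Peterson</author>
    <chapter number="1"><name>Foundation</name></chapter>
    <chapter number="9"><name>Applications</name></chapter>
  </book>
  <book isbn="7-111-06710-X">
    <title>Networks</title>
    <author>Shary Zhou</author>
    <chapter number="1"><name>Introduction</name></chapter>
  </book>
</db>"""


def shred_book_chapter(root):
    """Produce one row per chapter; each column is reached by a path from a variable."""
    rows = []
    cid = 0
    for book in root.findall("book"):            # book variable: root followed by book
        for chapter in book.findall("chapter"):  # chapter variable: book followed by chapter
            cid += 1                             # hypothetical surrogate for the chapter node id
            rows.append({
                "cid": cid,
                "isbn": book.get("isbn"),
                "title": book.findtext("title"),
                "author": book.findtext("author"),
                "cnumber": chapter.get("number"),
                "cname": chapter.findtext("name"),
            })
    return rows


rows = shred_book_chapter(ET.fromstring(DOC))
for row in rows:
    print(row)
```

Each variable is bound by extending an already bound variable with a path, which is the shape of a rule path: it starts at the root and grows one path expression at a time.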
Background
• Relation tree RT = (V', xr, xb, label, parent, children, leaf, attribute)
  – V' is a set of nodes.
  – xr is the root of the relation tree.
  – xb is the unique child of xr.
  – label maps each node to a path expression, with label(xr) = r. For xb, label gives a path expression P in PL; for every other node, it gives a simple path expression in PLs.
  – parent maps each node to its parent; for xr, parent is undefined.
  – children maps each interior node to the set of its children, with children(xr) = {xb}; for leaves, children is undefined.
  – leaf returns true if node v is a leaf and false otherwise.
  – attribute maps each leaf v to a string l that denotes one of the attributes of the relation R; for interior nodes, it is undefined.
• Every node in the relation tree has a corresponding node in the tree model.
The Derivation Algorithm
Simple Expression: Algorithm derivation(XML key set, Rule(R)), which returns the relation keys
• Inputs: the XML key set, and Rule(R), a rule in the transformation schema
• Construct the relation tree RT
• Process each node of RT in post-order
• Collect the key paths bottom-up
• Compute the relation keys on reaching the root node
• Return the relation keys
The Derivation Algorithm
Example: Rule(book-chapter)
(Diagram from "Deriving Relation Keys from XML Keys")
The Derivation Algorithm
The resulting relation key set:
cid → isbn, cid → title, cid → author, cid → cnumber, cid → cname,
(cnumber, isbn) → title, (cnumber, isbn) → author, (cnumber, isbn) → cid, (cnumber, isbn) → cname.
These keys define the constraints on the relation book-chapter in the relational database.
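Each derived key reads as a functional dependency over the book-chapter relation, so it can be checked against concrete rows. The sketch below assumes four rows of the kind the example document would produce; the cid values are hypothetical surrogate identifiers, and the function holds is an illustrative helper.

```python
def holds(rows, lhs, rhs):
    """Return True if rows agreeing on the lhs columns also agree on the rhs columns."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it
        if seen.setdefault(key, value) != value:
            return False
    return True


book_chapter = [
    {"cid": 1, "isbn": "7-111-07526-9", "title": "Networks",
     "author": "Larry Peterson", "cnumber": "1", "cname": "Foundation"},
    {"cid": 2, "isbn": "7-111-07526-9", "title": "Networks",
     "author": "Larry Peterson", "cnumber": "9", "cname": "Applications"},
    {"cid": 3, "isbn": "7-111-06710-X", "title": "Networks",
     "author": "Shary Zhou", "cnumber": "1", "cname": "Introduction"},
    {"cid": 4, "isbn": "7-111-06710-X", "title": "Networks",
     "author": "Shary Zhou", "cnumber": "9", "cname": "Applications"},
]

print(holds(book_chapter, ["cid"], ["isbn"]))               # a derived key
print(holds(book_chapter, ["cnumber", "isbn"], ["cname"]))  # a derived key
print(holds(book_chapter, ["cnumber"], ["cname"]))          # not a derived key
print(holds(book_chapter, ["title"], ["author"]))           # not a derived key
```

The last two checks show why cnumber alone does not appear on a left-hand side: chapter 1 is named differently in the two books, so cnumber only determines the other columns together with isbn.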
Summary
• XML data to RDB: transformation
• XML keys to RDB keys: transformation (what this paper was about)
Conclusion
• Fixed relational schema: the presented algorithm provides a way of testing whether incoming XML data is valid.
• Variable relational schema: the schema may be updated by taking into account the relational keys that the algorithm generates from the incoming data and XML keys.
• Future work: given some XML keys for the data, automatically create a good transformation between the XML data and the relational database.