EBERHARDKARLSUNIVERSITT TBINGEN SFS TCL XML Relational DB Proposal
EBERHARD-KARLS-UNIVERSITÄT TÜBINGEN SFS TCL XML → Relational DB Proposal by Richard Edwards Presented by Stephan Kepser
EBERHARD-KARLS-UNIVERSITÄT TÜBINGEN SFS TCL Richard Edwards Computer Scientist School of Informatics University of Bangor, Wales rich@informatics. bangor. ac. uk http: //www. informatics. bangor. ac. uk/~rich/ Source of this talk: 2 PPT presentations by Edwards. 10/29/2020 2
EBERHARD-KARLS-UNIVERSITÄT TÜBINGEN SFS TCL Aim To store arbitrary XML documents without recourse to underlying DTD or schema in a relational database in particular in a SQL database. 10/29/2020 3
EBERHARD-KARLS-UNIVERSITÄT TÜBINGEN SFS TCL In a Nutshell Store the Document Object Model of the XML document. Extend each node entry by two natural numbers resembling the position of the node in the document tree. 10/29/2020 4
EBERHARD-KARLS-UNIVERSITÄT TÜBINGEN SFS TCL Overview of Presentation XML and Databases Document object model (DOM) Storing the DOM in a relational database Adding navigational information Conclusions 10/29/2020 5
EBERHARD-KARLS-UNIVERSITÄT TÜBINGEN SFS TCL XML Extensible Markup Language Form of a class of documents may be given by a Document Type Definition (DTD) an XML Schema Model of a document: A tree. 10/29/2020 6
EBERHARD-KARLS-UNIVERSITÄT TÜBINGEN SFS TCL Example 10/29/2020 7
EBERHARD-KARLS-UNIVERSITÄT TÜBINGEN SFS TCL Terminology Well-formed Conforms to the “rules” of XML Valid Document structure conforms to a specified “schema” Structure = hierarchy of elements, attributes & content DTD (SGML), XML Schema… 10/29/2020 8
EBERHARD-KARLS-UNIVERSITÄT TÜBINGEN SFS TCL Categorising documents Data-centric Low-level atomic data Highly structured, e. g. a database extract Document-centric Semantic markup added to irregularly structured or largely text-based information These are loose categories Documents displaying the characteristics of both are common, e. g. notes added to structured data. 10/29/2020 9
EBERHARD-KARLS-UNIVERSITÄT TÜBINGEN SFS TCL XML versus Databases An XML document is not a database. Database management systems offer: Indexing (clustered and non-clustered) Constraints Foreign Keys Patterns Querying Multi-user access 10/29/2020 10
EBERHARD-KARLS-UNIVERSITÄT TÜBINGEN SFS TCL XML versus Database management systems offer: Security Transaction management Thread safety Crash recovery Cacheing Locking mechanisms Deadlock contention handling 10/29/2020 11
EBERHARD-KARLS-UNIVERSITÄT TÜBINGEN SFS TCL Document Object Model At the same time a model for arbitrary XML documents an application programming interface (API) for languages like Java, C++, Python Base: tree structure of a document 10/29/2020 12
EBERHARD-KARLS-UNIVERSITÄT TÜBINGEN SFS TCL DOM (2) A DOM is not the elements' tree of the document. Because: an XML document contains much more than elements: Attributes Comments Text nodes Notation Processing instructions … 10/29/2020 13
EBERHARD-KARLS-UNIVERSITÄT TÜBINGEN SFS TCL DOM root A DOM is a tree. Its root is the document node. This is a new root node to attach the whole document to. Its daughters: the top level objects of the document. These are the document's root element processing instructions <? xml-stylesheet href="person. css" type="text/css" ? > 10/29/2020 14
EBERHARD-KARLS-UNIVERSITÄT TÜBINGEN SFS TCL DOM root (2) document type node <!DOCTYPE person [ … ]> comment nodes <!-- no comment --> 10/29/2020 15
EBERHARD-KARLS-UNIVERSITÄT TÜBINGEN SFS TCL Node types Element node Attribute node Name and value of an attribute Document node Root of the DOM tree Document type node Root of the DTD Comment node 10/29/2020 16
EBERHARD-KARLS-UNIVERSITÄT TÜBINGEN SFS TCL Node types (2) Processing instruction node Text node Contains ordinary text of the document CDATA section node <![CDATA[ blabla ]]> Document fragment node Root of a scrap board for editing, Only for processing, not part of a real XMLdocument 10/29/2020 17
EBERHARD-KARLS-UNIVERSITÄT TÜBINGEN SFS TCL Node types (3) Entity node Definition of an entity in a DTD <!ENTITY coord "((x, y)|(y, x))"> Entity reference node Optional reference of an entity, e. g. , < Notation node Definition of a notation in a DTD <!NOTATION jpeg SYSTEM "image/jpeg"> 10/29/2020 18
EBERHARD-KARLS-UNIVERSITÄT TÜBINGEN SFS TCL Node properties Node name Element or attribute name, or a fixed string like #comment Node value Attribute value or the actual text of a text node Node type One of the node types above 10/29/2020 19
EBERHARD-KARLS-UNIVERSITÄT TÜBINGEN SFS TCL Node properties (2) Parent node List of child nodes First child Last child Previous sibling Next sibling Attributes set 10/29/2020 20
EBERHARD-KARLS-UNIVERSITÄT TÜBINGEN SFS TCL Leaf nodes Attribute node CDATA section node Comment node Entity reference node Notation node Processing instruction node Text node 10/29/2020 21
EBERHARD-KARLS-UNIVERSITÄT TÜBINGEN SFS TCL Example <sample bogus="value"> <text_node>Test data. </text_node> </sample> 10/29/2020 22
EBERHARD-KARLS-UNIVERSITÄT TÜBINGEN SFS TCL Example 10/29/2020 23
EBERHARD-KARLS-UNIVERSITÄT TÜBINGEN SFS TCL DOM The DOM contains a document's structure and ist content. It is a generic model, exists for any XMLdocument. It is independent of a DTD (may contain a DTD!) or a schema. 10/29/2020 24
EBERHARD-KARLS-UNIVERSITÄT TÜBINGEN SFS TCL Persistent DOM Store DOM in relation database Translate the tree into a set of nodes Store node properties in tables. 10/29/2020 25
EBERHARD-KARLS-UNIVERSITÄT TÜBINGEN SFS TCL Database design 10/29/2020 26
EBERHARD-KARLS-UNIVERSITÄT TÜBINGEN SFS TCL Node type Lookup table 10/29/2020 27
EBERHARD-KARLS-UNIVERSITÄT TÜBINGEN SFS TCL doc Data table 10/29/2020 28
EBERHARD-KARLS-UNIVERSITÄT TÜBINGEN SFS TCL doc Sample data 10/29/2020 29
EBERHARD-KARLS-UNIVERSITÄT TÜBINGEN SFS TCL node Data table * 10/29/2020 30
EBERHARD-KARLS-UNIVERSITÄT TÜBINGEN SFS TCL node Sample data 10/29/2020 31
EBERHARD-KARLS-UNIVERSITÄT TÜBINGEN SFS TCL element_name_leaf * * = my_incredibly_verbosely_named_element 10/29/2020 32
EBERHARD-KARLS-UNIVERSITÄT TÜBINGEN SFS TCL text_leaf 12 3 10/29/2020 33
EBERHARD-KARLS-UNIVERSITÄT TÜBINGEN SFS TCL element_namespace_leaf * * = my_incredibly_verbose_ns: myelement 10/29/2020 34
EBERHARD-KARLS-UNIVERSITÄT TÜBINGEN SFS TCL More tables attribute node_id attribute_namespace_leaf attribute_name_leaf attribute_value_leaf * 10/29/2020 35
EBERHARD-KARLS-UNIVERSITÄT TÜBINGEN SFS TCL Yet more tables comment_leaf cdata_leaf entity_reference_leaf pi_data_leaf pi_target_leaf 10/29/2020 36
EBERHARD-KARLS-UNIVERSITÄT TÜBINGEN SFS TCL Database design repeated 10/29/2020 37
EBERHARD-KARLS-UNIVERSITÄT TÜBINGEN SFS TCL Labelling nodes Each node receives two natural numbers as coordinates (x, y). The tree is traversed depth-first. A counter is installed. On entering and leaving a node it is incremented. On entering a node, the node receives the current counter value as x-coordinate. On leaving a node, the node receives the current counter value as y-coordinate. 10/29/2020 38
<a> <b> <c/> <d/> </b> <e/> <f> <g>…</g> </f> </a> x: 1 y: ? 1 doc ? SAX event: start. Document() DB action: insert doc (and return unique doc ID) 10/29/2020 39
<a> <b> <c/> <d/> </b> <e/> <f> <g>…</g> </f> </a> x: 2 y: ? 1 2 doc ? a ? SAX event: start. Element(ns. URI, loc. Name, q. Name, attrs) DB action: insert node, insert element_name_leaf 10/29/2020 40
<a> <b> <c/> <d/> </b> <e/> <f> <g>…</g> </f> </a> x: 3 y: ? 1 2 3 b doc ? a ? ? SAX event: start. Element(ns. URI, loc. Name, q. Name, attrs) DB action: insert node, insert element_name_leaf 10/29/2020 41
<a> <b> <c/> <d/> </b> <e/> <f> <g>…</g> </f> </a> x: 4 y: ? 1 2 b 3 4 c doc ? a ? ? ? SAX event: start. Element(ns. URI, loc. Name, q. Name, attrs) DB action: insert node, insert element_name_leaf 10/29/2020 42
<a> <b> <c/> <d/> </b> <e/> <f> <g>…</g> </f> </a> x: 4 y: 5 1 2 b 3 4 c doc ? a ? ? 5 SAX event: end. Element(ns. URI, loc. Name, q. Name) DB action: update node (set y index) 10/29/2020 43
<a> <b> <c/> <d/> </b> <e/> <f> <g>…</g> </f> </a> x: 6 y: ? 1 doc ? a 2 b 3 4 c 5 6 ? ? d ? SAX event: start. Element(ns. URI, loc. Name, q. Name, attrs) DB action: insert node, insert element_name_leaf 10/29/2020 44
<a> <b> <c/> <d/> </b> <e/> <f> <g>…</g> </f> </a> x: 6 y: 7 1 doc ? a 2 b 3 4 c 5 6 ? ? d 7 SAX event: end. Element(ns. URI, loc. Name, q. Name) DB action: update node (set y index) 10/29/2020 45
<a> <b> <c/> <d/> </b> <e/> <f> <g>…</g> </f> </a> x: 3 y: 8 1 doc ? a 2 b 3 4 c 5 6 ? 8 d 7 SAX event: end. Element(ns. URI, loc. Name, q. Name) DB action: update node (set y index) 10/29/2020 46
<a> <b> <c/> <d/> </b> <e/> <f> <g>…</g> </f> </a> x: 9 y: ? 1 b 3 4 c 5 6 8 d doc ? 2 a ? 9 e ? 7 SAX event: start. Element(ns. URI, loc. Name, q. Name, attrs) DB action: insert node, insert element_name_leaf 10/29/2020 47
<a> <b> <c/> <d/> </b> <e/> <f> <g>…</g> </f> </a> x: 9 y: 10 1 b 3 4 c 5 6 8 d doc ? 2 a ? 9 e 10 7 SAX event: end. Element(ns. URI, loc. Name, q. Name) DB action: update node (set y index) 10/29/2020 48
<a> <b> <c/> <d/> </b> <e/> <f> <g>…</g> </f> </a> x: 11 y: ? 1 b 3 4 c 5 6 8 d doc ? 2 a ? 9 e 10 11 f ? 7 SAX event: start. Element(ns. URI, loc. Name, q. Name, attrs) DB action: insert node, insert element_name_leaf 10/29/2020 49
<a> <b> <c/> <d/> </b> <e/> <f> <g>…</g> </f> </a> x: 12 y: ? 1 b 3 4 c 5 6 8 d doc ? 2 a ? 9 e 10 11 f ? 12 g ? 7 SAX event: start. Element(ns. URI, loc. Name, q. Name, attrs) DB action: insert node, insert element_name_leaf 10/29/2020 50
<a> <b> <c/> <d/> </b> <e/> <f> <g>…</g> </f> </a> x: 13 y: 14 1 doc ? 2 a ? 9 e 10 11 f ? 12 g ? SAX event: characters(char ch[], int start, int length) 13 DB action: insert node, insert text_leaf 10/29/2020 … 14 b 3 4 c 5 6 8 d 7 51
<a> <b> <c/> <d/> </b> <e/> <f> <g>…</g> </f> </a> x: 12 y: 15 1 b 3 4 c 5 6 8 d doc ? 2 a ? 9 e 10 11 f ? 12 g 15 13 … 14 7 SAX event: end. Element(ns. URI, loc. Name, q. Name) DB action: update node (set y index) 10/29/2020 52
<a> <b> <c/> <d/> </b> <e/> <f> <g>…</g> </f> </a> x: 11 y: 16 1 b 3 4 c 5 6 8 d doc ? 2 a ? 9 e 10 11 f 16 12 g 15 13 … 14 7 SAX event: end. Element(ns. URI, loc. Name, q. Name) DB action: update node (set y index) 10/29/2020 53
<a> <b> <c/> <d/> </b> <e/> <f> <g>…</g> </f> </a> x: 2 y: 17 1 b 3 4 c 5 6 8 d doc ? 2 a 17 9 e 10 11 f 16 12 g 15 13 … 14 7 SAX event: end. Element(ns. URI, loc. Name, q. Name) DB action: update node (set y index) 10/29/2020 54
<a> <b> <c/> <d/> </b> <e/> <f> <g>…</g> </f> </a> x: 1 y: 18 1 doc 18 2 a 17 9 e 10 11 f 16 12 g 15 SAX event: end. Document() 13 DB action: update doc (set success flag) & do extras 10/29/2020 … 14 b 3 4 c 5 6 8 d 7 55
<a> <b> <c/> <d/> </b> <e/> <f> <g>…</g> </f> </a> 10/29/2020 1 b 3 4 c 5 6 8 d doc 18 2 a 17 9 e 10 11 f 16 12 g 15 13 … 14 7 56
EBERHARD-KARLS-UNIVERSITÄT TÜBINGEN SFS TCL Navigation The next sibling (x’, y’) of any given node (x, y) has x’=y+1 The first child (x’, y’) of any given node (x, y) has x’=x+1 The set of nodes that originate from a given node (x’, y’) have x’<x<y’ 10/29/2020 57
EBERHARD-KARLS-UNIVERSITÄT TÜBINGEN SFS TCL Node labelling Facilitates rapid serialisation (serial output) No need to traverse the tree Facilitates simple locking mechanism But: Maintenance of node labels Trade-off between select speed and update efficiency 10/29/2020 58
EBERHARD-KARLS-UNIVERSITÄT TÜBINGEN SFS TCL Querying XPath expressions have to be translated into complicated select statements. Subcomponents of an XPath expression are translated into where-clauses or temporary tables (embeded select). Node coordinates help a lot, but do not do everything. 10/29/2020 59
EBERHARD-KARLS-UNIVERSITÄT TÜBINGEN SFS TCL Querying (2) Example: XPath: //book SQL: select * from node as nd, element_node_leaf as enl where nd. owner_doc_id = 2321 and nd. node_id = enl. node_id and enl. leaf_text = 'text_node'; 10/29/2020 60
EBERHARD-KARLS-UNIVERSITÄT TÜBINGEN SFS TCL Querying (3) XPath: //book/author SQL: select * from node as nd 1, node as nd 2, element_node_leaf as enl 1, element_node_leaf as enl 2 where nd 1. owner_doc_id = 2321 and nd 1. node_id = enl 1. node_id and enl 1. leaf_text = 'author' and nd 2. owner_doc_id = 2321 and nd 2. node_id = enl 2. node_id and enl 2. leaf_text = 'book' and nd 1. x_index = nd 2. x_index + 1; 10/29/2020 61
EBERHARD-KARLS-UNIVERSITÄT TÜBINGEN SFS TCL Querying (4) XSLT and XQuery require many selectstatements and additional computations outside of the database. XSLT and XQuery Turing-complete SQL II (Standard): First-order logic SQL III: FOL + least fixed points 10/29/2020 62
- Slides: 62