The future of XML Storage Native XML Databases

The future of XML Storage: Native XML Databases VS. XML Enabled Databases By Felix Annan & Oleksiy Prokhorov

What is XML Stands for e. Xtensible Markup Language n It is a method of marking up documents n XML Documents come in two flavors n Data-Centric XML n Document-Centric XML n

Data-Centric XML n Characterized by well structured data <employee> <employee_data> <name>John Doe</name> <sex>M</sex> <address> 121 Cherry Tree Lane </address> <SSN>123 -45 -6789</SSN </employee_data> <name>Jane Doe</name> <sex>F</sex> <address> 121 Rain In. Spain Lane </address> <SSN>987 -65 -4321</SSN </employee_data> </employee>

Document-Centric XML n Characterized by Semi-structured data <Product> <Intro> The <Product. Name>Turkey Wrench</Product. Name> from <Developer>Full Fabrication Labs, Inc. </Developer> is <Summary>like a monkey wrench, but not as big. </Summary> </Intro> <Description> <Para>The turkey wrench, which comes in <i>both right- and left- handed versions (skyhook optional)</i>, is made of the <b>finest stainless steel</b>. The Readi-grip rubberized handle quickly adapts to your hands, even in the greasiest situations. Adjustment is possible through a variety of custom dials. </Para> <Para>You can: </Para> <List> <Item><Link URL="Order. html">Order your own turkey wrench</Link> </Item> <Item><Link URL="Wrenches. htm">Read more about wrenches</Link></Item> <Item><Link URL="Catalog. zip">Download the catalog</Link></Item> </List> <Para>The turkey wrench costs <b>just <price>$19. 99</price></b> and, if you order now, comes with a <b><addons>hand-crafted shrimp hammer</addons></b> as a bonus gift. </Para> </Description> </Product>

Types of XML Storage n XML can be stored in either: In an XML Enabled Database n In a Native XML Database n

XML Enabled Databases A system based on relational databases but with added functionality for storing XML n Usually have an XML Processing Layer that exists independently of the SQL system n

XML Enabled Databases Communication Entry Point XQuery Interface XML Schema Validation XML Processing Stack XQuery/SQL Interface Database Engine Relational Database Storage Legacy Relational Database

XML Enabled Databases n Data is stored in one of two ways In a Large Object (LOB) which is usually either a Character Large Object (CLOB) or a Binary Large Object (BLOB) n In a set of tables into which the XML document is shredded. n

XML Enabled Databases Communication Entry Point n XQuery Interface: n XQuery Interface XML Schema Validation XQuery/SQL Interface n XML Schema Validation This layer validates the XML document against a schema n This requires the document to be parsed. n SQL Interface Database Engine Relational Database Storage This interface accepts all XQueries and validates them for correctness.

XML Enabled Databases Communication Entry Point XQuery Interface XML Schema Validation XQuery/SQL Interface Database Engine Relational Database Storage n XML Schema Validation n Parsing is performed generally with either a Simple API for XML (SAX) parser or a Document Object Model (DOM) parser.

XML Enabled Database Communication Entry Point XQuery Interface XML Schema Validation XQuery/SQL Interface Database Engine Relational Database Storage n XQuery/SQL Interface: The XQuery is converted into a set of SQL statements, this can be done either automatically or manually with preset SQL statements for specific XQuery statements. n XML Shredding breaks down and stores an XML document into a set of relational tables by mapping the XML Schema to the tables n

XML Enabled Databases Communication Entry Point XQuery Interface XML Schema Validation n XML Publishing: n n XQuery/SQL Interface Database Engine Relational Database Storage n This is the process of generating an XML document from a database. For a shredded system this involves joining data from various tables and passing this up to the XML layer to be converted into an XML document. For a CLOB System this could involves document parsing when the required document is a subsection of another document

Document Shredding <? XML VERSION="1. 0" ENCODING="UTF-8"? > <CUSTOMER_NO> </CUSTOMER_NO>. . . <EMAIL> </EMAIL> <CUSTOMER_ADDRESS> <ADDRESS_TYPE> </ADDRESS_TYPE>. . . </CUSTOMER_ADDRESS> <INVOICE_NO> </INVOICE_NO>. . . <INVOICE_ITEM> <ITEM_NO> </ITEM_NO>. . . <DISCOUNT> </INVOICE_ITEM> </INVOICE> </CUSTOMER> <ITEM_NO> </ITEM_NO> <ITEM_NAME> </ITEM_NAME> <CATALOG_TYPE> </CATALOG_TYPE> <AUTHOR> </AUTHOR> <PUBLISHER> </PUBLISHER> <ITEM_PRICE> </ITEM>

Native XML Databases XML data is stored in a specialized structure built from the ground up to store XML. n The fundamental unit of storage is an XML document n

Native XML Database Communication Entry Point XQuery Interface XML Schema Validation XML Processing and Storage Database Engine Hierarchical XML Storage

Native XML Storage n Communication Entry Point XQuery Interface XML Schema Validation (Optional) Database Engine Hierarchical XML Storage: n This is a storage system where the XML data is stored as a tree structure with the root element being the topmost node and the descendants being the branches and leaves of the tree

NATIVE XML DATABASES CUSTOMER INVOICE_NO CUSTOMER_NO INVOICE_ITEM_NO ADDRESS_TYPE QUANTITY Hierarchical nature of an XML document CUSTOMER_ADDRESS

COMPARISON n Overall processing required for XML documents in XEDs is greater than in NXDs valuable time is spent in: Mapping XML Schemas to Relational Tables n Converting XQueries to SQL n Parsing and Re-parsing documents stored in LOBs n

COMPARISON n NXDs are able to store and export XML documents whilst keeping information about the document including: Comments n Processing Instructions n n NXDs also store the information in the way it was presented for storage thereby maintaining : Document order n Sibling Order n

COMPARISON NXDs, due to the storage structure can include less redundant data since only the data provided is stored. n XEDs must store a whole row no matter how many columns have their data provided n

COMPARISON Normalization in NXDs follows a more natural concept that makes it easier to follow. It is more concerned with the order of hierarchy of elements. n Normalization in XEDs requires rigorous experience in applying various normal forms to data n

COMPARISON Evolution in an XML Schema does not need to affect previously stored XML documents in an NXD n Evolution in an XML Schema affects every previously stored XML document in an XED. n n Brings up concerns about default values

COMPARISONS n Proposed improvements to the XED generally have side effects of either increased management issues or performance issues: n Adaptive Shredding n Denormalization of tables

Performance Tests All performance test results available have been for well structured or data-centric XML, not document-centric XML where NXDs are expected to be excel even more n Some test results didn’t reveal exactly which database vendor systems were involved in the test for legal reasons n

Test by Atakan Kurt & Mustafa Atay n Database Systems: n n n Oracle 9 i(XED) Tamino 2. 3. 1. 1 by Software AG (NXD) Data: n n 1, 2 and 3 million book records Database Structure: n n Book (bookid, year, title, publisher, price) Author (authored, parentid, last, first) Editor (editorid, parentid, first, last, affiliation) For the XML document all of these attributes formed one “document”

Test by Atakan Kurt & Mustafa Atay n Issues: The test was performed only with data-centric XML documents n The data used with Oracle was in a delimited file format not an XML document. n The queries run on Oracle were not XQueries but regular SQL queries n

Test by Atakan Kurt & Mustafa Atay T-Tamino O-Oracle UI-Un. Indexed I- Indexed

Test by Atakan Kurt & Mustafa Atay n n Join, Union and Intersection for Oracle had to be restricted to 2 tables in order to obtain results. The Oracle queries also had to be segmented to get results CPU Time and Logical Records for 3 -million records with Index

Tests by Joseph Fong, H. K. Wong, and Anthony Fong The databases used in the tests were not released for legal reasons n Test results were collected for 100; 1, 000; 10, 000 and 50, 000 sales records all in XML format. n Issues: n n Data was, again, data-centric.

Tests by Joseph Fong, H. K. Wong, and Anthony Fong The same results were obtained for single update and delete operations

Tests by Joseph Fong, H. K. Wong, and Anthony Fong

Tests by Joseph Fong, H. K. Wong, and Anthony Fong Similar results for mass deletes, mass updates and reporting

Summary of Results NXDs have better performance with large bulk operations involving searching, inserting, updating and deleting n The performance of NXDs on single operations requires improvements in order to fully dominate the XML document processing space n NXDs perform comparably to Traditional, non extended Relational Databases and thus with some optimization can replace them. n

THE INDUSTRY n We consider the implementations of XML support in three database systems: Microsoft SQL Server n Oracle n IBM DB 2 n

Microsoft SQL Server 2000 is a true XML Enabled Database utilizing Shredding. n SQL Server 2005 introduces an XML data type n Microsoft realized that the XML hierarchical model does not map unto the Relational tabular model. n This data type can be specified as the data type of a column. n Support for XQuery has been added n Users can query both the relational and XML n

Oracle 10 g Release 2 Oracle implements an XML data type for XML Storage n The XML data type is implemented as either a CLOB or shredded into a set of database objects n Oracle recommends the database objects for well structured frequently changing data. n Oracle recommends the CLOB for documentcentric XML which will mainly be updated as a whole n Oracle supports XQuerys n

IBM DB 2 V 9 supports an XML Data type. n Storage of the XML Data is in a separate storage independent of the Relational storage n Processes for dealing with XQuerys are different from processes for dealing with SQL n XQUERY INTERFACE SQL INTERFACE DATABASE ENGINE XML STORAGE RELATIONAL STORAGE

OBSERVATIONS n n n XML processing is no more performed as a layer over the relational engine thus there is a shift away from traditional XEDs All vendors implement a XML Data type that embodies some unique storage facilities for XML No vendor supplies XML as a separate package because most users that require XML also require some sort of relational storage. XML storage is not yet independent
- Slides: 38