XML In An RDBMS World Michael D Thomas
XML In An RDBMS World Michael D. Thomas mdthomas@ibiblio. org
Data, data n n In general, computing is never far from data XML, RDBMSes deal with data – play in the same very large playground Extensive overlap between RDBMSes and XML Goal: Learn how to choose between RDBMS vs. XML, how to integrate the two technologies
First, some pragmatism… n n n “When all you have is a hammer, everything looks like a nail. ” What if all you have is a hammer? “When all you have is a hammer, everything must look like a nail. ” Best to have XML and RDBMS in your toolset Architectural purity is often impossible But we should know when & what we are compromising
Tonight’s Agenda n n Compare XML and SQL RDBMS Technologies Understand Their Sweet Spots Examine Their Use Through Studies of Anti-Patterns and Patterns Of Usage Discuss XQuery and XML Storage In The RDBMS
Topics n n n n n Defining The Playground – XML vs. RDBMS Anti-Pattern: XML As Transactional Data Stores & The ACID Test Anti-Pattern: Document Storage & The RDBMS XQuery & RDBMS Doc Storage Anti-Pattern: XML Declarative Languages & Gratuitous Encapsulation Of SQL Understanding Declarative Languages, Such As SQL Anti-Pattern: XML Meta-data & The Overly Abstract Database Managing The Split Between XML-based & RDBMSbased Data Models In The Same Application
XML As Database Anti-Pattern n n “I’ll use an XML file as a database” Advantages: Simple, Cheap An XML Schema defines the metadata of the database Data is right on the filesystem, not “hidden” in a complicated SQL RDBMS
XML As Database Problems n n Updates are hard Have to re-invent concurrency handling – multiple users changing data at the same time No constraint checking No optimization, such as indexes, caching, etc.
Organization of Data n n n Been formalizing the storage information for a long time Age of computing called for new approaches Relational model highly successful XML is relatively new Good for loosely structured data – documents -- and data transmission
Transmission & Interoperability n n Electronic transmission is newer than electronic storage With the growth of the Internet, transmission of data is exploding No Interoperability problems in 1950 – lots of interoperability problems now XML is an important standard for transmission and interoperability
Vocabulary n n n Datastore – Anything that stores data SQL Relational Database – A database that organizes info in tables and adheres to relational theory XML – 1) e. Xtensible Markup Language 2) The XML standard 3) All or part of an XML document
Vocabulary n n n XML Database – A database specialized for the storage of XML documents Object Database – A database that stores objects Object-Relational Database – A relational database with an extensible type system
Data Centric vs. Document Centric n n Data Centric approach – the datastore is focused on handling highly structured, fine grained data. Favors the relational model Document centric approach – the datastore is focused on handling semistructured data, such as web pages, books, etc. Favors XML
Data Centric vs. Document Centric n n Best queries are largely a result of structuring the data well (Messy desk vs. organized file cabinet) Documents are semi-structured data with ad hoc structures The overhead of defining rigorous structure for each type of document increases the overall cost of management (Would still give you the best queries)
Storage vs. Transmission n n Data Storage and Data Transmission are two different concepts XML is very strong for interoperable transmission Can store by writing XML to a file You can ‘transmit’ relational data by exporting a few tables and ftping, but isn’t a strong solution
Datastore Application Stack
XML/Relational Comparison Relational/SQL XML Metadata Definition Create Table… Data Persistence Insert … Update … Select … Define XML Schema (Optional) Create XML Document XPath, XQuery Data Query
Application With XML Transmissions
Three Data Models n n n Persistent Model – how data is stored Active Model (Object Model) – how data is arranged when it is being manipulated by a program, usually written in an imperative language Presentation/Transmission Model – how data is transmitted, usually as XML
Datastore Basics n Any Datastore must tackle the following issues: n n n Concurrency Transactions – the ACID test Locking Joins Normalization Administration Issues
Concurrency n n Datastores support concurrency if multiple users can access the same datastore at the same time Datastores must not allow the same data to be modified at the same time
Transactions n n Atomicity – No matter how complex, a transaction is atomic and indivisible. Transactions are “all or nothing. ” Consistency – Transactions must leave the database in a consistent state, i. e. , consistent with the rules Isolation – Transaction is isolated from other transactions Durability – The effects of a transaction persist
Isolation levels n n TRANSACTION_READ_UNCOMMITTED – Dirty reads, non-repeatable reads, phantom reads TRANSACTION_READ_COMMITTED – Non-repeatable reads, phantom reads TRANSACTION_REPEATABLE_READ – phantom reads TRANSACTION_SERIALIZABLE
Locking n n n Data must be locked for transactions to be isolated Locking is both a datastore and an application concern How much extraneous data is locked? (Page level locking, document level locking) Pessimistic locking: prevents reading of locked data Optimistic locking: generates an error when inappropriate data updates are attempted
Joins n n n A Join joins the data between two different data entities E. g. , SELECT * from emp, dept WHERE emp. deptno = deptno Joins are the cornerstone of SQL XPath doesn’t do joins between XML documents! XQuery, others can
Normalization n n Normalization – organizing data to minimize redundancy Normalized data is easier to maintain and easier to understand conceptually In relational db design, normalized DBs need to be denormalized for performance reasons XML docs can also be normalized, but not as much support for tying elements together Normalization is important when designing, less crucial at implementation
Administrative Issues n n Includes backup & recovery, installation, upgrades, optimization, maintenance, etc. In general: n n Bigger is better (economies of scale, 24 x 7 support) More popular, more standard is better (law of increasing returns) Whatever is already working in your organization is better (no need to hire, re-train administrators) Existing DB vendors have a huge advantage
Datastores n Different types: n n n Relational Database – highly structured data such as account balances, inventory quantities, etc. ) Document Database – used to store documents, probably in XML format The same DB can serve as both, e. g. , Oracle
Overview Of Relational Databases n n n The Relational Model SQL Entity Relationship Diagrams
Relational Model -- structure n n n Data is grouped in tables Tables have columns and rows Columns are fairly fixed – the set of columns shouldn’t change much (if at all) over the life of a table A table can have any number of rows Rows change constantly
Relational model – primary key n n In general, a table should have one or more columns defined as the primary key The primary key is unique and non-null Usually one column Can consist of more than one column (composite primary key)
Relational model – foreign key n n A foreign key describes a relationship between two tables The foreign key column of table. A points to a column in table. B Table. B is said to be the parent of table. A Often, the foreign key points at a primary key
SQL Three types: n Query SELECT * FROM emp WHERE deptno=10 ORDER BY ename n n Data Modification Language (DML): Update, Insert Data Definition Language (DDL): Create Table
Joins A Join is a Cartesian Cross-Product: SELECT count(*) FROM emp; SELECT count(*) FROM emp, dept; SELECT ename, emp. deptno, dname FROM emp, dept WHERE emp. deptno = deptno; n
Joins SELECT ename, dname FROM emp, dept WHERE emp. deptno = deptno
Entity Relationship Diagrams n n n Used to create a map of your data Describes tables, attributes of tables and relationships between tables Come at it from two directions: lay out the entities, assign the attributes; group the attributes into entities
XML vs. SQL Tables n n n XML – order matters! (Rows in a relational table are unordered) XML is a tree structure XML documents tend to be semi-structured, SQL tables are highly structured SQL tables aren’t as flexible or interchangable XML joins aren’t straightforward An XML document/element doesn’t serve multiple purposes as well as DB schema/table
Anti-Pattern: XML Document Storage As A SQL BLOB n n n Need to store XML in a database Better than storing in a filesystem Make a BLOB (Binary Large Object) and store a document as an element in a row of a table Problem: can’t query contents of document without extracting it from the db! For simple queries, might have to extract all of the documents into the application – very inefficient
Anti-Pattern: One-off SQL Schema For A Particular XML Document n n n BLOB storage is bad, so why not “shred” the document across multiple DB tables? I. e. , for XML elements named “Dept” make a “Dept” table, “Emp” elements make a “Emp” table, describe the hierarchy with foreign key constraints Query performance is much, much better – probably better than XPath against XML as a file
Problems n n A lot of work! Have to do this for every XML schema Hard! XML schemas are inherently more flexible than SQL schemas. Some mappings can be difficult Negates flexibility of XML Not as learnable – Programmers have to learn to query your SQL schema, not just XPath and XML Schema
Anti-Pattern: Developing A General “Shredding” Solution For All Of XML n n Your Application has several XML schemas, and it’s time-consuming to develop “shredding” for all of them So, you try to do a more generalized shredding Is possible, but is a very horizontal problem. You probably won’t get the time to solve it completely. Still might present learnability problems
XML Datastore Architectures n n n n XML Views Of Relational Data Relational Wrappers Of XML Independent Storage of XML Documents (Native XML Database) Text Storage Of XML In RDBMS XML Shredding Across Relational Tables Storing XML as Objects in Object- Relational DBs (Oracle XMLType) Everything is XML (XQuery approach)
Vocabulary: XML Collections n n XML Collection is a collection of XML documents “A row is to a table as an XML document is to an XML collection. ”
XML Derived From RDBMS n n n Data exists naturally as relational data Needs to be represented as an XML document for some reason The derived XML is usually used for transmission
Independent XML Store n n n A specialized database is used to store XML documents Typically, an application will either have two datastores – relational and XML – or relational data will be stored as XML In the dual datastore case, the application code has to join the different data sets
Text Storage Of XML In Databases n n n XML is stored in a column of a relational table Allows you to mix structured and semistructured approaches However, hard to query against the XML directly
XML Shredding Across Tables n n n An XML document is stored across many tables Is possible to use SQL queries against the document’s parts Vendors can implement an XPath-to. SQL translator Can structure the tables based on a schema A pain to handle yourself
Exercise n Shred an XML schema across relational tables
XMLType n n n Object-Relational databases allow you to define your own types You could define an Address object, define functions for the object, and store instances of the object in a column in the database Oracle defines an object-relational type, XMLType, for the storage of XML documents in the database SELECT e. po. Doc. get. Clobval() AS po. XML FROM po_xml_tab e WHERE e. po. Doc. exists. Node('/PO[PNAME = "po_2"]') = 1; Shredding is managed & encapsulated for you
Oracle 9 i: XMLType, Text
Everything Is XML (XQuery) n n n With XMLType, Oracle says that everything fits in to the Object. Relational realm XQuery says that everything can be represented and queried as XML Relational data is derived from RDBMS using SQL/XML
XQuery
SQL/XML select xmlelement("emp", 'Employee ' , xmlelement( "name", e. job || ' ' || e. ename), ' was hired on ', xmlelement("hiredate", e. hiredate)) as result from emp e; --------------------------------------- <emp>Employee <name>CLERK SMITH</name> was hired on <hiredate>17 -DEC-80</hiredate> </emp>
Query Soup n n SQL – queries relational data XPath – queries a particular XML document SQL/XML – a standard for deriving XML from relational data XQuery – queries a collection of XML documents and uses XPath to query particular XML documents
Query Soup n n n There are other XML query languages, such as XML-QL and Quilt XSLT is a transformation, not a query, language XSQL & MS XML/SQL are wrapper languages
Anti-Pattern: Encapsulating SQL With XML n n General Rule: “Encapsulation is good, but encapsulating good things is bad. ” SQL is good (Also, HTML is good) Why hide SQL with XML?
XML encapsulating SQL SELECT emp. ename, dept. dname FROM emp, dept WHERE emp. deptno = deptno <sql-query> <fields> <field> ename </field> <field> dname </field> </fields> <table> emp </table> <table> dept </table> </tables> <join>emp. deptno = deptno </join> </joins> </sql-query>
Same concept in XQuery. X “XQuery. X is an XML representation of an XQuery… The result is not particularly convenient for humans to read and write, but it is easy for programs to parse…”
XQuery Example XQuery: { for $b in doc("http: //www. bn. com/bib. xml")/ bib/book where $b/publisher = "Addison. Wesley" and $b/@year > 1991 return <book year = "{ $b/@year }"> { $b/title } </book> } n
XQuery. X Version <? xml version="1. 0" encoding="UTF-8"? > <? xml-stylesheet type="text/xsl" href="xqueryx. xsl"? > <xqx: module xmlns: xsi="http: //www. w 3. org/2001/XMLSchema-instance" xmlns: xqx="http: //www. w 3. org/2003/12/XQuery. X" xsi: schema. Location="http: //www. w 3. org/2003/12/XQuery. X xqueryx. xsd"> <xqx: main. Module> <xqx: query. Body> <xqx: expr xsi: type="xqx: element. Constructor"> <xqx: tag. Name>bib</xqx: tag. Name> <xqx: element. Content> <xqx: expr xsi: type="xqx: flwr. Expr"> <xqx: for. Clause. Item> <xqx: typed. Variable. Binding> <xqx: var. Name>b</xqx: var. Name> </xqx: typed. Variable. Binding> <xqx: for. Expr> <xqx: expr xsi: type="xqx: path. Expr"> <xqx: expr xsi: type="xqx: function. Call. Expr"> <xqx: function. Name>document</xqx: function. Name> <xqx: parameters> <xqx: expr xsi: type="xqx: string. Constant. Expr"> <xqx: value>bib. xml</xqx: value> </xqx: expr> </xqx: parameters> </xqx: expr> <xqx: step. Expr> <xqx: xpath. Axis>child</xqx: xpath. Axis> <xqx: element. Test> <xqx: node. Name> <xqx: QName>bib</xqx: QName> </xqx: node. Name> </xqx: element. Test> </xqx: step. Expr> <xqx: xpath. Axis>child</xqx: xpath. Axis> <xqx: element. Test> <xqx: node. Name> <xqx: QName>book</xqx: QName> </xqx: node. Name> </xqx: element. Test> </xqx: step. Expr> </xqx: expr> </xqx: for. Expr> </xqx: for. Clause. Item> </xqx: for. Clause> <xqx: where. Clause> <xqx: expr xsi: type="xqx: operator. Expr" xqx: infix="true"> <xqx: op. Type>AND</xqx: op. Type> <xqx: parameters> <xqx: expr xsi: type="xqx: operator. Expr" xqx: infix="true"> <xqx: op. Type>=</xqx: op. Type> <xqx: parameters> <xqx: expr xsi: type="xqx: path. Expr"> <xqx: expr xsi: type="xqx: variable"> <xqx: name>b</xqx: name> </xqx: expr> <xqx: step. Expr> <xqx: xpath. Axis>child</xqx: xpath. Axis> <xqx: element. Test> <xqx: node. Name> <xqx: QName>publisher</xqx: QName> </xqx: node. Name> </xqx: element. Test> </xqx: step. Expr> </xqx: expr> <xqx: expr xsi: type="xqx: string. Constant. Expr"> <xqx: value>Addison. Wesley</xqx: value>
XQuery. X Version (cont. ) <xqx: QName>publisher</xqx: QName> </xqx: node. Name> </xqx: element. Test> </xqx: step. Expr> </xqx: expr> <xqx: expr xsi: type="xqx: string. Constant. Expr"> <xqx: value>Addison. Wesley</xqx: value> </xqx: expr> </xqx: parameters> </xqx: expr> <xqx: expr xsi: type="xqx: operator. Expr" xqx: infix="true"> <xqx: op. Type>> </xqx: op. Type> <xqx: parameters> <xqx: expr xsi: type="xqx: path. Expr"> <xqx: expr xsi: type="xqx: variable"> <xqx: name>b</xqx: name> </xqx: expr> <xqx: step. Expr> <xqx: xpath. Axis>child</xqx: xpath. Axis> <xqx: attribute. Test> <xqx: node. Name> <xqx: QName>year</xqx: QName> </xqx: node. Name> </xqx: attribute. Test> </xqx: step. Expr> </xqx: expr> <xqx: expr xsi: type="xqx: integer. Constant. Expr"> <xqx: value>1991</xqx: value> </xqx: expr> </xqx: parameters> </xqx: expr> </xqx: where. Clause> <xqx: return. Clause> <xqx: expr xsi: type="xqx: element. Constructor"> <xqx: tag. Name>book</xqx: tag. Name> <xqx: attribute. List> <xqx: expr xsi: type="xqx: attribute. Constructor"> <xqx: attribute. Name>year</xqx: attribute. Name> <xqx: attribute. Value> <xqx: expr xsi: type="xqx: path. Expr"> <xqx: expr xsi: type="xqx: variable"> <xqx: name>b</xqx: name> </xqx: expr> <xqx: step. Expr> <xqx: xpath. Axis>child</xqx: xpath. Axis> <xqx: attribute. Test> <xqx: node. Name> <xqx: QName>year</xqx: QName> </xqx: node. Name> </xqx: attribute. Test> </xqx: step. Expr> </xqx: expr> </xqx: attribute. Value> </xqx: expr> </xqx: attribute. List> <xqx: element. Content> <xqx: expr xsi: type="xqx: path. Expr"> <xqx: expr xsi: type="xqx: variable"> <xqx: name>b</xqx: name> </xqx: expr> <xqx: step. Expr> <xqx: xpath. Axis>child</xqx: xpath. Axis> <xqx: element. Test> <xqx: node. Name> <xqx: QName>title</xqx: QName> </xqx: node. Name> </xqx: element. Test> </xqx: step. Expr> </xqx: expr> </xqx: element. Content> </xqx: expr> </xqx: return. Clause> </xqx: expr> </xqx: element. Content> </xqx: expr> </xqx: query. Body> </xqx: main. Module> </xqx: module>
When To Use Declarative XML
XML As User Interface n n Development teams often forget to write use cases for administrators, downstream developers XML declarative languages must be considered as ways to provide user interfaces Documentation, examples important Usability, Learnability issues important
Declarative XML In App. Lifecycle Compile-Time – Best Performance, Least Flexibility • Start-Time – XML Parsed Once, Good Performance, Good Flexibility. App must be restarted or manually refreshed to get flexibility • • Run-Time – Can Impact Performance, But Great Flexibility. Web Services Approach
Start-Time Configuration
Runtime Configuration
Anti-Pattern: The Overly Abstract Database n n n By using XML, you can achieve runtime configurability Instead of “hard coding” SQL table names, do the same job in XML configuration files Problem: database is so abstract that it is very hard or impossible to do SQL queries against it directly Database is closed – just a persistence extension to the application The application (and thus, the app dev team) assumes all responsibility for any work with the data
The Split Persistence Data Model n n n Applications Have Persistence Data Models Persistence Data Model – any data stored on disk Everything might be in the DB Everything might be in XML docs Usually, there is some split – i. e. , 95% in a SQL DB, but 5% in a properties file Managing the split between DB-based data and XML-based data is “accidental complexity” and can be a headache
Traditional Architecture
Typical Architecture
Joins between data models n n n Can use XQuery to join the two data models Could use XPath enhanced SQL to join the two models Often, the two models are joined at the object level
Databases Have Meta Data
Databases Have Metadata n n n Column names, table names & constraints are all metadata Defined at DB design time, which is usually app design time App is often hard coded against the database meta data
XML as DB Metadata
Pros vs. Cons n n n More flexible – XML can change at runtime More complex at the application level Database must be more abstract Can make the application harder to extend Don’t forget – a DB table can also serve the same purpose as an XML meta data doc
Alternative: Tables As Meta Data
- Slides: 75