Introduction XML an emerging standard for exchanging data

  • Slides: 20
Download presentation
Introduction • XML: an emerging standard for exchanging data on the WWW. • Relational

Introduction • XML: an emerging standard for exchanging data on the WWW. • Relational database: most wildly used DBMS. • Goal: how to map the relational data in the XML documents

Introduction (cont’d) • “Efficiently Publishing Relational Data as XML Documents” • Lei Jiang: a

Introduction (cont’d) • “Efficiently Publishing Relational Data as XML Documents” • Lei Jiang: a language for conversion, implementations • Yan Zhang: implementations • Yong Zhuge: performance comparison

Language • SQL with minor scalar and aggregate function extensions for XML construction •

Language • SQL with minor scalar and aggregate function extensions for XML construction • Advantage: use existing APIs and processing infrastructure of RDBMS • Other language proposals • Example

<customer id=“C 1”> <name>John Doe </name> <accounts> <account id=“A 1”>1894654</account> <account id=“A 2”>3849342</account> <porders>

<customer id=“C 1”> <name>John Doe </name> <accounts> <account id=“A 1”>1894654</account> <account id=“A 2”>3849342</account> <porders> <porder id=“PO 1” acct=“A 1”> //first purchase order <date>1 Jan 2000</date> <items> <item id=“I 1”>Shoes</item> <item id=“I 2”>Bungee Ropes</item> </items> <payment id=“P 1”>due January 15 </payment> <payment id=“P 2”>due January 20 </payment> <payment id=“P 3”>due February 15 </payment> </payments> </porder> <porder id= “PO 2” acct= “A 2”> //second purchase order … </porder> </porders> </customer>

Customer(id integer, name varchar(20) Account(id varchar(20), cus. Id integer, acctnum integer) Item(id integer, po.

Customer(id integer, name varchar(20) Account(id varchar(20), cus. Id integer, acctnum integer) Item(id integer, po. Id inteter, desc varchar(10) Purch. Order(id integer, cusid integer, acct. Id varchar(20) date varchar(10) Payment(id integer, po. Id integer, desc varchar(10)

Select cust. name, CUST(cust. id, cust. name, (Select XMLAGG(ACCT(acct. id, acctnum) From Account acct

Select cust. name, CUST(cust. id, cust. name, (Select XMLAGG(ACCT(acct. id, acctnum) From Account acct Where acct. cust. Id = cust. id), (Select XMLAGG(PORDER(porder. id, porder. acct, porder. date, (Select XMLAGG(ITEM(item. id, item. desc)) From Item item Where item. po. Id=porder. id), (Select XMLAGG(PAYMENT(pay. id, pay. desc)) From Payment pay Where pay. po. Id = porder. id))) From Purch. Order porder Where porder. cust. Id=cust. id)) From Customer cust

Define XML Constructor CUST (cust. Id: integer, cust. Name: varchar(20) acct. List: xml, porder.

Define XML Constructor CUST (cust. Id: integer, cust. Name: varchar(20) acct. List: xml, porder. List: xml) AS{ <customer id=$cust. Id> <name>$cust. Name</name> <accounts>$acct. List</accounts> <porders>$porder. List</porders> </customer> }

Implementation • Add tags and structure to the relational tables • Early Tagging, Early

Implementation • Add tags and structure to the relational tables • Early Tagging, Early Structuring • Late Tagging, Late Structuring • Early Tagging, Late Structuring • Outside Engine, Inside Engine

Early Tagging, Early Structuring • Outside engine: Stored Procedure Approach Simplest technique, commonly used

Early Tagging, Early Structuring • Outside engine: Stored Procedure Approach Simplest technique, commonly used Drawback: overhead of issuing many queries • Inside engine: Correlated CLOB, De-Correlated CLOB Approach

Late Tagging, Late Structuring • Content creation – Relational data is produced • Tagging

Late Tagging, Late Structuring • Content creation – Relational data is produced • Tagging and structuring – Relational data is structured and tagged to produce XML document

Content Creation • Redundant Relation Approach – Join every table together – Simple –

Content Creation • Redundant Relation Approach – Join every table together – Simple – Redundancy • Unsorted Outer Union Approach – Compute each path using join – One tuple per data item in the leaf level – Sub-expressions are shared to reduce redundancy

Content Creation (cont’) (Unsorted Outer Union Approach) Outer Union (Cust. Id, Cust. Info, POId,

Content Creation (cont’) (Unsorted Outer Union Approach) Outer Union (Cust. Id, Cust. Info, POId, POInfo, Item. Id, Item. Info) Right Outer Join (Cust. Id, Cust. Info, POId, POInfo, Payment. Id, Payment. Info) Left Outer Join Item Payment (Cust. Id, Cust. Info, Acct. Id, Acct. Info) (Cust. Id, Cust. Info, POId, POInfo) Right Outer Join Left Outer Join Account Customer Purchase. Order

Structuring & Tagging (Hash-based Tagger) • Two things need to do 1. Group all

Structuring & Tagging (Hash-based Tagger) • Two things need to do 1. Group all siblings in the desired XML document under the same parent • • In order to recognize siblings, we need to look for the same parent Using main-memory hash table to do this(given the parent’s type and id information) 2. Extract the information from each tuple and tag it to produce the XML result • • This will be done after all the input tuples have been hashed The output process is straightforward

Late Tagging, Early Structuring • Why? – Late tagging and Late structuring need complex

Late Tagging, Early Structuring • Why? – Late tagging and Late structuring need complex memory management • We can use “structured content” and “constant space tagger” to eliminate this problem • Structured content creation(Sorted Outer Union) – The key is to order the relational content the same way that it needs to appear in the result XML document – Two important factors need to be satisfied • Parent information occurs before, or with, child information • Information about a particular node and its descendants is not mixed in with information about non-descendant nodes.

Late Tagging, Early Structuring(cont’) – Performing a single final relational sort of the unstructured

Late Tagging, Early Structuring(cont’) – Performing a single final relational sort of the unstructured relational content is sufficient • Null value will be sorted first • Parents always are sorted before the children • Parent’s id occurs before child’s id, which ensure the children of a parent node are grouped together • Tagging Sorted Data – Easy • Tuples have been in order • Add tags and write out

Performance Comparison of Alternatives for publishing XML • The Parameters in our experiment 1)

Performance Comparison of Alternatives for publishing XML • The Parameters in our experiment 1) query fan out 2) query depth 3) Number of roots. 4) Number of leaf tuples ( Only balanced queries are considered in our experiment. )

Performance Comparison of Alternatives for publishing XML Parameter Settings for Experiment Parameter Range of

Performance Comparison of Alternatives for publishing XML Parameter Settings for Experiment Parameter Range of value Default Query Fan Out 2, 3, 4 2 Query Depth 2, 3, 4 2 # Roots 1, 500, 5000, 40000 5000 # Leaf Tuples 160000, 320000, 480000 320000

Performance Comparison of Alternatives for publishing XML

Performance Comparison of Alternatives for publishing XML

Performance Comparison of Alternatives for publishing XML

Performance Comparison of Alternatives for publishing XML

Summary and Conclusion This paper introduced, implemented and tested a mechanism for converting relational

Summary and Conclusion This paper introduced, implemented and tested a mechanism for converting relational data to XML Document. Different approaches are tested, include Stored Proc, CLOB-Corr, CLOBDe. Corr, Unsorted OU(In/Out), Sorted OU(In/Out). It points to the following conclusions, 1) Constructing an XML document inside the relational engine is far more efficient than doing so outside the engine, mainly because of the high cost of binding out tuples to host variables. 2) When processing can be done in main memory, a stable approach that is always among the very best (both inside and outside the engine), is the Unsorted Outer Union approach. 3) When processing cannot be done in main memory, the Sorted Outer Union approach is the approach of choice (both inside and outside the engine). This is because the relational sort operator scales well.