CS 315B – Domain-specific Languages for Parallelism
XML Databases: XQuery / XQueryP
Charis Charitsis & Nicolas Kokkalis
• XML: What and Why?
• Application: Web Services
• XQuery: The XML Query Language
  - Good news: XQuery, as a declarative language, is ideal for automatic parallel execution.
  - Bad news: we still need Java.
• We limit the "automatic parallel execution": the XQueryP scripting extension and the trade-off
<language type="computer language">
  <name>XML</name>
  <description>A universal format for structured documents and data.</description>
</language>
• XML is designed to describe data and to focus on what data is.
• HTML is designed to display data and to focus on how data looks.
  - Thus, HTML is about displaying information, while XML is about describing information.
• Can represent a wide variety of both structured and unstructured data
• Can be used to integrate heterogeneous data sources (traditional/relational databases, data files, email messages, web pages, etc.)
• Can be used on a variety of devices, including PCs, PDAs, smart mobile phones, etc.
But mainly...
• "Helps companies to cut costs in information exchange"
Differences between XML and the relational data model:
• Structure: XML is a tree; relational data is a table.
• Schema: in XML, data and schemas are decoupled, so data can exist with or without a schema, or with multiple schemas. In the relational model, the schema comes first, then the data.
Commonalities:
• Logical and physical data independence
• Declarative semantics
A Web Service (WS) is a class on the Web. Like an RPC, it:
• is identified by a URI (e.g., http://my.service:234)
• accepts an XML envelope as its argument
• returns an XML response
[Figure: Client ⇄ XML ⇄ Web Service ⇄ Server application logic]
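To make the RPC analogy concrete, here is a minimal sketch of an XML request/response round trip. The envelope layout (`<envelope><call method="..."><arg>...</arg></call></envelope>`), the operation names and the functions `make_request`, `handle` and `parse_response` are all hypothetical; this is not the SOAP schema. In a real deployment the request would be sent over HTTP to the service URI, whereas here `handle()` is simply called directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical envelope format, only mimicking "XML in, XML out"; NOT the SOAP schema.

def make_request(method, *args):
    """Client side: serialize a call into an XML envelope."""
    envelope = ET.Element("envelope")
    call = ET.SubElement(envelope, "call", method=method)
    for value in args:
        ET.SubElement(call, "arg").text = str(value)
    return ET.tostring(envelope, encoding="unicode")

# Server application logic, keyed by (hypothetical) method name.
OPERATIONS = {
    "add": lambda xs: sum(xs),
    "max": lambda xs: max(xs),
}

def handle(request_xml):
    """Web-service side: parse the envelope, dispatch the call, reply in XML."""
    call = ET.fromstring(request_xml).find("call")
    method = call.get("method")
    args = [int(arg.text) for arg in call.findall("arg")]
    response = ET.Element("response", method=method)
    ET.SubElement(response, "result").text = str(OPERATIONS[method](args))
    return ET.tostring(response, encoding="unicode")

def parse_response(response_xml):
    """Client side: extract the result from the XML response."""
    return int(ET.fromstring(response_xml).findtext("result"))

request = make_request("add", 2, 3)
print(request)                          # <envelope><call method="add"><arg>2</arg><arg>3</arg></call></envelope>
print(parse_response(handle(request)))  # 5
```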
Typical architecture
[Figure: Client ⇄ XML ⇄ Web Service (XML domain) ⇄ Server application logic (Java/.NET) ⇄ XQuery (XML domain) ⇄ XML DB]
• XQuery is a declarative programming language designed to query and manipulate XML data.
• With XQuery you describe "what" you want to achieve and leave the "how" to the runtime system.
• It is designed for optimizability, including automatic parallelization of query execution.
<book>
  <ISBN>333</ISBN>
  <title>RDBMS</title>
  <author>Paul</author>
  <chapter>
    <num>I</num>
    <title>Information Retrieval using RDBMS</title>
    <section>
      <title>Beyond Simple Information Retrieval</title>
      <section>
        <title>Extension of RDBMS features</title>
      </section>
    </section>
  </chapter>
</book>
• FLWOR expressions: syntactic sugar that combines FOR, LET, IF
    for $var in expr
    let $var := expr
    where expr
    return expr
• Example: return the number of title elements in chapter "I" of the book.
    for $chapter in /book//chapter                         (: similar to SQL FROM :)
    let $titles := $chapter//title                         (: no analogy in SQL :)
    where $chapter/num = "I"                               (: similar to SQL WHERE :)
    return <NumOfTitles>{ count($titles) }</NumOfTitles>   (: similar to SQL SELECT :)
  XQuery keywords are lowercase, and the curly braces inside the element constructor make count($titles) get evaluated instead of being copied as literal text.
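To show what each clause of the FLWOR query above computes, here is a clause-by-clause Python analogue over the sample book document, using the standard-library ElementTree. It is an illustration of the semantics, not an XQuery engine. The function name `num_of_titles` is hypothetical.

```python
import xml.etree.ElementTree as ET

# The sample book document from the slides (every <section> properly closed).
BOOK_XML = """
<book>
  <ISBN>333</ISBN>
  <title>RDBMS</title>
  <author>Paul</author>
  <chapter>
    <num>I</num>
    <title>Information Retrieval using RDBMS</title>
    <section>
      <title>Beyond Simple Information Retrieval</title>
      <section>
        <title>Extension of RDBMS features</title>
      </section>
    </section>
  </chapter>
</book>
"""

def num_of_titles(xml_text, chapter_num):
    """Evaluate the FLWOR example clause by clause."""
    root = ET.fromstring(xml_text.strip())  # context node: <book>
    results = []
    # for $chapter in /book//chapter   (every descendant <chapter> of <book>)
    for chapter in root.iter("chapter"):
        # let $titles := $chapter//title   (all descendant <title> elements, any depth)
        titles = chapter.findall(".//title")
        # where $chapter/num = "I"   (true if some <num> child has that value)
        if any(num.text == chapter_num for num in chapter.findall("num")):
            # return <NumOfTitles>{ count($titles) }</NumOfTitles>
            out = ET.Element("NumOfTitles")
            out.text = str(len(titles))
            results.append(ET.tostring(out, encoding="unicode"))
    return results

print(num_of_titles(BOOK_XML, "I"))  # ['<NumOfTitles>3</NumOfTitles>']
```

The three titles are the chapter title and the two nested section titles. The book's own title lies outside the chapter.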
(docId, sPos, ePos, level)
• docId: identifier of the document
• sPos: starting position of the element or string within the XML document
• ePos: end position of the element (for a string, the same as sPos)
• level: nesting depth within the document
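Here is a minimal sketch of how these tuples could be assigned. The slides do not spell out the numbering rule, so this sketch assumes that every start tag, every whitespace-separated word of text, and every end tag consumes one position in document order. Under that assumption the sketch reproduces the first entries of the index table: book starts at 1, ISBN spans 2–4, the book title spans 5–7 with "RDBMS" at 6, chapter starts at 11, and the chapter title spans 15–20 with "RDBMS" at 19. Later positions in the example table differ slightly from this count. `Entry` and `build_index` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple

class Entry(NamedTuple):
    term: str    # element tag or text word
    doc_id: int
    s_pos: int
    e_pos: int   # equals s_pos for a word
    level: int   # nesting depth (root element = 0)

def build_index(xml_text, doc_id):
    """Assign (docId, sPos, ePos, level) to every element and word (assumed numbering rule)."""
    root = ET.fromstring(xml_text.strip())
    entries = []
    pos = 0

    def next_pos():
        nonlocal pos
        pos += 1
        return pos

    def add_words(text, level):
        for word in (text or "").split():
            p = next_pos()
            entries.append(Entry(word, doc_id, p, p, level))

    def visit(elem, level):
        start = next_pos()                   # start tag
        add_words(elem.text, level + 1)      # text directly inside elem
        for child in elem:
            visit(child, level + 1)
            add_words(child.tail, level + 1) # text after child, still inside elem
        end = next_pos()                     # end tag
        entries.append(Entry(elem.tag, doc_id, start, end, level))

    visit(root, 0)
    return entries

BOOK_XML = """
<book>
  <ISBN>333</ISBN>
  <title>RDBMS</title>
  <author>Paul</author>
  <chapter>
    <num>I</num>
    <title>Information Retrieval using RDBMS</title>
    <section>
      <title>Beyond Simple Information Retrieval</title>
      <section>
        <title>Extension of RDBMS features</title>
      </section>
    </section>
  </chapter>
</book>
"""

for e in sorted(build_index(BOOK_XML, 1), key=lambda e: e.s_pos)[:8]:
    print(e)
```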
To facilitate the evaluation of XQuery expressions, an index is created for all the nodes within the XML database (excerpt for the sample book):

  Term     docId  sPos  ePos  level
  book       1      1    36     0
  ISBN       1      2     4     1
  title      1      5     7     1
  chapter    1     11    35     1
  title      1     15    20     2
  title      1     22    26     3
  title      1     28    32     4
  RDBMS      1      6     6     2
  RDBMS      1     19    19     3
  RDBMS      1     30    30     3

Example: suppose we have the containment query "chapter//title".
• Search the table for all entries with term = "chapter" ⇒ { (1, 11, 35, 1) }
• Search the table for all entries with term = "title" ⇒ { (1, 5, 7, 1), (1, 15, 20, 2), (1, 22, 26, 3), (1, 28, 32, 4) }
• Combine them: keep each (chapter, title) pair whose intervals nest, i.e. same docId, chapter sPos < title sPos, and title ePos < chapter ePos ⇒ { <(1, 11, 35, 1), (1, 15, 20, 2)>, <(1, 11, 35, 1), (1, 22, 26, 3)>, <(1, 11, 35, 1), (1, 28, 32, 4)> }
  The book title (1, 5, 7, 1) lies outside the chapter's interval [11, 35], so it is not part of the result.
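The "combine" step is a containment (structural) join. This sketch uses a naive nested loop over the index entries from the table, which is enough to show the interval test. Production engines typically use sort-merge or stack-based structural joins instead. The names `contains`, `containment_join` and `child_join` are hypothetical. The parent-child variant adds the assumption that a child's level is exactly one more than its parent's.

```python
from typing import NamedTuple

class Entry(NamedTuple):
    doc_id: int
    s_pos: int
    e_pos: int
    level: int

def contains(anc, desc):
    """Ancestor-descendant test: desc's interval lies strictly inside anc's, in the same doc."""
    return anc.doc_id == desc.doc_id and anc.s_pos < desc.s_pos and desc.e_pos < anc.e_pos

def containment_join(ancestors, descendants):
    """Naive nested-loop evaluation of 'A//D'."""
    return [(a, d) for a in ancestors for d in descendants if contains(a, d)]

def child_join(parents, children):
    """'A/D': containment plus a level difference of exactly 1."""
    return [(a, d) for a, d in containment_join(parents, children) if d.level == a.level + 1]

# Entries taken from the index table.
chapters = [Entry(1, 11, 35, 1)]
titles = [Entry(1, 5, 7, 1), Entry(1, 15, 20, 2), Entry(1, 22, 26, 3), Entry(1, 28, 32, 4)]

for a, d in containment_join(chapters, titles):
    print(tuple(a), tuple(d))
```

Using `child_join` instead would answer "chapter/title" and keep only the chapter's own title (1, 15, 20, 2).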
• "Beowulf cluster": an example of a high-performance parallel computing system used for parallel processing of XML queries
• Several processing nodes interconnected via a switch
• Each node has its own CPU with a sizable cache, a large main memory (typically > 1 GB), and a hard disk
• Master: runs the file. Serves as the point system for the clustering software to route duties and monitor all individual nodes (i.e., the slaves)
• Beowulf:
  - Open-source software such as Linux
  - The MPI library for broadcast and point-to-point messages among the cluster's nodes
• Phase 1: Distribute the entries of the fully inverted index among the cluster nodes for processing (e.g., round-robin or hash-based distribution).
• Phase 2: Each cluster node processes the containment query over its share of the entries to generate the corresponding lists of index entries.
• Phase 3: The elements of the generated lists are checked against one another to produce the result set.
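The three phases can be simulated on one machine. This is a sketch under several assumptions. Threads from `concurrent.futures` stand in for cluster nodes, so there is no MPI messaging. Phase 3 runs at the master after the node results are gathered. `crc32` of the term serves as the hash function. All function names are hypothetical. The index entries are the excerpt from the table above.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import NamedTuple

class Entry(NamedTuple):
    term: str
    doc_id: int
    s_pos: int
    e_pos: int
    level: int

INDEX = [
    Entry("book", 1, 1, 36, 0), Entry("ISBN", 1, 2, 4, 1), Entry("title", 1, 5, 7, 1),
    Entry("chapter", 1, 11, 35, 1), Entry("title", 1, 15, 20, 2), Entry("title", 1, 22, 26, 3),
    Entry("title", 1, 28, 32, 4), Entry("RDBMS", 1, 6, 6, 2), Entry("RDBMS", 1, 19, 19, 3),
    Entry("RDBMS", 1, 30, 30, 3),
]

def round_robin(entries, n_nodes):
    """Phase 1 option: entry i goes to node i mod n."""
    parts = [[] for _ in range(n_nodes)]
    for i, e in enumerate(entries):
        parts[i % n_nodes].append(e)
    return parts

def hash_by_term(entries, n_nodes):
    """Phase 1 option: all entries with the same term land on the same node."""
    parts = [[] for _ in range(n_nodes)]
    for e in entries:
        parts[zlib.crc32(e.term.encode()) % n_nodes].append(e)
    return parts

def local_scan(partition, anc_term, desc_term):
    """Phase 2 on one node: select the local entries for both terms of 'anc//desc'."""
    ancs = [e for e in partition if e.term == anc_term]
    descs = [e for e in partition if e.term == desc_term]
    return ancs, descs

def parallel_containment(entries, anc_term, desc_term, n_nodes=4, distribute=round_robin):
    partitions = distribute(entries, n_nodes)                            # Phase 1
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:                # Phase 2
        local = list(pool.map(lambda p: local_scan(p, anc_term, desc_term), partitions))
    ancs = [a for node_ancs, _ in local for a in node_ancs]              # gather at master
    descs = [d for _, node_descs in local for d in node_descs]
    return sorted(                                                       # Phase 3
        (a, d) for a in ancs for d in descs
        if a.doc_id == d.doc_id and a.s_pos < d.s_pos and d.e_pos < a.e_pos
    )

for a, d in parallel_containment(INDEX, "chapter", "title", distribute=hash_by_term):
    print(a.term, (a.s_pos, a.e_pos), "contains", d.term, (d.s_pos, d.e_pos))
```

Because every entry reaches exactly one node and Phase 3 sees all selected entries, the result does not depend on the distribution strategy. The strategy only affects load balance.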
• Despite XQuery, we still need Java/.NET to:
  - implement user interfaces
  - call Web services
  - interact with other programs
  - expose functions as Web services
  - write complex applications
• There is a trade-off between optimizability (on one side) and flexibility, determinism, and expressive power (on the other side):
  - Query languages are more optimizable but pay a price on the other side.
  - Imperative languages lack optimizability, but their semantics are simpler, deterministic, and richer.
The ultimate goal: get rid of Java ⇒ all XQuery.
XQueryP: an extension of XQuery for scripting.
[Figure: the typical architecture again (Client, Web Service in the XML domain, server application logic in Java/.NET, XQuery in the XML domain, XML DB)]
• Prototypes in Big OracleDB and in BerkeleyDB-XML (presented at Plan-X 2005; might be open-sourced if there is interest)
• MXQuery, http://www.mxquery.org (Java)
  - Runs on mobile phones: Java CLDC 1.1; some cut-down versions even run on CLDC 1.0
  - Eclipse plugin available since March 2007
• Zorba: C++ engine (FLWOR Foundation)
  - Small footprint, performance, extensibility
  - Potentially embeddable in many contexts
• Ghassan Z. Qadah, "Parallel processing of XML databases", IEEE CCECE/CCGEI 2005.
• Xiaogang Li, Swarup Kumar Sahoo, Gagan Agrawal, "XQuery Perspective: Using XML/XQuery for Scientific Applications and Applying Scientific Compilation Techniques", SIGMOD 2004.
• Daniela Florescu, Donald Kossmann, "CS 345B: XML and Databases", http://www.stanford.edu/class/cs345b/
• W3C XML Query (XQuery), http://www.w3.org/XML/Query/
XQueryP introduces parts of code that will:
• run in sequential mode
• define the order in which expressions are evaluated
• be strictly deterministic
• handle exceptions manually
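Why a defined evaluation order matters can be shown with a small Python analogy (not XQueryP syntax). The sketch evaluates the same three expressions under every possible order, as an optimizer or parallel scheduler might. Side-effect-free expressions always yield one outcome. Expressions that update shared state, like statements in a script, yield a different outcome per order, which is why sequential blocks fix the order. The names `evaluate`, `pure_exprs` and `impure_exprs` are hypothetical.

```python
import itertools

def evaluate(exprs, order):
    """Evaluate zero-argument expressions in the given order; return results in source order."""
    results = [None] * len(exprs)
    for i in order:
        results[i] = exprs[i]()
    return results

def pure_exprs():
    # Side-effect free: each result depends only on its own inputs.
    return [lambda: 2 * 3, lambda: len("XQuery"), lambda: sum(range(5))]

def impure_exprs(log):
    # Each expression appends to shared state, then reports the log length it observed.
    return [lambda: log.append("insert") or len(log),
            lambda: log.append("delete") or len(log),
            lambda: log.append("read") or len(log)]

orders = list(itertools.permutations(range(3)))

pure_results = {tuple(evaluate(pure_exprs(), o)) for o in orders}

impure_outcomes = set()
for o in orders:
    log = []
    results = evaluate(impure_exprs(log), o)
    impure_outcomes.add((tuple(results), tuple(log)))

print(len(pure_results))     # 1: any order gives the same answer
print(len(impure_outcomes))  # 6: every order gives a different outcome
```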
• Health care: Health Level Seven, http://www.hl7.org/
• Geography Markup Language (GML)
• Systems Biology Markup Language (SBML), http://sbml.org/
• XBRL, the XML-based business reporting standard, http://www.xbrl.org/
• Global Justice XML Data Model (GJXDM), http://it.ojp.gov/jxdm
• ebXML, http://www.ebxml.org/
• Encoded Archival Description Application, http://lcweb.loc.gov/ead/
• Digital photography metadata: XMP
• An XML grammar for sensor data (SensorML)
• Really Simple Syndication (RSS 2.0)
• XSLT 1.0 (1999) uses XPath 1.0 (1999).
• XPath 2.0 (2007) extends XPath 1.0 and is almost backwards compatible with it.
• XSLT 2.0 (2007) uses XPath 2.0.
• XQuery 1.0 (2007) extends XPath 2.0 with FLWOR expressions, node constructors, and validation.
1. Allow executing sub-computations in a different order (parallelization, rescheduling)
2. Allow using various data access paths
3. Allow lazy evaluation
4. Allow streaming/pipelining between operations (no materialization of intermediate results)
5. Allow various evaluation algorithms for the same logical operation
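Items 3 and 4 can be illustrated with Python generators. This is an analogy for what a query engine may do, not how any particular XQuery engine is built. A pipeline of scan, select and project operators pulls one item at a time, so asking for the first three results touches only the items it needs. Materializing the intermediate result first forces the whole input to be produced. The operator names are hypothetical.

```python
def scan(items, counter):
    """Source operator: yields items one at a time and counts how many were produced."""
    for item in items:
        counter["produced"] += 1
        yield item

def select(pred, stream):
    """Filter operator: pipelined, no intermediate list."""
    return (x for x in stream if pred(x))

def project(fn, stream):
    """Map operator: pipelined, no intermediate list."""
    return (fn(x) for x in stream)

def take(n, stream):
    """Consume only the first n results; upstream operators are never asked for more."""
    out = []
    if n <= 0:
        return out
    for x in stream:
        out.append(x)
        if len(out) == n:
            break
    return out

# Lazy, pipelined plan: squares of the first three even numbers.
counter = {"produced": 0}
lazy = take(3, project(lambda x: x * x,
                       select(lambda x: x % 2 == 0, scan(range(1, 1001), counter))))
print(lazy, counter["produced"])              # [4, 16, 36] 6

# Eager plan: the scan result is materialized before filtering.
eager_counter = {"produced": 0}
materialized = list(scan(range(1, 1001), eager_counter))
eager = [x * x for x in materialized if x % 2 == 0][:3]
print(eager, eager_counter["produced"])       # [4, 16, 36] 1000
```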