XML Processing Moves Forward: XSLT 2.0 and XQuery 1.0
Michael Kay
Prague 2005

About me
• Database background
• Started using XML in 1998 for content management applications
• Author of XSLT Programmer’s Reference
• Developer of Saxon XSLT processor
• Member of W3C XSL and XQuery Working Groups
• Founded SAXONICA March 2004

Contents
• A tour of the new specs
• What’s significant about XSLT 2.0
• A quick demo
• Why XQuery?

The QT Specification Family
[Diagram with labels: XSLT 2.0, XQuery 1.0, XPath 2.0, Functions and Operators, Data Model, XML Schema]

Standards maturity
[Chart: maturity (CR, REC) against time for XML, XML Schema, XSLT 1.0, XPath 1.0, XQuery, XSLT 2.0 and XPath 2.0]

A family of standards
[Diagram with labels: XQuery 1.0, XSLT 2.0, XPath 2.0, XSLT 1.0, XPath 1.0, XML Schema]

XSLT and XQuery
[Diagram with labels: Documents, Data, XSLT, XQuery]

What’s new in XSLT 2.0
• New Processing Model
• Major Features
  – grouping
  – regular expressions
  – functions
  – schema support
• Many “minor” features

Some “minor” features

XSLT 2.0
• Temporary trees
• Multiple output files
• Format date/time
• Tunnel parameters
• Declared variable types
• Multi-mode templates
• xsl:next-match
• Conditional compilation
• XHTML serialization
• xsl:namespace
• separator=", "
• Character maps

XPath 2.0
• Sequences
• if..then..else
• for $x in X return f($x)
• some/every
• except/intersect
• $n is $m

Function library
• String functions
• Regex functions
• Date/time arithmetic
• URI handling
• min(), max(), avg()
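A few of the features listed above can be seen together in one small stylesheet. This is only a sketch: the variable name `prices`, its sample values and the named template `main` are illustrative assumptions. It is meant to be run without a source document by naming `main` as the initial template (for example with Saxon's `-it` option).

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="text"/>

  <!-- Named entry point (assumed name), so no source document is needed. -->
  <xsl:template name="main">
    <!-- Declared variable type holding a sequence of integers (sample data). -->
    <xsl:variable name="prices" as="xs:integer*" select="(12, 40, 7)"/>

    <!-- "for" expression; separator controls how the sequence is joined. -->
    <xsl:value-of select="for $p in $prices return $p * 2" separator=", "/>
    <xsl:text>&#10;</xsl:text>

    <!-- Conditional expression combined with a quantified expression. -->
    <xsl:value-of select="if (some $p in $prices satisfies $p gt 30)
                          then 'some over 30' else 'none over 30'"/>
    <xsl:text>&#10;</xsl:text>

    <!-- Aggregate functions from the function library. -->
    <xsl:value-of select="min($prices), max($prices), avg($prices)" separator=" / "/>
  </xsl:template>
</xsl:stylesheet>
```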

Handling unstructured text
• unparsed-text() function
  – reads a text file into a string
• tokenize() function
  – splits a string into substrings
• xsl:analyze-string
  – parses a string and generates markup
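The three facilities above are often combined. The following is a minimal sketch, assuming a plain-text file called `input.txt` next to the stylesheet; the file name, the parameter `input-uri` and the element names `lines`, `line` and `number` are all made up for illustration. It reads the file, splits it into lines, and wraps every run of digits in markup.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical input file, resolved relative to the stylesheet. -->
  <xsl:param name="input-uri" select="'input.txt'"/>

  <!-- Named entry point (assumed name), so no source XML document is needed. -->
  <xsl:template name="main">
    <lines>
      <!-- unparsed-text() returns the whole file as one string;
           tokenize() splits it at line endings. -->
      <xsl:for-each select="tokenize(unparsed-text($input-uri), '\r?\n')">
        <line>
          <!-- Wrap each run of digits in <number>; copy other text unchanged. -->
          <xsl:analyze-string select="." regex="[0-9]+">
            <xsl:matching-substring>
              <number><xsl:value-of select="."/></number>
            </xsl:matching-substring>
            <xsl:non-matching-substring>
              <xsl:value-of select="."/>
            </xsl:non-matching-substring>
          </xsl:analyze-string>
        </line>
      </xsl:for-each>
    </lines>
  </xsl:template>
</xsl:stylesheet>
```

Note that the `regex` attribute is an attribute value template, so a regex containing curly braces would need them doubled.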

Regular expression functions
• matches(): test if a string matches a regex
    if (matches($in, '[A-Z]{3}[0-9]{3}')) ...
• tokenize(): split a string into substrings; the regex matches the separator
    for $s in tokenize($in, ',\s?') ...
• replace(): replace every occurrence of a match
    replace($in, '\s', '%20')
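The three functions can be tried together in a short stylesheet. This is a sketch under stated assumptions: the parameter `in` and its sample value are invented for illustration, and the regex is anchored with `^` and `$` so that matches() tests the whole token (unanchored, it would succeed on any matching substring).

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Hypothetical sample value: a comma-separated list of codes. -->
  <xsl:param name="in" select="'ABC123, DEF456,GHI 789'"/>

  <!-- Named entry point (assumed name), so no source document is needed. -->
  <xsl:template name="main">
    <!-- Split on a comma optionally followed by one whitespace character. -->
    <xsl:for-each select="tokenize($in, ',\s?')">
      <!-- Escape any remaining whitespace inside the token. -->
      <xsl:value-of select="replace(., '\s', '%20')"/>
      <xsl:text>: </xsl:text>
      <!-- Check the token against a "three letters, three digits" shape. -->
      <xsl:value-of select="if (matches(., '^[A-Z]{3}[0-9]{3}$'))
                            then 'valid' else 'invalid'"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```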

Grouping
• Takes any sequence as input
• Divides the items into groups
• Applies processing to each group

Four ways to define the groups:
• group-by: items with a common value for a grouping key
• group-adjacent: adjacent items with a common grouping key
• group-starting-with: pattern to match first item in each group
• group-ending-with: pattern to match last item in each group
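As an illustration of the pattern-based variants, here is a minimal sketch of group-starting-with. It assumes a hypothetical input whose `body` element contains a flat run of `h2` and `p` elements; the `section` wrapper is likewise an invented name.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumed input: <body> holding a flat sequence of <h2> and <p> elements. -->
  <xsl:template match="body">
    <body>
      <!-- Each <h2> starts a new group, which runs up to the next <h2>.
           Any items before the first <h2> form a group of their own. -->
      <xsl:for-each-group select="*" group-starting-with="h2">
        <section>
          <!-- current-group() is the sequence of items in this group. -->
          <xsl:copy-of select="current-group()"/>
        </section>
      </xsl:for-each-group>
    </body>
  </xsl:template>
</xsl:stylesheet>
```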

Grouping by Value

    <xsl:for-each-group select="book" group-by="publisher">
      <xsl:sort select="current-grouping-key()"/>
      <h2>Publisher: <xsl:value-of select="current-grouping-key()"/></h2>
      <xsl:for-each select="current-group()">
        <xsl:sort select="title"/>
        <p>author: <xsl:value-of select="author"/></p>
        <p>title: <xsl:value-of select="title"/></p>
      </xsl:for-each>
    </xsl:for-each-group>

User-defined Functions
• Written like named templates
• Called from XPath
• Return a result

    <xsl:function name="ged:date-to-ISO" as="xs:date">
      <xsl:param name="in" as="ged:date"/>
      <xsl:sequence select="xs:date(concat(
          substring($in, 8, 4), '-',
          format-number(index-of(('JAN', 'FEB', ...), substring($in, 4, 3)), '00'), '-',
          substring($in, 1, 2)))"/>
    </xsl:function>

    <xsl:sort select="ged:date-to-ISO(@birth-date)"/>

XQuery 1.0
• Designed to query XML databases
• Also handles in-memory transformations
• Well supported by database vendors

XQuery Example
Join two tables (XMark Q8)

    xquery version "1.0";
    <results>{
      for $p in doc("auction.xml")/site/people/person
      let $a := for $t in doc("auction.xml")/site/closed_auctions/closed_auction
                where $t/buyer/@person = $p/@id
                return $t
      return <item person="{$p/name}">{count($a)}</item>
    }</results>

XSLT Equivalent (XMark Q8)

    <result xsl:version="1.0"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:for-each select="/site/people/person">
        <xsl:variable name="a"
            select="/site/closed_auctions/closed_auction[buyer/@person = current()/@id]"/>
        <item person="{name}">
          <xsl:value-of select="count($a)"/>
        </item>
      </xsl:for-each>
    </result>

Optimization
• With multi-GB databases, using indexes is essential
• XQuery does not have template rules
• This makes it possible to do static analysis and join optimization
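To make the role of an index concrete, the XMark Q8 join can be written with a hand-declared index using xsl:key, so that each person's auctions are found by lookup rather than by scanning every closed auction. This is a sketch of the manual technique, not a description of what any particular optimizer does internally; the key name `auctions-by-buyer` is an invented name.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Index closed auctions by the id of the buying person. -->
  <xsl:key name="auctions-by-buyer"
           match="/site/closed_auctions/closed_auction"
           use="buyer/@person"/>

  <xsl:template match="/">
    <result>
      <xsl:for-each select="/site/people/person">
        <item person="{name}">
          <!-- key() looks up the auctions for this person's id
               instead of filtering the full list each time. -->
          <xsl:value-of select="count(key('auctions-by-buyer', @id))"/>
        </item>
      </xsl:for-each>
    </result>
  </xsl:template>
</xsl:stylesheet>
```

Without an index, the inner filter is evaluated once per person, which is where quadratic behaviour comes from; with the key, each lookup is expected to be roughly constant time once the index is built.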

XMark Q8 results (msecs)

Language  Processor    1 Mb    4 Mb   10 Mb
XSLT      Xalan        1503   11006   65855
XSLT      xt            160    2253   16414
XSLT      MSXML          33     519    4248
XSLT      Saxon 8.4      90    1340   11126
XQuery    Saxon 8.4     136    1575   11947
XQuery    Qizx          351     711    1813
XQuery    Galax        1870    6672   16625

Slide annotations: O(n²) and O(n)

Two can play at that game!

Language  Processor    1 Mb    4 Mb   10 Mb
XSLT      Xalan        1503   11006   65855
XSLT      xt            160    2253   16414
XSLT      MSXML          33     519    4248
XSLT      Saxon 8.5      27      26      45
XQuery    Saxon 8.5      16      16      31
XQuery    Qizx          351     711    1813
XQuery    Galax        1870    6672   16625

Slide annotations: O(n²) and O(n)
Caveat: this is one query only!

Conclusions
• XSLT 2.0 and XQuery 1.0 are nearly ready
• XSLT 2.0 has many powerful new features, making new applications possible
• XQuery 1.0 is designed for optimization against very large databases