DB 2 Version 9 Overview of pure XML

DB 2 Version 9: Overview of pure. XML Plus an XPath Primer, XML Column Indexes, Support for Validation Gregg Lippa Themis Inc. http: //www. themisinc. com glippa@themisinc. com © 2009 Themis, Inc. All rights reserved.

• Gregg Lippa is currently a Senior Technical Advisor at Themis Inc. He teaches DB 2 -related courses on SQL, Application Programming, and optimization, performance and tuning as well as Java, J 2 EE, distributed computing and Java-related topics. Gregg has worked with DB 2 as a consultant and trainer for over 20 years and with Java for 10 years. • This material is taken from the Themis course DB 1091: DB 2 9 for z/OS pure. XML Features. For more information visit http: //www. themisinc. com. • Products and company names mentioned may be trademarks of their respective companies. Mention of third party products or software is for reference only and constitutes neither a recommendation nor an endorsement. © 2009 Themis, Inc. All rights reserved.

DB 2 Version 9 • pure. XML technology – Seamlessly integrating XML with relational data • Including publishing and parsing functions – Storing XML data natively in a database table • Including a new XML data type © 2009 Themis, Inc. All rights reserved.

An XML Document <patient id="13579"> <name>John Doe</name> <addr> <street>123 Main</street> <city>Anytown</city> <state>PA</state> <zip>19134</zip> </addr> <phone type="home">610 -654 -1234</phone> <phone type="work">610 -987 -4321</phone> <email>jdoe@mymail. com</email> <service> <sdate>2008 -09 -22</sdate> <reason>flu shot</reason> <descrip>gave flu shot</descrip> <cost>40. 00</cost> <copay>10. 00</copay> </service> <sdate>2008 -05 -11</sdate> <reason>sore foot</reason> <descrip>referred to specialist</descrip> <cost>60. 00</cost> <copay>10. 00</copay> </service> </patient> © 2009 Themis, Inc. All rights reserved. Root Nodes Elements Attributes Values (text) Well-formed Valid

XML – Why? • • Pervasive Versatile Self-describing Neutral for exchanging data among diverse devices – Universal standard for data interchange • Growth of XML data – XML-based industry and data standards – SOA and Web services; Services-based frameworks • messages are encapsulated as XML – Web 2. 0 technologies • XML feeds • Syndication services - rendered as XML files • XML data becoming more critical to enterprise operations © 2009 Themis, Inc. All rights reserved.

Previous Approaches to XML Document Storage • File systems – Not storing XML documents in a database – Does not scale well • Stuffing – XML data stored as large objects or VARCHAR – Inefficient for querying the XML data • Shredding – Decomposing XML data into multiple columns & tables – Often leads to complex join requirements – May be difficult to recreate original XML document • Utilizing XML-only database systems – Few options and little expertise available © 2009 Themis, Inc. All rights reserved.

XML vs Relational Model • Major differences: – XML data is hierarchical; relational data is tabular – XML data is self-describing; relational data is not – XML data is ordered; relational data is not • Which approach is right for my data? – XML maximizes flexibility of the data structure – Relational data provides performance benefits for data retrieval – Relational data supports referential integrity requirements • Referential constraints cannot be based on XML columns – Data warehouses are oriented toward relational data © 2009 Themis, Inc. All rights reserved.

pure. XML Capabilities • XML data type – With specialized hierarchical storage structure • Indexing capabilities – Based on data within XML documents • New query languages (XPath and SQL/XML) – New query optimization techniques too • XML schema support – Including validation • Database utilities support • Integration with JDBC, ODBC, Embedded SQL • XML shredding and publishing facilities – For composing and decomposing XML documents © 2009 Themis, Inc. All rights reserved.

Benefits of DB 2 pure. XML Technology • Faster development – Code simplification – Avoiding XML-relational transformations • Increased agility – Versatile XML schema evolution • Quickly modify applications to support new or changing requirements • Improved usability – Exploit previously unmanaged XML data – Speed up query processing through XML-optimized storage and indexing © 2009 Themis, Inc. All rights reserved.

pure. XML Usage Scenarios • Integration of diverse data sources – DB 2 provides the ability to join XML documents • Form processing – Store whole electronic forms in DB 2 rather than shredding • Document storage and querying – Store and manage less structured (document-centric) XML data in DB 2 • Using XML to support transactions – Service-oriented architectures (SOA) offer message-based transactions – XML data can then be retrieved, updated, searched analyzed • Syndication and XML feeds – Serve XML feeds through a Web services interface – Provide a repository for XML data to support these feeds © 2009 Themis, Inc. All rights reserved.

DB 2 Support for XML 0 – XML Storage DB 2 ENGINE 1 – Bind in XML 7 2 – Store as XML 0 1 Application that makes use of XML 3 – Shred into Relational 2 4 – Retrieve XML data 4 Textual XML 3 6 8 9 5 Relational XSR (validation) 5 – Publish as XML 6 – Bind XML output 7 – XML to XML 8 – XML to Relational 9 – Relational to XML © 2009 Themis, Inc. All rights reserved.

pure. XML Architecture • DB 2 9 – a hybrid database system – One database with both relational and native XML data – Single hybrid database engine handles all processing • Application may combine SQL and SQL/XML – Access relational and XML data in a hybrid database • XML data is stored separately from other table contents – Has its own table space • Supports XML document validation with XML schemas – XML schemas used for validation are registered with DB 2 • XML Schema Repository (XSR) © 2009 Themis, Inc. All rights reserved.

XML Data Type • DB 2 native XML support includes a new XML data type – An XML column holds one XML document for each row • XML data is stored in a parsed tree structure • XML document can also be stuffed into LOB or VARCHAR – Provides advantages in certain scenarios • XML storage requirements – Separate XML tablespace plus space for any needed indexes © 2009 Themis, Inc. All rights reserved.

The XML Data Type • Optimized storage – New XML data type • Supports insert, update, and delete • Stores parsed XML documents • Available when creating or altering tables CREATE TABLE PATIENT (PATIENTID CHAR(6), PATIENT_XML XML) CREATE TABLE P 2 (PID INT, PTYP CHAR(8), PX 1 XML, PX 2 XML) – Supports access to nodes within XML document via XPath • Query optimization – CREATE INDEX supports specification of an xmlpattern © 2009 Themis, Inc. All rights reserved.

XML Indexes • Indexes are often used to improve query performance • Indexes on XML columns are supported in DB 2 9 – Uses an XML pattern (XPath) expression – Indexes paths and values in stored XML documents • XML index entries provide access to document nodes – Not limited to providing access to the beginning of a document – Index keys are created based on XML pattern expressions CREATE INDEX PATINDEX ON PATIENT(PATIENT_XML) GENERATE KEY USING XMLPATTERN '/patient/service/sdate' as SQL VARCHAR(10) © 2009 Themis, Inc. All rights reserved.

Application Development Support • Supports developing apps that include XML requirements – Language support: C/C++, Java, Assembler, Cobol, PL/I – API support: JDBC, DB 2 / ODBC, Embedded SQL, SQLJ – SQL/XML query support – DB 2 sample database enhancements • Universal DB 2 driver for JDBC enhanced to support XML – Provided extension XML type: com. ibm. db 2. DB 2 Xml import com. ibm. db 2. jcc. DB 2 Xml; DB 2 Xml xml 1 = (DB 2 Xml) rs. get. Object ("patient_xml"); String s = xml 1. get. DB 2 String(); Input. Stream is = xml 1. get. DB 2 XMLBinary. Stream("UTF-16"); © 2009 Themis, Inc. All rights reserved.

DB 2 and XPath Navigating Through the XML Tree © 2009 Themis, Inc. All rights reserved.

DB 2 XPath • XPath is an expression language – Designed by the World Wide Web Consortium (W 3 C) – Used to navigate XML documents – XPath expressions are similar to file path notations • DB 2 XPath can be used: – With the XMLQUERY SQL built-in function • To extract data from an XML column – With the XMLEXISTS SQL predicate • To evaluate data in an XML column – When creating an XML index • To determine the XML document nodes to be indexed '/patient/name' All name elements within patient elements © 2009 Themis, Inc. All rights reserved.

XPath Expressions • XPath expressions: the basic building block of Xpath • Types of expressions provided by DB 2 XPath: 1. Primary expressions: basic primitives of the language • Include literals, variable references, and function calls 2. Path expressions: locate nodes within a document tree • Include Node Tests and Filter Expressions 3. Arithmetic expressions: add, subtract, multiply, divide, modulus 4. Comparison expressions: compare two values 5. Logical expressions: use boolean logic • Anywhere an expression is expected, any kind of expression can be used – Operands of an expression are typically other expressions © 2009 Themis, Inc. All rights reserved.

XPath Data Model Key: Document node Comment node One. Emp. xml Element Node Sample comment Attribute node Employee Name Phone type= "home" 800 -555 -1234 Text node Phone type= "cell" Address Dept mgr= "bob" 212 -321 -4321 Sales Last. Name First. Name Street City State Zipcode Roger Rabbit 123 Main Smallville Wyoming 98765 © 2009 Themis, Inc. All rights reserved.

Types of XPath Nodes • Document node encapsulates an XML document – Parent of root element node • Element node encapsulates an XML element – Can have one parent and many children • Attribute node represents an XML attribute – Belongs to an element • Text node encapsulates XML character content – Elements may have these • Processing Instruction (PI) node – Encapsulates XML processing instruction • Comment node encapsulates an XML comment • Namespaces node is considered to be a node in XPath © 2009 Themis, Inc. All rights reserved.

XPath Nodes Processing Order Document node Element node 1 2, 5, 7 Comment node 9 Processing Instruction Text node 10 6, 8 Attribute node 4 Namespace node 3 3 4 th="http: // themis. com/test" <? xml version="1. 0"? > <th: course xmlns: th="http: //themis. com/test" th: format="instructor led"> <th: title>XML for DB 2</th: title> <th: descrip>Lots of information <!-- To be determined --> </th: descrip> <? our. Own. PIfor. Courseware? > </th: course> 1 2 5 course title 7 descrip name=format value=“instructor led” © 2009 Themis, Inc. All rights reserved. 10 target= our. Own. PIfor. Courseware 6 8 9 XML for DB 2 Lots of information To be determined

Path Expressions Overview • Path expressions navigate the XML tree structure to locate nodes • Navigation axes are used in XPath; always start at context node – – Forward. Axis moves down through the XML tree: • attribute (@) • child (default) • descendant • self (. ) • descendant-or-self (//=/descendant-or-self: node()/) Reverse. Axis moves up through the XML tree: • • Comment Element attribute Element Text parent (. . ) Three parts of an axis step: 1. Axis specifies a direction of movement 2. Node test specifies node selection criteria 3. Predicates (zero or more) filter returned sequence © 2009 Themis, Inc. All rights reserved. parent self Element children Text descendants

Path Expressions Syntax • XML document to support upcoming example code: <patient id="11123"> <name>Sara Lee</name> <addr> <street>33 Maple</street> <city>Nearly</city> <state>NJ</state> <zip>07123</zip> </addr> <phone type="home">908 -842 -7531</phone> <email>saralee@cakemail. com</email> <service> <sdate>2004 -02 -29</sdate> <reason>brittle nails</reason> <descrip>prescribed hormones</descrip> <cost>84. 00</cost> <copay>15. 00</copay> </service> </patient> • XPath expressions support abbreviated syntax in axis steps @ //. . . is abbreviated syntax for is abbreviated syntax for © 2009 Themis, Inc. All rights reserved. attribute /descendant-or-self: : node()/ parent: : node() self: : node() child: : node()

Path Expression Examples • ‘ / ’ slash indicates that path begins at root node – Whole XML document • ‘//service’ two slashes at beginning of path expression – Requested node, service, may be located anywhere in the document – Returns the entire service element, including all of its children • ‘//phone/@*’ All attributes appearing under the phone element – Only attribute under phone is: home • ‘//@*’ All attributes in the XML document – All attributes: 11123 home • ‘/patient/name’ All name elements under patient – Only name is Sara Lee • ‘/patient/addr/city/. . ’ All child elements of addr – Element addr, the parent of city, plus all child elements of addr © 2009 Themis, Inc. All rights reserved.

Filter Expressions • A filter expression is a path expression followed by predicates in square brackets – Filter its result based on applying conditions; for example: • All service information of patients who have a $15. 00 copay '/patient/*[copay="15. 00"]' <service> <sdate>2004 -02 -29</sdate> <reason>brittle nails</reason> <descrip>prescribe hormones</descrip> <cost>84. 00</cost> <copay>15. 00</copay> </service> • Set context node to patient and filter based on attribute type rather than on an element /patient/phone[@type="home"] <patient id="11123"> <name>Sara Lee</name> <addr> <street>33 Maple</street> <city>Nearly</city> <state>NJ</state> <zip>07123</zip> </addr> <phone type="home">908 -842 -7531</phone> <email>saralee@cakemail. com</email> <service> <sdate>2004 -02 -29</sdate> <phone type="home">908 -842 -7531</phone> • Any patient that has email /patient[email] <reason>brittle nails</reason> <descrip>prescribed hormones</descrip> <cost>84. 00</cost> <copay>15. 00</copay> </service> Entire XML document © 2009 Themis, Inc. All rights reserved. </patient>

Arithmetic Expressions • Arithmetic expressions: perform operations that involve addition, subtraction, multiplication, division, and modulus • The XPath arithmetic operators: * multiplication division idiv integer division modulus + addition - subtraction • An arithmetic expression results in a numeric value – Or an empty sequence or an error • Place arithmetic expressions in parentheses © 2009 Themis, Inc. All rights reserved.

Arithmetic Expression Example • An arithmetic expression to calculate the remaining balance after the copay (and its return value): /patient/service/(cost – copay) 69. 00 © 2009 Themis, Inc. All rights reserved. <patient id="11123"> <name>Sara Lee</name> <addr> <street>33 Maple</street> <city>Nearly</city> <state>NJ</state> <zip>07123</zip> </addr> <phone type="home">908 -842 -7531</phone> <email>saralee@cakemail. com</email> <service> <sdate>2004 -02 -29</sdate> <reason>brittle nails</reason> <descrip>prescribed hormones</descrip> <cost>84. 00</cost> <copay>15. 00</copay> </service> </patient>

Comparison Expressions – General Comparisons • Comparison expressions allow comparing two values – The comparison operators are = • != < <= >= All services with a cost greater than 60 '/patient/service/cost > 60' true • > All services with a cost greater than 60, but get the patient’s service info instead of just true or false '/patient/service[cost > 60]' <service> <sdate>2004 -02 -29</sdate> <reason>brittle nails</reason> <descrip>prescribed hormones </descrip> <cost>84. 00</cost> <copay>15. 00</copay> </service> © 2009 Themis, Inc. All rights reserved. <patient id="11123"> <name>Sara Lee</name> <addr> <street>33 Maple</street> <city>Nearly</city> <state>NJ</state> <zip>07123</zip> </addr> <phone type="home">908 -842 -7531 </phone> <email>saralee@cakemail. com </email> <service> <sdate>2004 -02 -29</sdate> <reason>brittle nails</reason> <descrip>prescribed hormones </descrip> <cost>84. 00</cost> <copay>15. 00</copay> </service> </patient>

Comparison Expressions – Logical Comparisons • Logical expressions using AND return true if both of two expressions are true • Logical expressions using OR return true if one or both expressions are true • Return phone numbers of the type work or fax (two options shown) '//phone[. /@type="work" or. /@type="fax"]' '//phone[@type="work" or @type="fax"]' • The result is this phone info <phone type="work">908 -842 -7531</phone> <phone type="fax">908 -751 -2468</phone> © 2009 Themis, Inc. All rights reserved. <patient id="11123"> <name>Sara Lee</name> <addr> <street>33 Maple</street> <city>Nearly</city> <state>NJ</state> <zip>07123</zip> </addr> <phone type="home">908 -842 -7531 </phone> <phone type="fax">908 -751 -2468 </phone> <email>saralee@cakemail. com </email> <service> <sdate>2004 -02 -29</sdate> <reason>brittle nails</reason> <descrip>prescribed hormones </descrip> <cost>84. 00</cost> <copay>15. 00</copay> </service> </patient>

The Built-in Function Library • Built-in functions offered by the DB 2 XPath library: – String functions – Numeric functions – Functions working on boolean values – Functions working on sequences • Calls to these function are allowed in an XPath expression anywhere an expression is expected • Example: SELECT XMLQUERY('fn: concat($x, $y)' PASSING 'come ' AS "x", 'together' AS "y") FROM SYSIBM. SYSDUMMY 1; © 2009 Themis, Inc. All rights reserved.

A Sampling of XPath Functions fn: compares two strings to see which one is greater fn: concatenates two or more strings into a single string fn: contains determines whether a string contains a given substring fn: count returns the number of values in a sequence fn: normalize-space strips leading and trailing whitespace characters fn: lower-case converts a string to lowercase fn: matches determines whether a string matches a given pattern fn: position returns the position of the context item in the sequence fn: replaces characters that match a pattern fn: round returns the integer that is closest to a numeric value fn: string returns the string representation of a value fn: string-length returns the length of a string fn: substring returns a substring of a string fn: upper-case converts a string to uppercase © 2009 Themis, Inc. All rights reserved.

XML Namespaces • XML namespaces prevent naming collisions – An XML namespace is a set of names identified by a namespace URI – Distinguishes element types or attribute names with the same name associated with different DTDs or Schemas • Namespaces allow qualifying names of elements and attributes – Contain an optional namespace prefix, a colon, and a local name • Example: two elements with the same name bound to different URIs: <an. Element xmlns: p 1="some. URI" xmlns: p 2="other. URI"> <Element. ABC> <p 1: table>excel</p 1: table> <p 2: table>dining</p 2: table> empty prefix; <table>DB 2</table> bound to default element </Element. ABC> namespace </an. Element> © 2009 Themis, Inc. All rights reserved.

Prolog – Namespace Definition • A DB 2 XPath expression optionally contains a prolog – Establishes the processing environment • Prolog declaration may specify multiple namespace declarations – May also specify one default namespace declaration • Prolog declaration is always followed by a semicolon (; ) – Syntax: declare namespace prefix="namespace string literal"; declare default element namespace "namespace string literal"; – Examples: declare namespace fn="http: //www. w 3. org/2005/xpath-functions"; declare default element namespace "http: //www. xyz. com/movies"; © 2009 Themis, Inc. All rights reserved.

Indexes and XML Indexes Built On Values Within XML Documents © 2009 Themis, Inc. All rights reserved.

XML Indexing • DB 2 supports creating indexes on XML columns – Generated using XML pattern expressions – Support access to nodes in the document • Multiple parts of an XML document can satisfy an XML pattern – Multiple index keys may be generated for insert of a single document • GENERATE KEY USING XMLPATTERN clause of CREATE INDEX – Specifies what you want to index – Contains XML pattern expression Same as before XML CREATE UNIQUE INDEX PATIENT_ID_IX ON PATIENT(PATIENT_XML) GENERATE KEY USING XMLPATTERN '/patient/@id' AS SQL VARCHAR(5) Required keywords Type of stored index values © 2009 Themis, Inc. All rights reserved. XML node to be indexed

Data Types Associated With Pattern Expressions • Keys from XML pattern expression specified in a CREATE INDEX statement must be associated with a data type – May use either DECFLOAT or VARCHAR(n), where n <= 1000 – Value being inserted/indexed must be convertible to this type INSERT INTO PATIENT VALUES('12345', '<patient id="123456"><name>Jim Beam</name></patient>') Value too long for VARCHAR(5) index DSNT 408 I SQLCODE = -20305, ERROR: AN XML VALUE CANNOT BE INSERTED OR UPDATED BECAUSE OF AN ERROR DETECTED WHEN INSERTING OR UPDATING THE INDEX IDENTIFIED BY 'DBID~132 OBID~23' ON TABLE *N. REASON CODE = 1. © 2009 Themis, Inc. All rights reserved.

UNIQUE Keyword in XML Index Definition • The UNIQUE keyword is supported in XML index definitions – However, its meaning is different than in relational index definitions • When creating a relational index, the UNIQUE keyword enforces uniqueness across all rows in the table • When creating an index over XML data, the UNIQUE keyword enforces uniqueness across all documents in an XML column © 2009 Themis, Inc. All rights reserved.

Example Queries and Supporting Indexes • Example 1 SELECT * FROM PATIENT WHERE XMLEXISTS('$Z/patient/service[copay="10. 00"]' PASSING BY REF PATIENT_XML AS "Z") Supporting index CREATE INDEX copay. Idx on PATIENT(PATIENT_XML) GENERATE KEY USING XMLPATTERN '/patient/service/copay' AS SQL DECFLOAT • Example 2 SELECT * FROM PATIENT WHERE XMLEXISTS('$Z/patient/phone[@type="work"]' PASSING BY REF PATIENT_XML AS "Z") Supporting index CREATE INDEX phone. Typ. Idx on PATIENT(PATIENT_XML) GENERATE KEY USING XMLPATTERN '/patient/phone/@type' AS SQL VARCHAR(20) • Example 3 SELECT * FROM PATIENT WHERE XMLEXISTS('$Z/patient/addr[city="Uptown"]' PASSING BY REF PATIENT_XML AS "Z") Supporting index CREATE INDEX city. Idx on PATIENT(PATIENT_XML) GENERATE KEY USING XMLPATTERN '/patient/addr/city' AS SQL VARCHAR(20) © 2009 Themis, Inc. All rights reserved.

XML Schemas, Validation and XML Schema Repository (XSR) © 2009 Themis, Inc. All rights reserved.

XML Schema • XML Schema Definition (XSD) defines structure of XML instance documents – – Published as a recommendation by W 3 C Defines elements and attributes permitted in a document Defines parent / child relationship between elements Defines data types, constraints and values for elements and attributes <xsd: schema xmlns: xsd="http: //www. w 3. org/2001/XMLSchema"> <xsd: element name="patient"> <xsd: complex. Type> <xsd: sequence> <xsd: element name="id" type="xsd: integer"/> <xsd: element name="name" type="xsd: string"/> <xsd: element name="addr" type="xsd: string"/> <xsd: element name="dob" type="xsd: date"/> <xsd: element name="amount" type="xsd: double"/> </xsd: sequence> </xsd: complex. Type> </xsd: element> </xsd: schema> © 2009 Themis, Inc. All rights reserved.

XML Schema Management With the XSR • XML schema repository (XSR) – Set of tables that store XML schemas – Created during DB 2 installation or migration Only schemas, and not DTDs, may be used for XML validation in DB 2 Version 9 • XML schemas may be added to the XSR – Then used to validate XML documents being inserted or updated • Registering XML schema documents (adding to XSR): – Call DB 2 -supplied stored procedures from a DB 2 application – Or invoke a provided JDBC method from a Java application • Removing XML schema documents from the DB 2 XSR – Call DB 2 -supplied stored procedure or invoke JDBC method © 2009 Themis, Inc. All rights reserved.

XML Schema Validation • Use SQL INSERT statement to insert data into XML column – Inserted data must be a well-formed XML document • Validate the XML against a registered XML schema during insertion using the DSN_XMLVALIDATE function – User Defined Function • XML validation determines whether the structure, content, and data types of an XML document are valid according to a corresponding schema • Validation is optional INSERT into Auto. Dealers VALUES( '12345', CURRENT DATE, 'Sams Deals', DSN_XMLValidate(: xml. Dealer. Info, SYSXSR. Dealer. Info. Schema)); © 2009 Themis, Inc. All rights reserved.

XML Decomposition • Decomposition, or shredding, is the process of storing XML document content in columns of relational tables – Decomposed data has SQL type of column where it is inserted • An XML schema consists of one or more XML schema documents • Annotated XML schema decomposition – Control of the decomposition process is provided by XML schema annotation © 2009 Themis, Inc. All rights reserved.

Resources • DB 2 Version 9. 1 for z/OS XML Guide (SC 18 -9858 -03) • DB 2 9 for z/OS Technical Overview (SG 24 -7330 -00) • DB 2 Version 9. 1 for z/OS Application Programming and SQL Guide (SC 18 -9841 -01) • DB 2 Version 9. 1 for z/OS Utility Guide and Reference (SC 18 -9855 -02) • DB 2 Version 9. 1 for z/OS Application Programming Guide and Reference for Java (SC 18 -9842 -01 ) © 2009 Themis, Inc. All rights reserved.
- Slides: 45