XPath By Laouina Marouane
Outline
- Introduction
- Data Model
- Expression
  - Patterns
  - Location Paths
- Example
- XPath 2.0
- Practice
- Conclusion

What is XPath?
- A scheme for locating documents and identifying substructures within them.
- A language designed to be used by both XSL Transformations (XSLT) and XPointer.
- Provides common syntax and semantics for functionality shared between XSLT and XPointer.
- Primary purpose: to address parts of an XML document, and to provide basic facilities for manipulating strings, numbers and booleans.
- W3C Recommendation, November 16, 1999.
- Latest version: http://www.w3.org/TR/xpath

Why XPath?
- Unique identifiers are not sufficient:
  - Assigning a unique identifier to every element is a burden
  - The identity of an element may be unknown
  - Identifiers cannot handle ranges of text
  - It may be inconvenient to identify a large number of objects by listing their identifiers

Introduction
- XPath uses a compact, string-based syntax rather than an XML element-based syntax.
- It operates on the abstract, logical structure of an XML document (a tree of nodes) rather than its surface syntax.
- It uses a path notation (like URLs) to navigate through this hierarchical tree structure, from which it got its name.
- A subset of it can be used for matching, i.e. testing whether or not a node matches a pattern.
- It models an XML document as a tree of nodes of several types, such as element, attribute and text.
- It supports namespaces: the name of a node is a pair consisting of a local part and a namespace URI.
- Example of an XPath expression: /bib/book/publisher

Data Model
- Treats an XML document as a logical tree
- This tree consists of 7 types of node:
  - Root node: the root of the document, not the document element
  - Element nodes: one for each element in the document
    - May have unique IDs
  - Attribute nodes
  - Namespace nodes
  - Processing instruction nodes
  - Comment nodes
  - Text nodes
- The tree structure is ordered and reads from top to bottom and left to right

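Data Model
[Figure: node tree for the bib example. The root has a processing instruction, a comment and the root element bib as children; bib contains book elements, whose children include publisher (Addison-Wesley) and author (Serge Abiteboul).]
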
Example
For this simple doc:
  <doc>
    <?Pub Caret?>
    <para>Some <em>emphasis</em> here.</para>
    <para>Some more stuff.</para>
  </doc>
Might be represented as:
  root
    <doc>
      <?Pub Caret?>
      <para>
        text
        <em>
          text
        text
      <para>
        text

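A rough way to inspect this tree is to parse the sample with Python's standard xml.dom.minidom and print one line per node. This is only a sketch: the DOM is close to, but not identical to, the XPath data model, the whitespace between elements has been removed so that no whitespace-only text nodes appear, and the names KIND and outline are made up for this illustration.

```python
from xml.dom import minidom, Node

# The sample document from the slide, written without inter-element whitespace.
DOC = ('<doc><?Pub Caret?><para>Some <em>emphasis</em> here.</para>'
       '<para>Some more stuff.</para></doc>')

# Hypothetical mapping from DOM node types to XPath-style node kinds.
KIND = {
    Node.DOCUMENT_NODE: "root",
    Node.ELEMENT_NODE: "element",
    Node.PROCESSING_INSTRUCTION_NODE: "processing-instruction",
    Node.COMMENT_NODE: "comment",
    Node.TEXT_NODE: "text",
}

def outline(node, depth=0):
    """Return one indented line per node, in document order."""
    label = KIND.get(node.nodeType, "other")
    if node.nodeType == Node.ELEMENT_NODE:
        label += " <%s>" % node.tagName
    elif node.nodeType == Node.PROCESSING_INSTRUCTION_NODE:
        label += " <?%s %s?>" % (node.target, node.data)
    elif node.nodeType == Node.TEXT_NODE:
        label += " %r" % node.data
    lines = ["  " * depth + label]
    for child in node.childNodes:
        lines.extend(outline(child, depth + 1))
    return lines

tree_lines = outline(minidom.parseString(DOC))
print("\n".join(tree_lines))
```

The printed outline starts at the root, which is a separate node above the document element doc, matching the distinction the data model draws.
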
Expressions
- A text string used to select an element, attribute, processing instruction, or text
- The primary syntactic construct in XPath
- An expression is evaluated to yield an object, which has one of the following four basic types:
  1. node-set (an unordered collection of nodes without duplicates)
  2. boolean (true or false)
  3. number (a floating-point number)
  4. string (a sequence of UCS characters)

Element Context
- The meaning of an element can depend upon its context
  - <book><title>…</title></book>
  - <person><title>…</title></person>
- We want to search for, e.g., the title of a book, not the title of a person
  - XPath exploits the sequential and hierarchical context of XML to specify elements by their context (i.e. location in the hierarchy)
    - title
    - book/title
    - person/title

Context
- Expression evaluation occurs with respect to a context.
- The context consists of:
  1. a node (the context node)
  2. a pair of non-zero positive integers (the context position and the context size)
  3. a set of variable bindings
  4. a function library
  5. the set of namespace declarations in scope for the expression

More on context types
- The context position is always less than or equal to the context size
- The variable bindings consist of a mapping from variable names to variable values
- The function library consists of a mapping from function names to functions. Each function takes zero or more arguments and returns a single result
- The namespace declarations consist of a mapping from prefixes to namespace URIs

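The five parts of the context can be pictured as a small record. The sketch below is an illustration only: EvalContext and contexts_for are hypothetical names, not part of any XPath library, and the nodes are stand-in strings.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

@dataclass
class EvalContext:
    """Hypothetical container for the five parts of an XPath evaluation context."""
    node: Any                                    # the context node
    position: int = 1                            # the context position (1-based)
    size: int = 1                                # the context size
    variables: Dict[str, Any] = field(default_factory=dict)               # name -> value
    functions: Dict[str, Callable[..., Any]] = field(default_factory=dict)  # name -> function
    namespaces: Dict[str, str] = field(default_factory=dict)              # prefix -> URI

    def __post_init__(self):
        # The position is a non-zero positive integer no greater than the size.
        if not 1 <= self.position <= self.size:
            raise ValueError("context position must satisfy 1 <= position <= size")

def contexts_for(nodes):
    """Build one context per node, as happens when a predicate is applied to a list."""
    size = len(nodes)
    return [EvalContext(node=n, position=i, size=size)
            for i, n in enumerate(nodes, start=1)]

ctxs = contexts_for(["first", "second", "third"])
print([(c.node, c.position, c.size) for c in ctxs])
```

Each node in a list gets its own position while sharing the same size, which is what position() and last() report during predicate evaluation.
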
Patterns
- A pattern is an expression used not to find objects, but to establish whether a specific object matches certain criteria
- Very important in the XSLT specification
- The '|' symbol is used to specify alternative patterns for matching
  - note|warning|/book/intro

Location Paths
- One important kind of expression is a location path (a special case of an expression)
- The result of evaluating an expression that is a location path is the node-set containing the nodes selected by the location path
- Location paths can recursively contain expressions that are used to filter sets of nodes
- A location path (the most important construct) describes a path from one point to another
  - Analogy: a set of street directions, e.g. "second store on the left after the third light"
- Two types of paths: relative and absolute
- Composed of a series of one or more steps, each with optional predicates

Relative Paths
- A relative location path consists of a sequence of one or more location steps separated by /
- The steps are composed from left to right: the initial steps select a set of nodes relative to the context node, and each node in that set is used as a context node for the following step
- E.g. para will select the children of the current node that are named 'para'
    <chapter>             //Current node
      <title>…</title>
      <para>…</para>      //Selected
      <note>…</note>      //Contents not selected until note is the current node
    </chapter>
- The verbose expression is child::para

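The effect of the context node on a relative path can be tried with Python's xml.etree.ElementTree, which implements a limited subset of XPath. In this sketch the sample text and the para nested inside note are assumptions added for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical chapter; the nested <para> inside <note> is an assumption.
chapter = ET.fromstring(
    "<chapter>"
    "<title>Intro</title>"
    "<para>top-level</para>"
    "<note><para>inside note</para></note>"
    "</chapter>"
)

# 'para' relative to <chapter>: only the direct children named para.
from_chapter = [p.text for p in chapter.findall("para")]

# The same path relative to <note>: a different context node, a different result.
note = chapter.find("note")
from_note = [p.text for p in note.findall("para")]

print(from_chapter, from_note)
```

The same one-step path gives different node lists because each evaluation starts from a different context node.
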
Absolute Paths
- An absolute location path consists of / optionally followed by a relative location path
- A / by itself selects the root node of the document containing the context node

Location Steps
- A location step has three parts:
  1. an axis, which specifies the tree relationship between the nodes selected by the location step and the context node,
  2. a node test, which specifies the node type and expanded-name of the nodes selected by the location step, and
  3. zero or more predicates, which use arbitrary expressions to further refine the set of nodes selected by the location step.

Location Steps parts explained
- Axes
  - 13 axes are defined in XPath (axis names are written in lower case):
    - ancestor, ancestor-or-self
    - attribute
    - child
    - descendant, descendant-or-self
    - following
    - preceding
    - following-sibling, preceding-sibling
    - namespace
    - parent
    - self
- Node test
  - Identifies the type of node; evaluates to true or false
  - Can be a name or a function-like test that verifies the node type
- Predicate
  - XPath boolean expressions in square brackets following the basis (axis and node test)

Location Steps in syntax
- The syntax for a location step is the axis name and node test separated by a double colon, followed by zero or more expressions, each in square brackets.
- For example, in child::para[position()=1], child is the name of the axis, para is the node test and [position()=1] is a predicate

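The three parts of a step can be pulled apart with a small regular expression. This is a naive sketch, not a real XPath parser: parse_step and STEP_RE are hypothetical names, nested brackets inside predicates and abbreviations such as @ or .. are not handled, and a step without an explicit axis is assumed to use child.

```python
import re

STEP_RE = re.compile(
    r"^(?:(?P<axis>[a-z-]+)::)?"             # optional axis name and double colon
    r"(?P<test>[^\[\]]+)"                    # node test
    r"(?P<predicates>(?:\[[^\[\]]*\])*)$"    # zero or more [predicate] groups
)

def parse_step(step):
    """Split a simple location step into (axis, node test, predicates)."""
    match = STEP_RE.match(step)
    if match is None:
        raise ValueError("not a simple location step: %r" % step)
    axis = match.group("axis") or "child"    # child is the default axis
    predicates = re.findall(r"\[([^\[\]]*)\]", match.group("predicates"))
    return axis, match.group("test"), predicates

parts = parse_step("child::para[position()=1]")
print(parts)
```
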
Abbreviated Syntax
- child:: can be omitted from a location step (child is the default axis)
  - div/para is equivalent to child::div/child::para
- attribute:: can be abbreviated to @
- // is short for /descendant-or-self::node()/
- A location step of . is short for self::node()
  - e.g. .//para is short for self::node()/descendant-or-self::node()/child::para
- A location step of .. is short for parent::node()

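These abbreviation rules are mechanical enough to sketch as a string rewrite. The helper names expand_step and expand_path are hypothetical, and the approach is deliberately naive: it assumes no / appears inside predicates or string literals.

```python
def expand_step(step):
    """Expand one abbreviated location step into its verbose form."""
    if step == ".":
        return "self::node()"
    if step == "..":
        return "parent::node()"
    if step.startswith("@"):
        return "attribute::" + step[1:]
    if "::" in step.split("[", 1)[0]:
        return step                      # already carries an explicit axis
    return "child::" + step              # child is the default axis

def expand_path(path):
    """Expand an abbreviated location path (naive: splits on every '/')."""
    absolute = path.startswith("/")
    verbose = path.replace("//", "/descendant-or-self::node()/")
    steps = [s for s in verbose.split("/") if s]
    expanded = "/".join(expand_step(s) for s in steps)
    return "/" + expanded if absolute else expanded

for sample in ["div/para", ".//para", "../title", "feature/@type"]:
    print(sample, "->", expand_path(sample))
```
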
Wildcards
- Sometimes we don't or can't know names
  - Can use the wildcard '*' for any single element
    - book/intro/title and book/chapter/title are matched by book/*/title (but so is book/appendix/title)
  - The verbose form is child::*
  - Multiple asterisks can match several levels
    - But we must know the exact level, and that inappropriate matches won't be made

Descendants
- Rather than use a wildcard, recursively search through descendants
  - chapter//para will go through the chapter hierarchy and select any para elements
      <chapter>             //Starting node
        <title>…</title>
        <para>…</para>      //Selected
        <note>…</note>
      </chapter>
- The verbose form is child::chapter/descendant-or-self::node()/child::para

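The difference between / and // can be checked with Python's xml.etree.ElementTree, which supports both in its XPath subset. The wrapping book element, the sample text and the para nested inside note are assumptions for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the nested <para> inside <note> is an assumption.
book = ET.fromstring(
    "<book><chapter>"
    "<title>Intro</title>"
    "<para>top-level</para>"
    "<note><para>inside note</para></note>"
    "</chapter></book>"
)

# chapter/para: only para elements that are children of chapter.
children_only = [p.text for p in book.findall("chapter/para")]

# chapter//para: para elements at any depth below chapter.
any_depth = [p.text for p in book.findall("chapter//para")]

print(children_only, any_depth)
```
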
Ancestors
- To signify the parent of the context element
  - '..'
  - parent::node()
- To find all 'title' elements that share the parent of the context node
  - ../title
  - parent::node()/child::title

Other Relationships
- May move around the siblings of the current context element
  - preceding-sibling::
  - following-sibling::
[Figure: the context node with its parent::, preceding-sibling::, following-sibling:: and child:: neighbours.]

Other Relationships (2)
- Can access all ancestors and descendants of the current context element
  - ancestor::
  - descendant::
- These axes don't select siblings
[Figure: the descendant:: and ancestor:: axes.]

Other Relationships (3)
- Can access all ancestors and descendants of the current context element, together with the context element itself
  - ancestor-or-self::
  - descendant-or-self::
- These axes don't select siblings
[Figure: the descendant-or-self:: and ancestor-or-self:: axes.]

Other Relationships (4)
- Can access all nodes that are complete before the current context element starts, and all nodes that start after it ends (so ancestors and descendants are excluded)
  - preceding::
  - following::
- Can access attributes
  - attribute::
[Figure: the preceding::, attribute:: and following:: axes.]

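Python's xml.etree.ElementTree does not implement these axes, but their definitions can be sketched directly in terms of document order and parent links. The function names and the small test document below are made up for illustration, and only element nodes are considered.

```python
import xml.etree.ElementTree as ET

def parent_map(root):
    """Map each element to its parent (ElementTree keeps no parent pointers)."""
    return {child: parent for parent in root.iter() for child in parent}

def ancestors(node, parents):
    """ancestor:: axis, nearest ancestor first."""
    found = []
    while node in parents:
        node = parents[node]
        found.append(node)
    return found

def descendants(node):
    """descendant:: axis, in document order."""
    return [n for n in node.iter() if n is not node]

def siblings(node, parents):
    """Return the (preceding-sibling, following-sibling) lists in document order."""
    if node not in parents:
        return [], []
    children = list(parents[node])
    i = children.index(node)
    return children[:i], children[i + 1:]

def preceding(node, root, parents):
    """preceding:: axis: everything before the node, minus its ancestors."""
    order = list(root.iter())
    skip = set(ancestors(node, parents))
    return [n for n in order[:order.index(node)] if n not in skip]

def following(node, root):
    """following:: axis: everything after the node, minus its descendants."""
    order = list(root.iter())
    skip = set(descendants(node))
    return [n for n in order[order.index(node) + 1:] if n not in skip]

def tags(nodes):
    return [n.tag for n in nodes]

root = ET.fromstring("<a><b><c/><d/></b><e><f/></e><g/></a>")
parents = parent_map(root)
e = root.find("e")   # the context node

print("ancestor:", tags(ancestors(e, parents)))
print("descendant:", tags(descendants(e)))
print("siblings:", [tags(s) for s in siblings(e, parents)])
print("preceding:", tags(preceding(e, root, parents)))
print("following:", tags(following(e, root)))
```

With e as the context node, the ancestor, descendant, preceding and following lists together with e itself cover every element of the document exactly once.
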
Predicate Filters
- Location paths are indiscriminate
  - May get a list of items that are selected
- A predicate filter is used to filter the list
  - The filter is held between '[ ]'
- The simplest is a position() function predicate
  - exon[position() = 1]   //1st exon
  - intron[2]              //2nd intron
- Can combine tests with 'and' and 'or'

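Positional predicates are part of the XPath subset supported by Python's xml.etree.ElementTree, in the abbreviated forms [1], [2] and [last()]. The transcript document and its id attributes below are invented for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical transcript with alternating exons and introns.
transcript = ET.fromstring(
    '<transcript>'
    '<exon id="e1"/><intron id="i1"/>'
    '<exon id="e2"/><intron id="i2"/>'
    '<exon id="e3"/>'
    '</transcript>'
)

first_exon = [n.get("id") for n in transcript.findall("exon[1]")]       # exon[position() = 1]
second_intron = [n.get("id") for n in transcript.findall("intron[2]")]  # intron[2]
last_exon = [n.get("id") for n in transcript.findall("exon[last()]")]   # exon[position() = last()]

print(first_exon, second_intron, last_exon)
```

Positions are counted among the nodes selected by the step, so intron[2] is the second intron, not the second child.
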
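Position Tests
- The last() function
  - Returns the context size, so [position() = last()], or simply [last()], locates the last node in the list
- The count() function
  - Evaluates the number of nodes in a node-set
  - child::transcript[count(child::intron) = 1]
- The id() function
  - Selects the element with the given unique ID
  - id("ENS0001")
  - To test an element's own identifier, compare the attribute instead, e.g. child::transcript[@id = "ENS0001"] (assuming the identifier is stored in an attribute named id)
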
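Python's xml.etree.ElementTree does not implement count(), so the count-based predicate has to be emulated in Python. The sketch below shows the equivalent filter on an invented gene document.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: three transcripts with 1, 0 and 2 introns.
gene = ET.fromstring(
    "<gene>"
    "<transcript id='t1'><exon/><intron/><exon/></transcript>"
    "<transcript id='t2'><exon/></transcript>"
    "<transcript id='t3'><exon/><intron/><exon/><intron/><exon/></transcript>"
    "</gene>"
)

# Emulates child::transcript[count(child::intron) = 1]
one_intron = [t.get("id") for t in gene.findall("transcript")
              if len(t.findall("intron")) == 1]

print(one_intron)
```
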
Attribute Tests
- Attributes can be selected
  - feature/@type
- Elements can be selected depending on an attribute value
  - feature[@type="exon"]

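Attribute predicates also work in Python's xml.etree.ElementTree. Attributes are not nodes there, so a path such as feature/@type is emulated by selecting the elements and reading the attribute with get(). The gene document is invented for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical features; the last one has no type attribute.
gene = ET.fromstring(
    '<gene>'
    '<feature type="exon" id="f1"/>'
    '<feature type="intron" id="f2"/>'
    '<feature type="exon" id="f3"/>'
    '<feature id="f4"/>'
    '</gene>'
)

# Emulates feature/@type: elements that have the attribute, then its value.
types = [f.get("type") for f in gene.findall("feature[@type]")]

# feature[@type="exon"]: elements selected by attribute value.
exon_ids = [f.get("id") for f in gene.findall("feature[@type='exon']")]

print(types, exon_ids)
```
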
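Functions in XPath:
- text() = node test that matches text nodes
- node() = node test that matches any node on the axis (elements, text, comments and processing instructions on the child axis; attributes on the attribute axis)
- name() = function that returns the name of the context node (or of the first node of its argument)
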
Booleans
- A boolean can only have two values: true or false
- The following operators can be used in boolean expressions:
  - or
  - and
  - =, !=
  - <=, <, >=, >

Example
- Operations perform boolean tests on conditions
  - exon[not(position() = 1)]
  - transcript[not(exon)]
  - intron[position() != last()]
  - exon[position() > 2]
  - exon[position() >= 3]
  - exon[position() = 1 or position() = last()]

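One pitfall with tests like these: an operand of 'or' or 'and' is converted to a boolean with the rules of the boolean() function, and a bare last() is a non-zero number, hence always true. The sketch below emulates those conversion rules in Python; xpath_boolean is a hypothetical helper, and node-sets are modelled as plain lists.

```python
import math

def xpath_boolean(value):
    """Emulate XPath 1.0 boolean(): a sketch with node-sets modelled as lists."""
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        return not (value == 0 or math.isnan(value))   # false only for 0 and NaN
    if isinstance(value, str):
        return len(value) > 0                          # false only for ""
    return len(value) > 0                              # node-set: false only if empty

size = 3  # the context size, i.e. what last() returns for a list of three nodes

# position() = 1 or last(): the second operand is always true
always = [pos == 1 or xpath_boolean(size) for pos in range(1, size + 1)]

# position() = 1 or position() = last(): first and last node only
intended = [pos == 1 or pos == size for pos in range(1, size + 1)]

print(always, intended)
```
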
Numbers
- A number represents a floating-point number
- The numeric operators convert their operands to numbers
- Operators include:
  - +, -, *, div, mod
  - Since XML allows - in names, the - operator typically needs to be preceded by whitespace
  - Example: 5 mod 2 returns 1

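The mod operator returns the remainder of a truncating division, so its result takes the sign of the left operand. That differs from Python's % operator for negative operands; math.fmod is the closer match. In this sketch xpath_mod is a hypothetical name, and division by zero is not handled (XPath yields NaN or an infinity where Python raises an error).

```python
import math

def xpath_mod(a, b):
    """Remainder with the sign of the dividend, like XPath's mod (b must be non-zero here)."""
    return math.fmod(a, b)

print(xpath_mod(5, 2), xpath_mod(5, -2), xpath_mod(-5, 2), xpath_mod(-5, -2))
print(5 % -2, -5 % 2)   # Python's own % follows the sign of the divisor instead
```
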
Strings
- Strings consist of a sequence of zero or more characters
- A character is defined in the XML Recommendation

Example
- Strings can be tested for characters and substrings
  - <note>hello there</note>
    - note[contains(text(), "hello")]
  - <note><b>hello</b> there</note>
    - note[contains(., "hello")]
  - The '.' is the current node; its string-value is the concatenation of the text of all its descendants

Example (2)
- starts-with(string, pattern)
  - note[starts-with(., "hello")]
- string(exp)
  - note[contains(., string(2))]
- substring-after(string, terminator)
- substring-before(string, terminator)
- substring(string, offset, length), where the offset counts from 1

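Example (3)
- normalize-space(string)
  - Removes leading and trailing whitespace and collapses internal runs of whitespace to a single space
- translate(string, source, replace)
  - translate(., ";+", ",")
  - Characters of source that have no counterpart in replace are removed
- concat(strings)
- string-length(string)
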
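The behaviour of the XPath 1.0 string functions substring-before, substring-after, substring, normalize-space and translate can be emulated in a few lines of Python. This is a sketch of their semantics under simplifying assumptions (no NaN or infinite arguments), and the Python function names are chosen for this illustration.

```python
import math
import re

def xpath_round(x):
    """XPath rounds halves up, unlike Python's round()."""
    return math.floor(x + 0.5)

def substring(s, start, length=None):
    """substring(): positions are 1-based; start and length are rounded."""
    first = xpath_round(start)
    end = len(s) + 1 if length is None else first + xpath_round(length)  # exclusive
    return "".join(ch for pos, ch in enumerate(s, start=1) if first <= pos < end)

def substring_before(s, terminator):
    i = s.find(terminator)
    return s[:i] if i >= 0 else ""

def substring_after(s, terminator):
    i = s.find(terminator)
    return s[i + len(terminator):] if i >= 0 else ""

def normalize_space(s):
    """Collapse runs of XML whitespace and strip both ends."""
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")

def translate(s, source, replace):
    """Map source[i] to replace[i]; source characters without a counterpart are removed."""
    table = {}
    for i, ch in enumerate(source):
        if ch not in table:                      # the first occurrence wins
            table[ch] = replace[i] if i < len(replace) else None
    out = []
    for ch in s:
        mapped = table.get(ch, ch)
        if mapped is not None:
            out.append(mapped)
    return "".join(out)

print(substring("12345", 0, 3))
print(substring_before("1999/04/01", "/"), substring_after("1999/04/01", "/"))
print(normalize_space("  hello   there \n"))
print(translate("bar", "abc", "ABC"), translate("--aaa--", "abc-", "ABC"))
```

Because positions start at 1, substring("12345", 0, 3) covers positions 0, 1 and 2 and therefore yields only the two characters at positions 1 and 2.
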
Core Function Library
- XPath defines a core set of functions and operators
- All implementations of XPath must implement the core function library
- Node-set functions
  - list/item[position() mod 2 = 1] selects all the odd-numbered item elements of a list
  - id("foo")/child::para[position()=5] selects the 5th para child of the element with the unique ID foo
- String functions
  - substring("12345", 0, 3) returns "12"
- Boolean functions
  - boolean true() returns true
- Number functions
  - number sum(node-set) returns the sum, over the nodes in the node-set, of each node's string-value converted to a number

Example for XPath Queries
  <bib>
    <book>
      <publisher> Addison-Wesley </publisher>
      <author> Serge Abiteboul </author>
      <author>
        <first-name> Rick </first-name>
        <last-name> Hull </last-name>
      </author>
      <author> Victor Vianu </author>
      <title> Foundations of Databases </title>
      <year> 1995 </year>
    </book>
    <book price="55">
      <publisher> Freeman </publisher>
      <author> Jeffrey D. Ullman </author>
      <title> Principles of Database and Knowledge Base Systems </title>
      <year> 1998 </year>
    </book>
  </bib>

Example summary
  bib                  matches a bib element
  *                    matches any element
  /                    matches the root node
  /bib                 matches a bib element under root
  bib/paper            matches a paper in bib
  bib//paper           matches a paper in bib, at any depth
  //paper              matches a paper at any depth
  paper|book           matches a paper or a book
  @price               matches a price attribute
  bib/book/@price      matches price attribute in book, in bib
  bib/book[@price<"55"]/author/lastname   matches…

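Several of these queries can be tried against the bib document with Python's xml.etree.ElementTree. The sketch rests on some assumptions: the second author of the first book is written with nested first-name and last-name elements, paths are relative to the bib element because ElementTree parses straight to the document element, and attribute selection and the < comparison are emulated in Python since ElementTree's XPath subset lacks them.

```python
import xml.etree.ElementTree as ET

BIB = """
<bib>
  <book>
    <publisher> Addison-Wesley </publisher>
    <author> Serge Abiteboul </author>
    <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author>
    <author> Victor Vianu </author>
    <title> Foundations of Databases </title>
    <year> 1995 </year>
  </book>
  <book price="55">
    <publisher> Freeman </publisher>
    <author> Jeffrey D. Ullman </author>
    <title> Principles of Database and Knowledge Base Systems </title>
    <year> 1998 </year>
  </book>
</bib>
"""

bib = ET.fromstring(BIB)   # the <bib> document element

# /bib/book/publisher
publishers = [p.text.strip() for p in bib.findall("book/publisher")]

# //author: authors at any depth
author_count = len(bib.findall(".//author"))

# bib/book/@price, emulated: books that carry the attribute, then its value
prices = [b.get("price") for b in bib.findall("book[@price]")]

# //last-name
last_names = [n.text.strip() for n in bib.findall(".//last-name")]

# book[@price < 55], emulated with a Python comparison
cheaper_than_55 = [b for b in bib.findall("book[@price]") if float(b.get("price")) < 55]

print(publishers, author_count, prices, last_names, len(cheaper_than_55))
```
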
XPath 2.0
- Latest version:
  - http://www.w3.org/TR/xpath20/
- W3C Working Draft, 22 August 2003
- Any expression that is syntactically valid and executes successfully in both XPath 2.0 and XQuery 1.0 will return the same result in both languages

XPath 2.0 (2)
- XPath 2.0 is a much more powerful language that operates on a much larger domain of data types
- A better way of describing XPath 2.0 is as an expression language for processing sequences, with built-in support for querying XML documents
- The driving forces behind XPath 2.0 include not only the XPath 2.0 Requirements document but also many of the XML Query language requirements
- XPath 2.0 is a strict syntactic subset of XQuery 1.0

XPath 2.0 (3)
- XPath 2.0 introduces support for the XML Schema primitive types, which immediately gives the user access to 19 simple types, including dates, years, months, URIs, etc.
- In addition, a number of functions and operators are provided for processing and constructing these different data types

XPath 2.0 (4)
- Everything is a sequence
- Sequences are ordered
- In XPath 1.0, if you wanted to process a collection of nodes, you had to deal with node-sets
- In XPath 2.0, the concept of the node-set has been generalized and extended
- Sequences may contain simple-typed values as well as nodes
- The "for" expression enables iteration over sequences

XPath 2.0 (5)
- "for" expression:
  - sum(for $x in /order/item return $x/price * $x/quantity)
- Conditional expression:
  - if ($widget1/unit-cost < $widget2/unit-cost)
      then $widget1
      else $widget2
- Quantifiers:
  - some $x in /students/student/name satisfies $x = "Fred"
  - every $x in /students/student/name satisfies $x = "Fred"

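Python's standard library has no XPath 2.0 engine, but each of these constructs has a close Python analogue, which may help in reading them: a generator expression for "for", a conditional expression for "if", and any() and all() for "some" and "every". The documents and widget records below are invented for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical order: 4 items at 2.50 and 1 item at 10.
order = ET.fromstring(
    "<order>"
    "<item><price>2.50</price><quantity>4</quantity></item>"
    "<item><price>10</price><quantity>1</quantity></item>"
    "</order>"
)

# sum(for $x in /order/item return $x/price * $x/quantity)
total = sum(float(x.findtext("price")) * float(x.findtext("quantity"))
            for x in order.findall("item"))

# if ($widget1/unit-cost < $widget2/unit-cost) then $widget1 else $widget2
widget1 = {"name": "widget1", "unit_cost": 3.0}   # hypothetical records
widget2 = {"name": "widget2", "unit_cost": 2.0}
cheaper = widget1 if widget1["unit_cost"] < widget2["unit_cost"] else widget2

students = ET.fromstring(
    "<students>"
    "<student><name>Fred</name></student>"
    "<student><name>Wilma</name></student>"
    "</students>"
)
names = [n.text for n in students.findall("student/name")]

# some $x in /students/student/name satisfies $x = "Fred"
some_fred = any(x == "Fred" for x in names)

# every $x in /students/student/name satisfies $x = "Fred"
every_fred = all(x == "Fred" for x in names)

print(total, cheaper["name"], some_fred, every_fred)
```
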
XPath 2.0 (6)
- Intersections, differences, unions:
  - The except operator selects all of a given node-set, except for certain nodes
    - @* except @exc:foo
  - The intersect operator
    - $x intersect /foo/bar

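These operators behave like set operations whose result is returned in document order. A rough Python analogue uses sets of xml.etree.ElementTree elements plus a sort by document position. Element nodes stand in for the attribute nodes of the slide's example because ElementTree does not model attributes as nodes, and the document and helper names are made up.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<foo><bar id='1'/><bar id='2'/><baz id='3'/></foo>")

# Position of every element in document order, used to sort results.
doc_order = {node: i for i, node in enumerate(root.iter())}

def in_document_order(nodes):
    return sorted(nodes, key=doc_order.__getitem__)

def ids(nodes):
    return [n.get("id") for n in nodes]

children = set(root.findall("*"))   # all children of foo
bars = set(root.findall("bar"))     # /foo/bar
x = set(root.findall("*")[1:])      # a stand-in for $x: the last two children

intersect = in_document_order(x & bars)        # $x intersect /foo/bar
except_ = in_document_order(children - bars)   # * except bar
union = in_document_order(x | bars)            # $x union /foo/bar

print(ids(intersect), ids(except_), ids(union))
```
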
Some Practice
- Try XPath Visualizer.
- You can download it from: http://www.vbxml.com/downloads/files/xpathvisualiserseptember.zip
- It can help you with:
  - Learning and playing with XPath expressions.
  - Composing and visually verifying the exact XPath expression when designing an XSLT stylesheet.
  - Obtaining the quantitative characteristics of an XML document: counts, sums, arithmetical and relational results, strings, substrings, etc.

Conclusion
- XPath provides a concise and intuitive way to address parts of XML documents
- It is a standard part of the XSLT and XPointer specifications
- Using XPath mainly requires learning the abbreviated syntax of location path expressions and the functions of the core library

References
- http://www.w3.org/TR/xpath20/
- http://www.vbxml.com/xpathvisualizer/default.asp
- http://www.xml.com/pub/a/2002/03/20/xpath2.html
- XML in a Nutshell