Tutorial: Introduction to XML and Java: XML, dom4j and XPath
Eran Toch
Methodologies in the Development of Information Systems, December 2003
XML and Java: XML, dom4j and XPath – Eran Toch, Methodologies in Information System Development

Sources
• Major sources:
  – http://www.cis.upenn.edu/~cis550/slides/xml.ppt (CIS 550 Course Notes, U. Penn, source for many slides)
  – http://www.cs.technion.ac.il/~oshmu/ (236804 - Seminar in Computer Science 4: XML Technology, Systems and Theory)
  – http://dom4j.org

Agenda
• Short Introduction to XML
  – What is XML
  – Structure and Terminology
  – Java APIs for XML: an Overview
• dom4j
  – Parsing an XML document
  – Writing to an XML document
• XPath
  – XPath Queries
  – XPath in dom4j
• References

The Structure of XML
• XML consists of tags and text.
• Tags come in pairs: <date>...</date>
• Tags must be properly nested:
    <date><day>...</day></date>   --- good
    <date><day>...</date>...</day>   --- bad
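The nesting rule is enforced by every conforming parser. A minimal sketch, using the JDK's built-in JAXP DOM parser rather than dom4j so it runs without extra jars; the class WellFormedCheck is a hypothetical name. The properly nested document parses, while the badly nested one is rejected with a parse error.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for this sketch (JDK JAXP only, no dom4j).
class WellFormedCheck {
    // Returns true if the parser accepts the text as well-formed XML.
    static boolean isWellFormed(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors instead of printing them to stderr.
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXParseException e) {
            return false; // e.g. a closing tag that does not match the open element
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isWellFormed("<date><day>12</day></date>")); // true
        System.out.println(isWellFormed("<date><day>12</date></day>")); // false
    }
}
```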

XML Text
• XML has only one "basic" type: text. It is bounded by tags, e.g.
    <title> The Big Sleep </title>
    <year> 1935 </year>   --- 1935 is still text
• XML text is called PCDATA (parsed character data). It is Unicode text, and any character can also be written as a numeric character reference, e.g. &#x05DE; for the Hebrew letter Mem.
• Later we shall see how new types are specified, e.g. with XML Schema.
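Both points can be checked with the JDK's JAXP DOM parser (used here instead of dom4j so no extra jar is needed; TextOnly and rootText are hypothetical names). The parser hands back year as a plain String, surrounding spaces included, and replaces the character reference with the single Unicode character U+05DE.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical helper for this sketch (JDK JAXP only, no dom4j).
class TextOnly {
    // Parses a one-element document and returns the element's text content.
    static String rootText(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)))
            .getDocumentElement()
            .getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String year = rootText("<year> 1935 </year>");
        // Still text: converting it to a number is the application's job.
        int asNumber = Integer.parseInt(year.trim());
        System.out.println("[" + year + "] -> " + (asNumber + 1)); // [ 1935 ] -> 1936

        // The character reference becomes the single character U+05DE (Mem).
        String mem = rootText("<letter>&#x05DE;</letter>");
        System.out.println(mem.equals("\u05DE")); // true
    }
}
```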

XML Structure
• Nesting tags can be used to express various structures, e.g. a tuple (record):
    <person>
      <name>Jeff Cohen</name>
      <tel>04-828-1345</tel>
      <tel>054-470-778</tel>
      <email>jeffc@cs.technion.ac.il</email>
    </person>

XML Structure (cont.)
• We can represent a list by using the same tag repeatedly:
    <addresses>
      <person>...</person>
      <person>...</person>
    </addresses>

XML Structure (cont.)
• Nested tags can be part of a list too:
    <addresses>
      <person>
        <name>Yossi Orr</name>
        <tel>04-828-1345</tel>
        <email>yossio@cs.technion.ac.il</email>
      </person>
      <person>
        <name>Irma Levy</name>
        <tel>03-426-1142</tel>
        <email>irmal@yourmail.com</email>
      </person>
    </addresses>
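Reading such a list back into program data is a loop over the repeated person elements, with the repeated tel tags inside each record collected into a list. A sketch with the JDK's JAXP DOM API instead of dom4j; AddressBook, children and summarize are hypothetical names, and the "name: tel, tel" output format is an arbitrary choice for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper for this sketch (JDK JAXP only, no dom4j).
class AddressBook {
    // Direct child elements of parent with the given tag name, in document order.
    static List<Element> children(Element parent, String tag) {
        List<Element> out = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && n.getNodeName().equals(tag)) {
                out.add((Element) n);
            }
        }
        return out;
    }

    // One "name: tel, tel" line per <person>; repeated <tel> tags become a list.
    static List<String> summarize(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        List<String> lines = new ArrayList<>();
        for (Element person : children(root, "person")) {
            String name = children(person, "name").get(0).getTextContent().trim();
            List<String> tels = new ArrayList<>();
            for (Element tel : children(person, "tel")) {
                tels.add(tel.getTextContent().trim());
            }
            lines.add(name + ": " + String.join(", ", tels));
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<addresses>"
            + "<person><name>Yossi Orr</name><tel>04-828-1345</tel></person>"
            + "<person><name>Irma Levy</name><tel>03-426-1142</tel></person>"
            + "</addresses>";
        summarize(xml).forEach(System.out::println);
    }
}
```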

Terminology
• The segment of an XML document between an opening tag and the corresponding closing tag is called an element.
• Metadata about an element can appear in an attribute:
    <person type="Friend">
      <name>Ortal Derech</name>
      <tel>04-8732122</tel>
      <tel>054-646888</tel>
      <email>oderech@tx.technion.ac.il</email>
    </person>
• Here person is an element with the attribute type="Friend"; name, tel and email are elements, each a sub-element of person; and "Ortal Derech" is text.
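In code, attributes and sub-elements are reached through different calls. A sketch with the JDK's DOM API (not dom4j); PersonAttributes and parseRoot are hypothetical names. Note that DOM's getAttribute() returns an empty string, not null, when the attribute is absent.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper for this sketch (JDK JAXP only, no dom4j).
class PersonAttributes {
    static Element parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml))).getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Element person = parseRoot("<person type=\"Friend\">"
            + "<name>Ortal Derech</name><tel>04-8732122</tel><tel>054-646888</tel>"
            + "</person>");
        System.out.println(person.getAttribute("type"));            // Friend
        System.out.println(person.getAttribute("missing").isEmpty()); // true: absent attribute gives ""
        // Sub-elements are nodes in the tree, not attributes.
        System.out.println(person.getElementsByTagName("tel").getLength()); // 2
    }
}
```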

XML is Tree-Like
[Figure: a person element drawn as a tree; its children name, tel and email hold the text Malcolm Atchison, (215) 898 4321 and mp@dcs.gla.ac.sc]

A Complete XML Document
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <!DOCTYPE addresses SYSTEM "http://www.technion.ac.il/~erant/addresses.dtd">
    <addresses>
      <person>
        <name>Jeff Cohen</name>
        <tel>04-828-1345</tel>
        <tel>054-470-778</tel>
        <email>jeffc@cs.technion.ac.il</email>
      </person>
    </addresses>
• The standalone declaration tells whether or not this document relies on external markup declarations, such as the external DTD named in the DOCTYPE line.
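The prolog is visible to programs after parsing. A sketch with the JDK's DOM API (not dom4j); Prolog is a hypothetical name. By default a parser tries to fetch the external DTD over the network, so this sketch assumes it is acceptable to install an EntityResolver that substitutes an empty DTD.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper for this sketch (JDK JAXP only, no dom4j).
class Prolog {
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n"
        + "<!DOCTYPE addresses SYSTEM \"http://www.technion.ac.il/~erant/addresses.dtd\">\n"
        + "<addresses><person><name>Jeff Cohen</name></person></addresses>";

    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Assumption: do not download the external DTD; resolve it to an empty one.
        builder.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        System.out.println(doc.getXmlVersion());         // 1.0
        System.out.println(doc.getXmlStandalone());      // false
        System.out.println(doc.getDoctype().getName());  // addresses
    }
}
```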

XML Structure Definitions
• DTD (Document Type Definition)
  – Defines structural constraints for XML documents.
• XML Schema
  – Serves the same purpose as a DTD, but is more powerful: it can specify the data types of elements, and it is itself written in XML.
• Namespaces
  – A way of preventing name clashes among elements from more than one source within the same XML document.
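The typing that XML Schema adds (and a DTD lacks) can be seen with the JDK's javax.xml.validation API. A sketch: the one-line schema declaring year as xs:integer is hypothetical, as are the names SchemaCheck and isValid. Leading and trailing spaces are allowed because xs:integer collapses whitespace, but non-numeric text is rejected.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical helper for this sketch (JDK validation API).
class SchemaCheck {
    // Hypothetical schema: the document element <year> must hold an integer.
    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xs:element name=\"year\" type=\"xs:integer\"/>"
        + "</xs:schema>";

    static boolean isValid(String xml) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        Validator validator = schema.newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // content does not match the declared type
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid("<year> 1935 </year>"));  // true
        System.out.println(isValid("<year>nineteen</year>")); // false
    }
}
```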

More Standards
• XPath: XML Path Language, a language for locating parts of an XML document.
• XQuery: a query language for XML documents (similar in role to SQL).
• XSLT: XSL Transformations, a language for transforming XML documents into other XML documents.
• RDF: Resource Description Framework, a formal knowledge model from the World Wide Web Consortium (W3C).

Why Is XML Important?
• Because it exists, and everybody uses it.
• Plain text: you can create and edit files with anything.
• Data identification: XML tells you what kind of data you have, not how to display it.
• Separation from style.
• Hierarchical, and easily processed.

An Overview of the APIs
• JAXP: Java API for XML Processing
  – Provides a common interface for creating and using the standard SAX, DOM and XSLT APIs.
• JAXB: Java Architecture for XML Binding
  – Defines a mechanism for binding Java objects to XML: writing them out as XML and reading them back.
• JDOM
  – Represents an XML file as a tree of Java objects (a more sophisticated version of DOM).
• dom4j
  – A lightweight version of JDOM.
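To make the JAXP bullet concrete: its SAX side streams parse events to a handler instead of building a tree in memory. A minimal sketch that counts element names; SaxCount and countElements are hypothetical names.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for this sketch (JAXP SAX API).
class SaxCount {
    // Counts how many times each element name occurs, without building a tree.
    static Map<String, Integer> countElements(String xml) throws Exception {
        Map<String, Integer> counts = new TreeMap<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                counts.merge(qName, 1, Integer::sum); // one event per opening tag
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return counts;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countElements(
            "<addresses><person><name>A</name></person><person/></addresses>"));
        // {addresses=1, name=1, person=2}
    }
}
```

SAX suits large documents that need only one pass; tree APIs such as DOM, JDOM and dom4j suit documents that are navigated or modified.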

dom4j
• An open-source XML framework for Java.
• Allows you to read, write, navigate, create and modify XML documents.
• Integrates with DOM and SAX.
• Full XPath support.
• XSLT support.

Download and Use
• Go to http://dom4j.org.
• Go to http://dom4j.org/download.html and download the latest release (current = 1.4).
• Unzip.
• Don't forget the classpath. When working in an IDE, don't forget to add the log4j.jar library.
• Javadoc: http://dom4j.org/apidocs/index.html
• Quick start guide: http://dom4j.org/guide.html

Opening an XML Document
    import org.dom4j.*;
    import org.dom4j.io.SAXReader;   // SAXReader lives in org.dom4j.io, not org.dom4j

    public class Foo {
        public Document parse(String id) throws DocumentException {
            SAXReader reader = new SAXReader();
            Document document = reader.read(id);
            return document;
        }
    }
• SAXReader can read from a File, a URL, an InputStream or a Reader. A String argument is treated as a system ID (a URL or file name), not as XML text; to parse XML held in a String, use DocumentHelper.parseText().

Example XML File
    <?xml version="1.0" encoding="UTF-8"?>
    <salesdata xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:noNamespaceSchemaLocation="C:\Documents and Settings\eran\My Documents\Academic\Courses\XML\xpath_ass_schema.xsd">
      <year>
        <theyear>1997</theyear>
        <region><name>central</name><sales unit="millions">34</sales></region>
        <region><name>east</name><sales unit="millions">34</sales></region>
        <region><name>west</name><sales unit="millions">32</sales></region>
      </year>
      <year>
        <theyear>1998</theyear>
        <region><name>east</name><sales unit="millions">35</sales></region>
        <region><name>west</name><sales unit="millions">42</sales></region>
      </year>
    </salesdata>

Accessing XML Elements
    // needs: import java.util.Iterator;
    public void dump(Document document) {
        Element root = document.getRootElement();                   // accessing the root element
        for (Iterator i = root.elementIterator(); i.hasNext(); ) {  // retrieving child elements
            Element element = (Element) i.next();
            System.out.println(element.getQualifiedName());        // retrieving the element name
            System.out.println(element.getTextTrim());             // retrieving the element's own text
            System.out.println(element.elementText("theyear"));    // text of the child element "theyear"
        }
    }

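Accessing XML Elements – cont'd
• What will be the output of dump()?
    year

    1997
    year

    1998
• Why? dump() visits only the direct children of the root, i.e. the two year elements. getTextTrim() returns only an element's own text, which for year is just whitespace, so an empty line is printed; elementText("theyear") returns the text of the theyear child.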
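The answer can be checked by re-creating dump() with the JDK's own DOM API (JAXP, so no dom4j jar is needed). This is a sketch, not dom4j itself: DumpSketch, ownText and childText are hypothetical names, ownText approximates getTextTrim() by concatenating only the element's direct text nodes and trimming the result, and the sales data is the example file without its schema attributes. Lines are collected in a list instead of printed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical re-creation of dump() with JAXP DOM (not dom4j).
class DumpSketch {
    static final String SALES = "<salesdata>\n"
        + "<year><theyear>1997</theyear>"
        + "<region><name>central</name><sales unit=\"millions\">34</sales></region>"
        + "<region><name>east</name><sales unit=\"millions\">34</sales></region>"
        + "<region><name>west</name><sales unit=\"millions\">32</sales></region></year>\n"
        + "<year><theyear>1998</theyear>"
        + "<region><name>east</name><sales unit=\"millions\">35</sales></region>"
        + "<region><name>west</name><sales unit=\"millions\">42</sales></region></year>\n"
        + "</salesdata>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    // Only the element's direct text nodes, trimmed; descendants are ignored.
    static String ownText(Element e) {
        StringBuilder sb = new StringBuilder();
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.TEXT_NODE) sb.append(n.getNodeValue());
        }
        return sb.toString().trim();
    }

    // Text of the first child element with this name, or null if there is none.
    static String childText(Element e, String name) {
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && n.getNodeName().equals(name)) {
                return n.getTextContent();
            }
        }
        return null;
    }

    // Same walk as dump(): the root's direct child elements only.
    static List<String> dump(Document document) {
        List<String> out = new ArrayList<>();
        for (Node n = document.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) continue; // skip whitespace between elements
            Element element = (Element) n;
            out.add(element.getTagName());
            out.add(ownText(element));
            out.add(childText(element, "theyear"));
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        dump(parse(SALES)).forEach(System.out::println);
    }
}
```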

Accessing XML Elements Recursively
    public void go(Element element, int depth) {
        for (int d = 0; d < depth; d++) {
            System.out.print("  ");                          // indent by depth
        }
        System.out.print(element.getQualifiedName());
        System.out.println(" " + element.getTextTrim());
        for (Iterator i = element.elementIterator(); i.hasNext(); ) {
            Element son = (Element) i.next();
            go(son, depth + 1);                              // recurse into each child element
        }
    }
• What will be the output?

Accessing Recursively – cont'd
• The whole XML tree, element names + values:
    salesdata
      year
        theyear 1997
        region
          name central
          sales 34
        region
          name east
          sales 34
        region
          name west
          sales 32
      year
        theyear 1998
        region
          name east
          sales 35
        region
          name west
          sales 42
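The same depth-first walk can be written against the JDK's DOM API to reproduce this listing. A sketch assuming Java 11+ (for String.repeat and stripTrailing); GoSketch and ownText are hypothetical names, ownText stands in for dom4j's getTextTrim(), and lines are collected in a list rather than printed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical re-creation of go() with JAXP DOM (not dom4j).
class GoSketch {
    static final String SALES = "<salesdata>\n"
        + "<year><theyear>1997</theyear>"
        + "<region><name>central</name><sales unit=\"millions\">34</sales></region>"
        + "<region><name>east</name><sales unit=\"millions\">34</sales></region>"
        + "<region><name>west</name><sales unit=\"millions\">32</sales></region></year>\n"
        + "<year><theyear>1998</theyear>"
        + "<region><name>east</name><sales unit=\"millions\">35</sales></region>"
        + "<region><name>west</name><sales unit=\"millions\">42</sales></region></year>\n"
        + "</salesdata>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    // Only the element's direct text nodes, trimmed.
    static String ownText(Element e) {
        StringBuilder sb = new StringBuilder();
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.TEXT_NODE) sb.append(n.getNodeValue());
        }
        return sb.toString().trim();
    }

    // Indent by depth, record name and own text, then recurse into child elements.
    static void go(Element element, int depth, List<String> out) {
        String line = "  ".repeat(depth) + element.getTagName() + " " + ownText(element);
        out.add(line.stripTrailing()); // drop the trailing space when there is no text
        for (Node n = element.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                go((Element) n, depth + 1, out);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> lines = new ArrayList<>();
        go(parse(SALES).getDocumentElement(), 0, lines);
        lines.forEach(System.out::println);
    }
}
```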

Creating an XML Document
    public Document createDocument() {
        Document document = DocumentHelper.createDocument();
        Element root = document.addElement("phonebook");      // creating the root element
        // adding elements: addAttribute() and addText() return the element, so calls chain
        Element address1 = root.addElement("address")
            .addAttribute("name", "Yuval")
            .addAttribute("category", "family")
            .addText("Ehud 3, Jerusalem");
        Element address2 = root.addElement("address")
            .addAttribute("name", "Ortal")
            .addAttribute("category", "friends")
            .addText("Kibbutz Givaat Haim");
        return document;
    }
• What will we get when running go()?

Creating an XML Document – cont'd
• The XML tree structure of the new document, as printed by go():
    phonebook
      address Ehud 3, Jerusalem
      address Kibbutz Givaat Haim
• Writing the XML document to a file:
    FileWriter out = new FileWriter("C:\\addresses.xml");   // "\\" is a single backslash in a Java string
    document.write(out);
    out.close();                                            // release the file and flush buffered output
• Retrieving the XML itself as a string:
    String xml = document.asXML();
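For comparison, the same phonebook can be built and serialized with the JDK's DOM and javax.xml.transform APIs instead of dom4j's DocumentHelper and asXML(). A sketch: CreatePhonebook, addAddress and asXml are hypothetical names, and the order in which attributes appear in the output is left to the serializer.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical JAXP counterpart of createDocument() (not dom4j).
class CreatePhonebook {
    static Document createDocument() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("phonebook");
        doc.appendChild(root);
        addAddress(doc, root, "Yuval", "family", "Ehud 3, Jerusalem");
        addAddress(doc, root, "Ortal", "friends", "Kibbutz Givaat Haim");
        return doc;
    }

    // One <address name=".." category="..">text</address> child of parent.
    static void addAddress(Document doc, Element parent, String name, String category, String text) {
        Element address = doc.createElement("address");
        address.setAttribute("name", name);
        address.setAttribute("category", category);
        address.appendChild(doc.createTextNode(text));
        parent.appendChild(address);
    }

    // Serializes the tree to a String, playing the role of dom4j's asXML().
    static String asXml(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter w = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(w));
        return w.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(asXml(createDocument()));
    }
}
```

To write a file instead, the StreamResult can wrap a FileWriter in place of the StringWriter.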

Client Program
    // needs: import java.io.FileWriter;
    public static void main(String[] args) {
        Foo foo = new Foo();
        try {
            // opening the file
            Document doc = foo.parse("C:\\Documents and Settings\\eran\\My Documents\\Academic\\Courses\\XML\\sales.xml");
            // dumping, and printing recursively
            foo.dump(doc);
            foo.go(doc.getRootElement(), 0);
            foo.xpath(doc);
            // creating a new document, printing it and writing it to a file
            Document newDoc = foo.createDocument();
            foo.go(newDoc.getRootElement(), 0);
            FileWriter out = new FileWriter("C:\\addresses.xml");
            newDoc.write(out);
            out.close();
        } catch (Exception e) {
            System.out.println(e);
        }
    }

XPath – Introduction
• XML Path Language: XPath is a language for addressing parts of an XML document.
• Enables locating and retrieving nodes, very much like accessing directories in a file system.
• Limited (but not bad) filtering and querying abilities.
• Retrieves either the actual PCDATA or node sets.

XPath – Simple Path Selection
• XPath expression: /salesdata/year/theyear
  Result: <theyear>1997</theyear> <theyear>1998</theyear>
  ("/" signifies child-of)
• /salesdata/year[2]/theyear
  Result: <theyear>1998</theyear>
  (filtering the level: getting only the second year element)
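These path expressions can be tried without dom4j through the JDK's javax.xml.xpath API, which evaluates XPath 1.0 over a DOM tree. A sketch: XPathPaths and select are hypothetical names, select returns the text content of every selected node, and the sales data is the example file without its schema attributes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for this sketch (javax.xml.xpath, not dom4j).
class XPathPaths {
    static final String SALES = "<salesdata>"
        + "<year><theyear>1997</theyear>"
        + "<region><name>central</name><sales unit=\"millions\">34</sales></region>"
        + "<region><name>east</name><sales unit=\"millions\">34</sales></region>"
        + "<region><name>west</name><sales unit=\"millions\">32</sales></region></year>"
        + "<year><theyear>1998</theyear>"
        + "<region><name>east</name><sales unit=\"millions\">35</sales></region>"
        + "<region><name>west</name><sales unit=\"millions\">42</sales></region></year>"
        + "</salesdata>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    // Evaluates expr as a node-set and returns each node's text content.
    static List<String> select(Document doc, String expr) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            out.add(nodes.item(i).getTextContent());
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(SALES);
        System.out.println(select(doc, "/salesdata/year/theyear"));    // [1997, 1998]
        System.out.println(select(doc, "/salesdata/year[2]/theyear")); // [1998]
    }
}
```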

XPath – Conditions
• /salesdata/year/region[sales > 34]
  Going down to region, and filtering according to the sales element. Result:
    <region><name>east</name><sales unit="millions">35</sales></region>
    <region><name>west</name><sales unit="millions">42</sales></region>
• /salesdata/year/region[sales > 34]/name ?

XPath – Traveling Up the Tree
• /salesdata/year/region[sales > 34]/parent::year/theyear
  Result: <theyear>1998</theyear>
  Going up the XML tree (and then down again).

XPath – Traveling Down Fast
• /descendant::sales
  Going all the way down, until the sales element. Result:
    <sales unit="millions">34</sales>
    <sales unit="millions">34</sales>
    <sales unit="millions">32</sales>
    <sales unit="millions">35</sales>
    <sales unit="millions">42</sales>
• //sales selects the same nodes.
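The condition and axis queries from these slides (a predicate, parent::, descendant:: and the // shorthand) can be run with the JDK's javax.xml.xpath API on the sample data. A sketch; XPathAxes and select are hypothetical names. Because both matching regions share one parent, the parent::year step yields a single year, since node-sets contain no duplicates.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for this sketch (javax.xml.xpath, not dom4j).
class XPathAxes {
    static final String SALES = "<salesdata>"
        + "<year><theyear>1997</theyear>"
        + "<region><name>central</name><sales unit=\"millions\">34</sales></region>"
        + "<region><name>east</name><sales unit=\"millions\">34</sales></region>"
        + "<region><name>west</name><sales unit=\"millions\">32</sales></region></year>"
        + "<year><theyear>1998</theyear>"
        + "<region><name>east</name><sales unit=\"millions\">35</sales></region>"
        + "<region><name>west</name><sales unit=\"millions\">42</sales></region></year>"
        + "</salesdata>";

    // Parses xml, evaluates expr as a node-set, returns each node's text content.
    static List<String> select(String xml, String expr) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
            .evaluate(expr, doc, XPathConstants.NODESET);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            out.add(nodes.item(i).getTextContent());
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(select(SALES, "/salesdata/year/region[sales > 34]/name"));
        // [east, west]
        System.out.println(select(SALES, "/salesdata/year/region[sales > 34]/parent::year/theyear"));
        // [1998]
        System.out.println(select(SALES, "/descendant::sales")); // [34, 34, 32, 35, 42]
        System.out.println(select(SALES, "//sales"));            // [34, 34, 32, 35, 42]
    }
}
```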

XPath – Advanced Queries
• The years in which the west region's sales were above 32 (in millions):
    //region[name="west" and sales > 32]/sales[@unit='millions']/ancestor::year/theyear
  Result: <theyear>1998</theyear>
• and is a logical operator; @unit accesses an attribute; ancestor:: is like parent::, but it can go any number of levels up the tree (here, up to year).

XPath – Advanced Queries (cont'd)
• The years (text nodes) in which the west region sales were higher than the east region sales; sales may be expressed in thousands or in millions, so amounts in millions are multiplied by 1000 before comparing:
    /salesdata/year[sum(region[name="west"]/sales[@unit='millions']) * 1000
                    + sum(region[name="west"]/sales[@unit='thousands'])
                  > sum(region[name="east"]/sales[@unit='millions']) * 1000
                    + sum(region[name="east"]/sales[@unit='thousands'])]/theyear/text()
• On the example file this returns the text node 1998 (west 42 > east 35; in 1997 west had 32 against east's 34).
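Both advanced queries can be checked with the JDK's javax.xml.xpath API. A sketch: XPathAdvanced, select and WEST_BEATS_EAST are hypothetical names. The unit-normalizing comparison uses XPath 1.0's sum(), which returns 0 for an empty node-set, so a region reported only in thousands (or only in millions) still compares correctly.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for this sketch (javax.xml.xpath, not dom4j).
class XPathAdvanced {
    static final String SALES = "<salesdata>"
        + "<year><theyear>1997</theyear>"
        + "<region><name>central</name><sales unit=\"millions\">34</sales></region>"
        + "<region><name>east</name><sales unit=\"millions\">34</sales></region>"
        + "<region><name>west</name><sales unit=\"millions\">32</sales></region></year>"
        + "<year><theyear>1998</theyear>"
        + "<region><name>east</name><sales unit=\"millions\">35</sales></region>"
        + "<region><name>west</name><sales unit=\"millions\">42</sales></region></year>"
        + "</salesdata>";

    static final String ANCESTOR_QUERY =
        "//region[name='west' and sales > 32]/sales[@unit='millions']/ancestor::year/theyear";

    // West total vs. east total, both normalized to thousands.
    static final String WEST_BEATS_EAST =
        "/salesdata/year[sum(region[name='west']/sales[@unit='millions']) * 1000"
        + " + sum(region[name='west']/sales[@unit='thousands'])"
        + " > sum(region[name='east']/sales[@unit='millions']) * 1000"
        + " + sum(region[name='east']/sales[@unit='thousands'])]/theyear/text()";

    // Parses xml, evaluates expr as a node-set, returns each node's text content.
    static List<String> select(String xml, String expr) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
            .evaluate(expr, doc, XPathConstants.NODESET);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            out.add(nodes.item(i).getTextContent());
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(select(SALES, ANCESTOR_QUERY));  // [1998]
        System.out.println(select(SALES, WEST_BEATS_EAST)); // [1998]
    }
}
```

Mixing units is where the normalization matters: with east at 900 thousand and west at 1 million, the comparison is 1000 > 900, so west wins.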

XPath in dom4j
• XPath queries can be used in dom4j: the XPath expression is fed to xpathSelector.
    // needs: import java.util.Iterator; import java.util.List;
    public void xpath(Document document) {
        XPath xpathSelector = DocumentHelper.createXPath("/salesdata/year/theyear");
        // the nodes are selected from the document, according to the XPath query
        List results = xpathSelector.selectNodes(document);
        for (Iterator iter = results.iterator(); iter.hasNext(); ) {
            Element element = (Element) iter.next();
            System.out.println(element.asXML());
        }
    }

References – XML
• XML tutorial:
  – http://www.w3schools.com/xml/default.asp
• XML specification from the W3C:
  – http://www.w3.org/XML/
• The Java/XML Tutorial:
  – http://java.sun.com/xml/tutorial_intro.html
• DTD tutorial:
  – http://www.xmlfiles.com/dtd/
• XML Schema tutorial:
  – http://www.w3schools.com/schema/default.asp
• XML Schema resource page:
  – http://www.w3.org/XML/Schema

References – dom4j
• Web site:
  – http://dom4j.org/
• Javadocs:
  – http://dom4j.org/apidocs/index.html
• Quick start:
  – http://dom4j.org/guide.html
• Cookbook (main functionality):
  – http://dom4j.org/cookbook.html

References – XPath
• XPath specification:
  – http://www.w3.org/TR/xpath
• XPath tutorial:
  – http://www.w3schools.com/xpath/default.asp
• XPath tutorial (extended):
  – http://www.zvon.org/xxl/XPathTutorial/General/examples.html
• XPath reference:
  – http://www.vbxml.com/xsl/XPathRef.asp