XML Tools
Leonidas Fegaras
CSE 6331
© Leonidas Fegaras


XML Processing

[Figure: processing pipeline. XML document → parser (well-formedness checks & reference expansion) → XML infoset → document validator (driven by a DTD or XML schema) → XML infoset (annotated) → application / storage system]

Tools for XML Processing

• DOM: a language-neutral interface for manipulating XML data
  – requires that the entire document be in memory
• SAX: push-based stream processing
  – hard to write non-trivial applications
• XPath: a declarative tree-navigation language
  – beautiful and easy to use
  – is part of many other languages
• XSLT: a language for transforming XML based on templates
  – very ugly!
• XQuery: full-fledged query language
  – influenced by OQL
• XmlPull: pull-based stream processing
  – far better than SAX, but not a standard yet

DOM

The Document Object Model (DOM) is a platform- and language-neutral interface that allows programs and scripts to dynamically access and update the content and structure of XML documents.

The following is part of the DOM interface:

    public interface Node {
        public String getNodeName ();
        public String getNodeValue ();
        public NodeList getChildNodes ();
        public NamedNodeMap getAttributes ();
    }
    public interface Element extends Node {
        // returns all descendant elements with the given tag name
        public NodeList getElementsByTagName ( String name );
    }
    public interface Document extends Node {
        public Element getDocumentElement ();
    }
    public interface NodeList {
        public int getLength ();
        public Node item ( int index );
    }
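
As a rough illustration of how these interface methods combine, the sketch below walks a DOM tree recursively and prints an indented outline of the element names. The class name DomWalk, the helper outline, and the inline sample document are assumptions made for this example; only the DOM and JAXP calls come from the JDK.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomWalk {
    // Append the name of every element below n, indented two spaces per level.
    static void walk(Node n, int depth, StringBuilder out) {
        if (n.getNodeType() == Node.ELEMENT_NODE) {
            for (int i = 0; i < depth; i++) out.append("  ");
            out.append(n.getNodeName()).append('\n');
        }
        NodeList children = n.getChildNodes();   // empty for text nodes
        for (int i = 0; i < children.getLength(); i++)
            walk(children.item(i), depth + 1, out);
    }

    // Parse the XML text into a DOM tree and return its element outline.
    static String outline(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = db.parse(new InputSource(new StringReader(xml)));
        StringBuilder out = new StringBuilder();
        walk(doc.getDocumentElement(), 0, out);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample document, not one of the course files.
        System.out.print(outline(
            "<department><deptname>Computer Science</deptname>"
          + "<gradstudent><name>Smith</name></gradstudent></department>"));
    }
}
```

Note that the whole document is parsed into memory before walk runs, which is the cost of DOM mentioned earlier.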

DOM Example

    import java.io.File;
    import javax.xml.parsers.*;
    import org.w3c.dom.*;

    // query: /*[dept/text()="cse"]/tel/text()
    class Test {
        public static void main ( String args[] ) throws Exception {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder db = dbf.newDocumentBuilder();
            Document doc = db.parse(new File("depts.xml"));
            NodeList nodes = doc.getDocumentElement().getChildNodes();
            for (int i=0; i<nodes.getLength(); i++) {
                Node n = nodes.item(i);
                NodeList ndl = n.getChildNodes();
                for (int k=0; k<ndl.getLength(); k++) {
                    Node m = ndl.item(k);
                    // strings must be compared with equals(), not ==
                    if ( m.getNodeName().equals("dept")
                         && m.getFirstChild() != null
                         && "cse".equals(m.getFirstChild().getNodeValue()) ) {
                        // tel is a sibling of dept, so search under the parent n
                        NodeList ncl = ((Element) n).getElementsByTagName("tel");
                        for (int j=0; j<ncl.getLength(); j++) {
                            Node nc = ncl.item(j);
                            System.out.print(nc.getFirstChild().getNodeValue());
                        }
                    }
                }
            }
        }
    }
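
For comparison, the nested loops above can be replaced by a single XPath expression evaluated through the JDK's javax.xml.xpath API. This is only a sketch: the path /*/*[dept/text()='cse']/tel/text() is an assumption chosen to mirror the two loop levels in the DOM code, and the class name XPathDemo and the sample data are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathDemo {
    // Return the phone numbers of all second-level elements whose dept is "cse".
    static String phones(String xml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList hits = (NodeList) xpath.evaluate(
            "/*/*[dept/text()='cse']/tel/text()",      // assumed path, see lead-in
            new InputSource(new StringReader(xml)),
            XPathConstants.NODESET);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < hits.getLength(); i++)
            out.append(hits.item(i).getNodeValue()).append('\n');
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for depts.xml.
        System.out.print(phones(
            "<people>"
          + "<person><dept>cse</dept><tel>555-0100</tel></person>"
          + "<person><dept>math</dept><tel>555-0199</tel></person>"
          + "</people>"));
    }
}
```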

Better Programming

    import java.io.File;
    import javax.xml.parsers.*;
    import org.w3c.dom.*;
    import java.util.Vector;

    // A sequence of DOM nodes that supports XPath-like child steps
    class Sequence extends Vector<Node> {
        Sequence () { super(); }

        // a one-element sequence holding the root element of the file
        Sequence ( String filename ) throws Exception {
            super();
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder db = dbf.newDocumentBuilder();
            Document doc = db.parse(new File(filename));
            add(doc.getDocumentElement());
        }

        // all children named tagname of all nodes in this sequence
        Sequence child ( String tagname ) {
            Sequence result = new Sequence();
            for (int i = 0; i<size(); i++) {
                Node n = elementAt(i);
                NodeList c = n.getChildNodes();
                for (int k = 0; k<c.getLength(); k++)
                    if (c.item(k).getNodeName().equals(tagname))
                        result.add(c.item(k));
            }
            return result;
        }

        void print () {
            for (int i = 0; i<size(); i++)
                System.out.println(elementAt(i).toString());
        }
    }

    class DOM {
        public static void main ( String args[] ) throws Exception {
            (new Sequence("cs.xml")).child("gradstudent").child("name").print();
        }
    }

SAX

• SAX is a Simple API for XML that allows you to process a document as it is being read
  – in contrast to DOM, which requires the entire document to be read before it takes any action
• The SAX API is event based
  – The XML parser sends events, such as the start or the end of an element, to an event handler, which processes the information

Parser Events

• Receive notification of the beginning of a document
    void startDocument ()
• Receive notification of the end of a document
    void endDocument ()
• Receive notification of the beginning of an element
    void startElement ( String namespace, String localName, String qName, Attributes atts )
• Receive notification of the end of an element
    void endElement ( String namespace, String localName, String qName )
• Receive notification of character data
    void characters ( char[] ch, int start, int length )
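
To see the order in which these callbacks fire, a handler can simply log each one. The sketch below does that for an inline document; the class name SaxTrace and the short labels SD, SE, C, EE and ED (start document, start element, characters, end element, end document) are choices made for this example. A parser is allowed to deliver one run of text in several characters calls, although short text like this normally arrives in one piece.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxTrace extends DefaultHandler {
    final StringBuilder log = new StringBuilder();

    @Override public void startDocument() { log.append("SD\n"); }
    @Override public void endDocument()   { log.append("ED\n"); }
    @Override public void startElement(String uri, String local, String qName, Attributes atts) {
        log.append("SE: ").append(qName).append('\n');
    }
    @Override public void endElement(String uri, String local, String qName) {
        log.append("EE: ").append(qName).append('\n');
    }
    @Override public void characters(char[] ch, int start, int length) {
        log.append("C: ").append(ch, start, length).append('\n');
    }

    // Parse the XML text and return one log line per SAX event.
    static String trace(String xml) throws Exception {
        SaxTrace handler = new SaxTrace();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.log.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(trace("<name><lastname>Smith</lastname></name>"));
    }
}
```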

SAX Example: a Printer

    import java.io.FileReader;
    import javax.xml.parsers.*;
    import org.xml.sax.*;            // Attributes, InputSource, SAXException
    import org.xml.sax.helpers.*;    // DefaultHandler

    // Prints the events it receives back as XML text
    class Printer extends DefaultHandler {
        public Printer () { super(); }
        public void startDocument () {}
        public void endDocument () { System.out.println(); }
        public void startElement ( String uri, String name, String tag, Attributes atts ) {
            System.out.print("<" + tag + ">");
        }
        public void endElement ( String uri, String name, String tag ) {
            System.out.print("</" + tag + ">");
        }
        public void characters ( char text[], int start, int length ) {
            System.out.print(new String(text, start, length));
        }
    }

The Child Handler

    class Child extends DefaultHandler {
        DefaultHandler next;   // the next handler in the pipeline
        String ptag;           // the tagname of the child
        boolean keep;          // are we keeping or skipping events?
        short level;           // the depth level of the current element

        public Child ( String s, DefaultHandler n ) {
            super();
            next = n;
            ptag = s;
            keep = false;
            level = 0;
        }
        public void startDocument () throws SAXException {
            next.startDocument();
        }
        public void endDocument () throws SAXException {
            next.endDocument();
        }

The Child Handler (cont.)

        public void startElement ( String nm, String ln, String qn, Attributes a ) throws SAXException {
            // level 1 is a child of the outermost element seen by this handler
            if (level++ == 1)
                keep = ptag.equals(qn);
            if (keep)
                next.startElement(nm, ln, qn, a);
        }
        public void endElement ( String nm, String ln, String qn ) throws SAXException {
            if (keep)
                next.endElement(nm, ln, qn);
            if (--level == 1)
                keep = false;
        }
        public void characters ( char[] text, int start, int length ) throws SAXException {
            if (keep)
                next.characters(text, start, length);
        }
    }

Forming the Pipeline

    class SAX {
        public static void main ( String args[] ) throws Exception {
            SAXParserFactory pf = SAXParserFactory.newInstance();
            SAXParser parser = pf.newSAXParser();
            DefaultHandler handler
                = new Child("gradstudent", new Child("name", new Printer()));
            parser.parse(new InputSource(new FileReader("cs.xml")), handler);
        }
    }

[Figure: SAX parser → Child: gradstudent → Child: name → Printer]

Example

Events flow from the input stream through the pipeline stages Child: gradstudent, Child: name, and Printer.
(SD = start document, SE = start element, C = characters, EE = end element, ED = end document)

    Input Stream         SAX Events
    -----------------    --------------------
                         SD:
    <department>         SE: department
    <deptname>           SE: deptname
    Computer Science     C: Computer Science
    </deptname>          EE: deptname
    <gradstudent>        SE: gradstudent
    <name>               SE: name
    <lastname>           SE: lastname
    Smith                C: Smith
    </lastname>          EE: lastname
    <firstname>          SE: firstname
    John                 C: John
    </firstname>         EE: firstname
    </name>              EE: name
    </gradstudent>       EE: gradstudent
    ...
    </department>        EE: department
                         ED:
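
The pipeline can be tried end to end without any files. The sketch below repeats the filtering logic of the Child handler under the name ChildFilter and replaces Printer with a hypothetical Collector that gathers the surviving events in a string, so the result can be inspected; PipelineDemo and the inline document are likewise assumptions of this example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// End of the pipeline: renders the events it receives as markup text.
class Collector extends DefaultHandler {
    final StringBuilder out = new StringBuilder();
    @Override public void startElement(String u, String l, String q, Attributes a) {
        out.append('<').append(q).append('>');
    }
    @Override public void endElement(String u, String l, String q) {
        out.append("</").append(q).append('>');
    }
    @Override public void characters(char[] ch, int start, int length) {
        out.append(ch, start, length);
    }
}

// Forwards only the subtrees of level-1 elements whose name equals tag.
class ChildFilter extends DefaultHandler {
    private final String tag;
    private final DefaultHandler next;
    private boolean keep = false;
    private int level = 0;

    ChildFilter(String tag, DefaultHandler next) { this.tag = tag; this.next = next; }

    @Override public void startDocument() throws SAXException { next.startDocument(); }
    @Override public void endDocument() throws SAXException { next.endDocument(); }
    @Override public void startElement(String u, String l, String q, Attributes a) throws SAXException {
        if (level++ == 1) keep = tag.equals(q);
        if (keep) next.startElement(u, l, q, a);
    }
    @Override public void endElement(String u, String l, String q) throws SAXException {
        if (keep) next.endElement(u, l, q);
        if (--level == 1) keep = false;
    }
    @Override public void characters(char[] ch, int start, int length) throws SAXException {
        if (keep) next.characters(ch, start, length);
    }
}

class PipelineDemo {
    // Run parser -> ChildFilter(gradstudent) -> ChildFilter(name) -> Collector.
    static String run(String xml) throws Exception {
        Collector sink = new Collector();
        DefaultHandler pipeline = new ChildFilter("gradstudent", new ChildFilter("name", sink));
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), pipeline);
        return sink.out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(
            "<department><deptname>Computer Science</deptname>"
          + "<gradstudent><name><lastname>Smith</lastname>"
          + "<firstname>John</firstname></name></gradstudent></department>"));
    }
}
```

Each stage keeps only two pieces of state (keep and level), so memory use stays constant no matter how large the document is.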

XmlPull

Unlike SAX, you pull events from the document.

• Create a pull parser:
    XmlPullParser xpp;
    xpp = factory.newPullParser();
• Advance to the next event with xpp.next(); inspect the current event with xpp.getEventType()
• Types of events:
  – START_TAG
  – END_TAG
  – TEXT
  – START_DOCUMENT
  – END_DOCUMENT
• More information at: http://www.xmlpull.org/
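
XmlPull is a separate library that does not ship with the JDK. The same pull style can be sketched with StAX (javax.xml.stream), the pull API that the JDK does include; note that this swaps in a different API from the one on the slide. The class name PullDemo is hypothetical, and the output labels are chosen to echo the XmlPull event names.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

class PullDemo {
    // Pull events one at a time and describe each on its own line.
    static String events(String xml) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Ask the parser to merge adjacent text into a single event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLStreamReader r = factory.createXMLStreamReader(new StringReader(xml));
        StringBuilder out = new StringBuilder();
        while (r.hasNext()) {
            switch (r.next()) {                  // the application asks for the next event
                case XMLStreamConstants.START_ELEMENT:
                    out.append("START_TAG ").append(r.getLocalName()).append('\n');
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    out.append("END_TAG ").append(r.getLocalName()).append('\n');
                    break;
                case XMLStreamConstants.CHARACTERS:
                    out.append("TEXT ").append(r.getText()).append('\n');
                    break;
                case XMLStreamConstants.END_DOCUMENT:
                    out.append("END_DOCUMENT\n");
                    break;
                default:
                    break;                       // ignore comments, PIs, etc.
            }
        }
        r.close();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(events("<name><lastname>Smith</lastname></name>"));
    }
}
```

Because the loop is ordinary control flow, the application can stop early or hand the reader to another method, which is awkward to do from inside SAX callbacks.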

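Better XmlPull Events

    class Attributes {
        public String[] names;
        public String[] values;
        // constructors are added here because the Document Reader calls them
        Attributes ( String[] n, String[] v ) { names = n; values = v; }
    }
    abstract class Event {
    }
    class StartTag extends Event {
        public String tag;
        public Attributes attributes;
        StartTag ( String t, Attributes a ) { tag = t; attributes = a; }
    }
    class EndTag extends Event {
        public String tag;
        EndTag ( String t ) { tag = t; }
    }
    class CData extends Event {
        public String text;
        CData ( String s ) { text = s; }
    }
    class EOS extends Event {}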

Iterators

    import org.xmlpull.v1.XmlPullParser;
    import org.xmlpull.v1.XmlPullParserFactory;

    abstract class Iterator {
        abstract public void open ();    // open the stream iterator
        abstract public void close ();   // close the stream iterator
        abstract public Event next ();   // get the next event from the stream
    }

    abstract class Filter extends Iterator {
        Iterator input;
    }

Document Reader

    class Document extends Iterator {
        String path;
        int state;
        FileReader reader;
        XmlPullParser xpp;
        static XmlPullParserFactory factory;

        // convert the parser's current event into an Event object
        Event getEvent () {
            int eventType = xpp.getEventType();
            if (eventType == XmlPullParser.START_TAG) {
                int len = xpp.getAttributeCount();
                String[] names = new String[len];
                String[] values = new String[len];
                for (int i = 0; i<len; i++) {
                    names[i] = xpp.getAttributeName(i);
                    values[i] = xpp.getAttributeValue(i);
                }
                return new StartTag(xpp.getName(), new Attributes(names, values));
            } else if (eventType == XmlPullParser.END_TAG)
                return new EndTag(xpp.getName());
            else if (eventType == XmlPullParser.TEXT) {
                int[] v = new int[2];   // receives the start offset and length
                char[] ch = xpp.getTextCharacters(v);
                return new CData(new String(ch, v[0], v[1]));
            }
            // the slide shows no case for START_DOCUMENT or END_DOCUMENT
        }

Document Reader (cont.)

        public void open () {
            reader = new FileReader(path);
            xpp = factory.newPullParser();
            xpp.setInput(reader);
            state = 0;
        }
        public void close () {
            reader.close();
        }
        public Event next () {
            if (state > 0) {
                state++;
                if (state == 2)
                    return new EOS();
            }
            Event e = getEvent();
            if (xpp.getEventType() != XmlPullParser.END_DOCUMENT)
                xpp.next();
            return e;
        }
    }

The Child Iterator

    class Child extends Filter {
        String tag;
        short nest;     // the nesting level of the event
        boolean keep;   // are we in keeping mode?

        public void open () {
            keep = false;
            nest = 0;
            input.open();
        }
        public Event next () {
            while (true) {
                Event t = input.next();
                if (t instanceof EOS)
                    return t;
                else if (t instanceof StartTag) {
                    if (nest++ == 1) {
                        keep = tag.equals(((StartTag) t).tag);
                        if (!keep)
                            continue;
                    }
                } else if (t instanceof EndTag) {
                    if (--nest == 1 && keep) {
                        keep = false;
                        return t;
                    }
                }
                if (keep)
                    return t;
            }
        }
    }
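
The iterator version can be exercised without a parser by replaying a fixed list of events. In the sketch below all names (Ev, Start, End, TextEv, Eos, EvIterator, ListSource, ChildIter, IterDemo) are hypothetical stand-ins for the classes on the slides, trimmed to what the example needs; ChildIter follows the same logic as the Child iterator above.

```java
import java.util.ArrayList;
import java.util.List;

abstract class Ev { }
class Start extends Ev { final String tag; Start(String t) { tag = t; } }
class End extends Ev { final String tag; End(String t) { tag = t; } }
class TextEv extends Ev { final String text; TextEv(String s) { text = s; } }
class Eos extends Ev { }

abstract class EvIterator {
    abstract void open();
    abstract Ev next();
}

// Source stage: replays a fixed list of events, then returns Eos.
class ListSource extends EvIterator {
    private final List<Ev> events;
    private int pos;
    ListSource(List<Ev> events) { this.events = events; }
    void open() { pos = 0; }
    Ev next() { return pos < events.size() ? events.get(pos++) : new Eos(); }
}

// Filter stage: passes only the subtrees of level-1 elements named tag.
class ChildIter extends EvIterator {
    private final String tag;
    private final EvIterator input;
    private int nest;
    private boolean keep;
    ChildIter(String tag, EvIterator input) { this.tag = tag; this.input = input; }
    void open() { keep = false; nest = 0; input.open(); }
    Ev next() {
        while (true) {
            Ev t = input.next();
            if (t instanceof Eos) return t;
            else if (t instanceof Start) {
                if (nest++ == 1) {
                    keep = tag.equals(((Start) t).tag);
                    if (!keep) continue;
                }
            } else if (t instanceof End) {
                if (--nest == 1 && keep) { keep = false; return t; }
            }
            if (keep) return t;
        }
    }
}

class IterDemo {
    // Drain an iterator and render the events it yields as markup text.
    static String render(EvIterator it) {
        StringBuilder out = new StringBuilder();
        it.open();
        for (Ev e = it.next(); !(e instanceof Eos); e = it.next()) {
            if (e instanceof Start) out.append('<').append(((Start) e).tag).append('>');
            else if (e instanceof End) out.append("</").append(((End) e).tag).append('>');
            else out.append(((TextEv) e).text);
        }
        return out.toString();
    }

    // Events for a small department document (hypothetical sample data).
    static List<Ev> sample() {
        List<Ev> evs = new ArrayList<>();
        evs.add(new Start("department"));
        evs.add(new Start("deptname"));
        evs.add(new TextEv("Computer Science"));
        evs.add(new End("deptname"));
        evs.add(new Start("gradstudent"));
        evs.add(new Start("name"));
        evs.add(new TextEv("Smith"));
        evs.add(new End("name"));
        evs.add(new End("gradstudent"));
        evs.add(new End("department"));
        return evs;
    }

    public static void main(String[] args) {
        EvIterator plan = new ChildIter("name", new ChildIter("gradstudent", new ListSource(sample())));
        System.out.println(render(plan));
    }
}
```

The consumer drives the computation here: each call to next on the outermost iterator pulls just enough events through the stages below it to produce one result.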

XSL Transformation

A stylesheet specification language for converting XML documents into various forms (XML, HTML, plain text, etc.).

• Can transform each XML element into another element, add new elements to the output, or remove elements.
• Can rearrange and sort elements, test and make decisions about which elements to display, and much more.
• Based on XPath:

    <xsl:stylesheet version='1.0'
                    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
      <xsl:template match="/">
        <students>
          <xsl:copy-of select="//student/name"/>
        </students>
      </xsl:template>
    </xsl:stylesheet>

XSLT Templates

• XSLT uses XPath to select the parts of the source document that match one or more templates.
• When a match is found, XSLT transforms the matching part of the source document into the result document.
• Parts of the source document that match no explicit template are handled by the default templates, so their text reaches the result document but their tags do not.

Form:

    <xsl:template match="XPath expression">
      …
    </xsl:template>

The default (implicit) templates visit all nodes and strip out all tags:

    <xsl:template match="*|/">
      <xsl:apply-templates/>
    </xsl:template>

    <xsl:template match="text()|@*">
      <xsl:value-of select="."/>
    </xsl:template>
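
The effect of the default templates can be observed by running a stylesheet that declares no templates at all. The sketch below does this with the JDK's javax.xml.transform API; the class name DefaultTemplates and the sample input are assumptions, and method='text' is set only so that no XML declaration is written.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class DefaultTemplates {
    // A stylesheet with no templates: only the built-in rules apply.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "</xsl:stylesheet>";

    static String apply(String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Tags are dropped and the text nodes are concatenated.
        System.out.println(apply(
            "<name><lastname>Smith</lastname><firstname>John</firstname></name>"));
    }
}
```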

Other XSLT Elements

<xsl:value-of select="XPath expression"/>
    adds the string value of the selected node (the first one, if several match) to the output stream of the transformation, e.g. <xsl:value-of select="//books/book/author"/>.

<xsl:copy-of select="XPath expression"/>
    copies the entire selected XML element to the output stream of the transformation.

<xsl:apply-templates select="XPath expression"/>
    applies the template rules to the nodes selected by the XPath expression.

<xsl:element name="{XPath expression}"> … </xsl:element>
    adds an element to the output with a tag name computed from the XPath expression (the braces mark an attribute value template).

Example:

    <xsl:stylesheet version='1.0'
                    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
      <xsl:template match="employee">
        <b> <xsl:apply-templates select="node()"/> </b>
      </xsl:template>
      <xsl:template match="surname">
        <i> <xsl:value-of select="."/> </i>
      </xsl:template>
    </xsl:stylesheet>
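
To see how the two templates in the example interact with the default rules, the stylesheet can be applied to a small document. In this sketch the employee record, the first element and the class name EmployeeDemo are hypothetical; an xsl:output element is added only to suppress the XML declaration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class EmployeeDemo {
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='employee'><b><xsl:apply-templates select='node()'/></b></xsl:template>"
      + "<xsl:template match='surname'><i><xsl:value-of select='.'/></i></xsl:template>"
      + "</xsl:stylesheet>";

    static String apply(String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // surname has its own template; first falls back to the default rules,
        // so only its text survives.
        System.out.println(apply(
            "<employee><surname>Smith</surname><first>John</first></employee>"));
    }
}
```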

Copy the Entire Document

    <xsl:stylesheet version='1.0'
                    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
      <xsl:template match="/">
        <xsl:apply-templates/>
      </xsl:template>
      <xsl:template match="text()">
        <xsl:value-of select="."/>
      </xsl:template>
      <xsl:template match="*">
        <!-- braces make name(.) an expression rather than a literal tag name -->
        <xsl:element name="{name(.)}">
          <xsl:apply-templates/>
        </xsl:element>
      </xsl:template>
    </xsl:stylesheet>
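
Running the copy stylesheet on a small input shows what it reproduces. The sketch below is an illustration under stated assumptions: the class name CopyDemo and the sample input are hypothetical, name='{name(.)}' is written as an attribute value template, and xsl:output is added to suppress the XML declaration. Because apply-templates visits only child nodes, attributes such as id are not carried over by this stylesheet.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class CopyDemo {
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/'><xsl:apply-templates/></xsl:template>"
      + "<xsl:template match='text()'><xsl:value-of select='.'/></xsl:template>"
      + "<xsl:template match='*'>"
      +   "<xsl:element name='{name(.)}'><xsl:apply-templates/></xsl:element>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String apply(String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Elements and text are rebuilt; the id attribute is not.
        System.out.println(apply("<a id='1'><b>x</b></a>"));
    }
}
```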

More on XSLT

• Conflict resolution: more specific templates override more general templates. Templates are assigned default priorities, but these can be overridden using priority="n" in a template.
• Modes can be used to group templates together. A template without a mode attribute belongs to the default (empty) mode.

    <xsl:template match="…" mode="A">
      <xsl:apply-templates mode="B"/>
    </xsl:template>

• Conditional and loop statements:

    <xsl:if test="XPath predicate"> body </xsl:if>
    <xsl:for-each select="XPath"> body </xsl:for-each>

• Variables can be used to name data:

    <xsl:variable name="x"> value </xsl:variable>

  Variables are referenced as $x in XPath expressions and as {$x} inside attribute value templates.
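
A short sketch combining a variable, a loop and a conditional: it lists the names of students whose gpa is at least a threshold held in the variable min. The element names student, name and gpa, the threshold, and the class name LoopDemo are all made up for this example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class LoopDemo {
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      +   "<xsl:variable name='min' select='3.5'/>"           // named value
      +   "<xsl:for-each select='//student'>"                 // loop over students
      +     "<xsl:if test='gpa >= $min'>"                     // conditional on $min
      +       "<xsl:value-of select='name'/><xsl:text>&#10;</xsl:text>"
      +     "</xsl:if>"
      +   "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String apply(String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(apply(
            "<students>"
          + "<student><name>Ann</name><gpa>3.9</gpa></student>"
          + "<student><name>Bob</name><gpa>3.1</gpa></student>"
          + "</students>"));
    }
}
```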

Using XSLT

    import javax.xml.parsers.*;
    import org.xml.sax.*;
    import org.w3c.dom.*;
    import javax.xml.transform.*;
    import javax.xml.transform.dom.*;
    import javax.xml.transform.stream.*;
    import java.io.*;

    class XSLT {
        public static void main ( String argv[] ) throws Exception {
            File stylesheet = new File("x.xsl");
            File xmlfile = new File("a.xml");
            // parse the source document into a DOM tree
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder db = dbf.newDocumentBuilder();
            Document document = db.parse(xmlfile);
            // compile the stylesheet into a transformer
            StreamSource stylesource = new StreamSource(stylesheet);
            TransformerFactory tf = TransformerFactory.newInstance();
            Transformer transformer = tf.newTransformer(stylesource);
            // apply it and write the result to standard output
            DOMSource source = new DOMSource(document);
            StreamResult result = new StreamResult(System.out);
            transformer.transform(source, result);
        }
    }