COMPARING JAVA XML PARSERS PRESENTED BY SASANKA SEKHAR BANERJEE
COMPARING JAVA XML PARSERS
During this presentation, we will discuss the following:
- Need for XML
- Brief overview of XML
- Different methods of parsing XML
  - DOM [Document Object Model]
  - SAX [Simple API for XML]
  - JAXP [Java API for XML Processing]
  - JAXB [Java Architecture for XML Binding]
  - StAX [Streaming API for XML]
- XPath
- Choose the right parser
COMPARING JAVA XML PARSERS – NEED FOR XML
- Applications essentially consist of two parts: the functionality described by the code, and the data that the code manipulates.
- The in-memory storage and management of data is a key part of any programming language and environment.
- Within a single application, the programmer is free to decide how the data is stored and represented.
- Problem: an application must exchange data with another application.
- One option is an intermediary storage medium, such as a database.
- But what if the data is to be exchanged directly between two applications, or the applications cannot access the same database?
- In this case, the data must be encoded in some particular format as it is produced.
- This has often resulted in the creation of application-specific data formats.
- These formats can be text-based, such as HTML for encoding how to display the encapsulated data, or binary, such as those used for sending remote procedure calls.
- Problem: in either case, the data representation tends to lack flexibility, which causes trouble when versions change or when data needs to be exchanged between disparate applications, frequently from different vendors.
COMPARING JAVA XML PARSERS – XML USAGE
- XML was developed to address these issues. XML is written in plain text, uses self-describing elements, and provides a data encoding format that is:
  - Generic
  - Simple
  - Flexible
  - Extensible
  - Portable
- XML offers a method of putting structured data in a text file. Structured data conforms to a particular format; examples are spreadsheets, address books, configuration parameters, and financial transactions.
- This plain-text data provides a software- and hardware-independent way of storing data, making it easier to create data that different applications can share.
- Exchanging data as XML greatly reduces the complexity of data exchange, since the data can be read by otherwise incompatible applications.
- When upgrading to a new system, large volumes of data must be converted, and incompatible data is often lost. XML stores data in plain text, which makes it easier to expand or upgrade to new systems without losing data.
- With XML, data can be made available to all kinds of "reading machines" (handheld computers, voice machines, news feeds, etc.).
COMPARING JAVA XML PARSERS – OVERVIEW OF XML
- An XML document consists of elements; each element has a start tag, content, and an end tag.
- An XML document must have exactly one root element, i.e. one tag that encloses all the remaining tags.
- An XML document is case-sensitive and is required to be well-formed.
- The following conditions need to be satisfied for a document to be well-formed:
  - The document normally starts with a prolog (the XML declaration, which is optional in XML 1.0).
  - Every tag has a closing tag.
  - All tags are properly nested.
- An XML document is valid if it is well-formed, contains a link to an XML schema, and conforms to that schema.
- The following is a well-formed XML file (it references no schema, so validity does not apply):

    <?xml version="1.0"?>
    <!-- This is a comment -->
    <address>
      <name>Lars</name>
      <street>Test</street>
      <telephone number="0123"/>
    </address>
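The well-formedness rules above can be checked mechanically. The following is a minimal sketch, assuming the standard JAXP DOM parser that ships with the JDK; the class name WellFormedCheck and the method isWellFormed are hypothetical names chosen for illustration. It only tests well-formedness and performs no schema validation.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of the original slides.
final class WellFormedCheck {

    /** Returns true if the text parses as well-formed XML (no schema validation). */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // The parser reports violations of well-formedness as SAX exceptions.
            return false;
        } catch (Exception e) {
            // Parser configuration or I/O problems are not well-formedness errors.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<address><name>Lars</name></address>")); // true
        System.out.println(isWellFormed("<address><name>Lars</address>"));        // false: <name> is never closed
    }
}
```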
COMPARING JAVA XML PARSERS – PARSING XML
Java offers several ways to access XML. The following is a short overview of the available methods.

Document Object Model or DOM
- Defines a mechanism for accessing and manipulating well-formed XML.
- Using the DOM API, the XML document is transformed into a tree structure in memory.
- The application then navigates the tree to access the document's content.
- If the document is large, it can place a strain on system resources.

Simple API for XML or SAX
- Defines XML parsing methods.
- It is an event-based parser: the SAX parser emits a series of events while it reads the document.
- These events are forwarded to event handlers, which also provide access to the data of the document.
- It uses very little memory, because the XML does not have to be loaded into memory all at once.
- The application must implement event handlers for every event it needs to process.
- It offers nothing comparable to DOM's element access, so the application must keep track of the parser's position in the document hierarchy itself.
- The application logic gets harder as the document gets bigger and more complicated.
- Although the whole document need not be held in memory, a SAX parser still has to read through the document sequentially, as a DOM parser does.
- It lacks built-in support for navigating the document, such as the support XPath provides.
- In addition, because parsing is a single forward pass, random access is not supported.
COMPARING JAVA XML PARSERS – PARSING XML
Java API for XML Processing or JAXP
- It provides a common interface for creating and using SAX and DOM parsers in Java.
- It does not implement a parser itself, but defines the minimum behavior that a parser must support.
- A parser implementation must extend the JAXP abstract classes and provide concrete classes.
- It uses the Factory pattern to create a concrete class, and the application then calls methods on that class to parse.
- The DocumentBuilderFactory class is used for DOM parsing and SAXParserFactory is used for SAX parsing.

Traversing the DOM using JAXP:
- Instantiate a factory class.
- Using the factory class, instantiate the provider class.
- Using the provider class created in the previous step, perform the XML processing/parsing.

    DocumentBuilderFactory factoryBuilder = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factoryBuilder.newDocumentBuilder();
    Document doc = builder.parse(fileName);
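The three steps above stop once the document has been parsed. As a minimal sketch of what navigating the resulting tree might look like, the following walks every node recursively and collects the element names in document order. The class DomTraversalSketch and its methods are hypothetical names used only for illustration, and the input is read from a string rather than a file to keep the example self-contained.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class, not part of the original slides.
final class DomTraversalSketch {

    /** Parses the XML into a DOM tree and returns all element names in document order. */
    static List<String> elementNames(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance(); // step 1: factory
        DocumentBuilder builder = factory.newDocumentBuilder();                // step 2: provider
        Document doc = builder.parse(new InputSource(new StringReader(xml)));  // step 3: parse
        List<String> names = new ArrayList<>();
        collect(doc.getDocumentElement(), names);
        return names;
    }

    /** Depth-first walk: record element nodes, then recurse into all children. */
    private static void collect(Node node, List<String> names) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            names.add(node.getNodeName());
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), names);
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<address><name>Lars</name><street>Test</street>"
                + "<telephone number=\"0123\"/></address>";
        System.out.println(elementNames(xml)); // [address, name, street, telephone]
    }
}
```

Because the whole tree is in memory, the walk can visit nodes in any order or revisit them, which is the random access that the streaming APIs give up.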
COMPARING JAVA XML PARSERS – PARSING XML
SAX Parsing using JAXP
In the case of the DOM parser, responsibility was passed to the actual parser to parse the XML document and return the DOM document object. For SAX, the approach is the opposite. We call the parse method and pass a handler object; this handler receives notifications about the parsing progress, errors encountered, and so on.

    SAXParserFactory factorySAX = SAXParserFactory.newInstance();
    SAXParser sax = factorySAX.newSAXParser();
    DefaultHandler handler = new XMLParser();  // XMLParser: the application's DefaultHandler subclass
    sax.parse(inputStream, handler);

The only major difference is the parse function: first, it does not return a Document object and, secondly, we need to supply a DefaultHandler-derived class. The handler class is responsible for building up any in-memory representation of the document, should the application need one.
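The slide above passes a DefaultHandler subclass to the parser without showing one. The following is a minimal sketch of such a handler, under the assumption that the application wants to know where in the hierarchy each element occurs; SaxPathSketch and PathHandler are hypothetical names, not the presenter's XMLParser class. It illustrates the point made earlier that a SAX application has to track the parser's position itself, here with a stack of open elements.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class, not part of the original slides.
final class SaxPathSketch {

    /** Records the slash-separated path of every element the parser reports. */
    static final class PathHandler extends DefaultHandler {
        private final Deque<String> open = new ArrayDeque<>(); // currently open elements, root first
        final List<String> paths = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            open.addLast(qName);
            paths.add("/" + String.join("/", open));
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            open.removeLast();
        }
    }

    static List<String> elementPaths(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        PathHandler handler = new PathHandler();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.paths;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<address><name>Lars</name><telephone number=\"0123\"/></address>";
        System.out.println(elementPaths(xml)); // [/address, /address/name, /address/telephone]
    }
}
```

Only the two callbacks the sketch needs are overridden; DefaultHandler supplies empty implementations for the rest.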
COMPARING JAVA XML PARSERS – PARSING XML
Java Architecture for XML Binding or JAXB
- DOM is a useful API that builds and transforms XML documents in memory. Unfortunately, DOM is somewhat slow and resource hungry. To address these problems, the Java Architecture for XML Binding (JAXB) has been developed.
- JAXB provides a mechanism that simplifies the creation and maintenance of XML-enabled Java applications. It does this by using an XML schema compiler (only DTDs and a subset of XML schemas and namespaces at the time of this writing) that translates XML DTDs into one or more Java classes, thereby removing the burden from the developer of writing complex parsing code.
- The generated classes handle all the details of XML parsing and formatting, including code to perform error and validity checking of incoming and outgoing XML documents, which ensures that only valid, error-free XML is accepted.
- Because the code has been generated for a specific schema, the generated classes are more efficient than those in a generic SAX or DOM parser. Most important, a JAXB parser often requires a much smaller memory footprint than a generic parser.
- Classes created with JAXB do not include tree-manipulation capability, which is one factor that contributes to the small memory footprint of a JAXB object tree.
COMPARING JAVA XML PARSERS – PARSING XML
JAXB consists of two main components:
- The binding compiler, which binds a given XML schema to a set of generated Java classes.
- The binding runtime framework, which provides unmarshalling, marshalling, and validation functionality.

Unmarshalling an XML document
Unmarshalling is the process of converting an XML document into a corresponding set of Java objects. The first step is to create a JAXBContext object, which is the starting point for marshalling, unmarshalling, and validation operations.

    JAXBContext jaxbContext = JAXBContext.newInstance("com.xmlparsers.jaxb.xsd.marketerprofile");

To unmarshal an XML document, create an Unmarshaller from the context:

    Unmarshaller unmarshaller = jaxbContext.createUnmarshaller();

The unmarshaller returns the unmarshalled object:

    CreateCustomerProfileResponse profileElement = (CreateCustomerProfileResponse)
        unmarshaller.unmarshal(new File("src/com/xmlparsers/jaxb/xsd/CIMMarketerProfile.xml"));
    String marketerProfile = profileElement.getCustomerProfileId();
COMPARING JAVA XML PARSERS – PARSING XML
Marshalling an XML document
Marshalling involves transforming Java objects into XML format.

    // Build the object tree using the JAXB-generated classes
    MessageType msgType = new MessageType();
    msgType.setCode("0");
    msgType.setText("Successful");

    MessagesType msgTypes = new MessagesType();
    msgTypes.setResultCode("OK");
    msgTypes.getMessageType().add(msgType);

    CreateCustomerProfileResponse marketerProfile = new CreateCustomerProfileResponse();
    marketerProfile.getMessagesType().add(msgTypes);
    marketerProfile.setCustomerProfileId("21345678");

    // Marshal the object tree to System.out as indented XML
    JAXBContext context = JAXBContext.newInstance(CreateCustomerProfileResponse.class);
    Marshaller m = context.createMarshaller();
    m.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);
    m.marshal(marketerProfile, System.out);
COMPARING JAVA XML PARSERS – PARSING XML
Use JAXB when you want to
- Access data in memory, but do not need tree manipulation capabilities
- Process only data that is valid
- Convert data to different types
- Generate classes based on a DTD or XML schema
- Build object representations of XML data

Use JAXP when you want to
- Have flexibility with regard to the way you access the data, either serially with SAX or randomly in memory with DOM
- Use the same processing code with documents based on different DTDs
- Parse documents that are not necessarily valid
- Apply XSLT transformations
- Insert or remove components from an in-memory XML tree
COMPARING JAVA XML PARSERS – PARSING XML
Streaming API for XML or StAX
Traditionally, XML APIs are either:
- Tree based: the entire document is read into memory as a tree structure for random access by the calling application.
- Event based: the application registers to receive events as entities are encountered within the source document.

Tree-based APIs are less efficient with respect to memory usage. Where that matters, a streaming API is preferred: it uses much less memory, since it does not have to hold the entire document in memory at once, and it can process the document in small pieces, which can make it much faster.
SAX is one such event-based streaming API, and it 'pushes' data into the application. A push parser feeds the content of the document to the application as soon as it sees it, whether the application is ready to receive that data or not.
StAX was designed as a median between these two opposites. The programmatic entry point is a cursor that represents a point within the document. The application moves the cursor forward, 'pulling' the information from the parser as it needs it.
COMPARING JAVA XML PARSERS – PARSING XML
A pull API has the following advantages:
- Pull APIs are a more comfortable alternative for streaming processing of XML.
- A pull API is based around the more familiar Iterator design pattern rather than the less well-known Observer design pattern.
- In a pull API, the client program asks the parser for the next piece of information, rather than the parser telling the client program when the next datum is available.
- In a pull API the client program drives the parser, whereas in a push API the parser drives the client.

Why StAX?
- StAX shares with SAX the ability to read arbitrarily large documents.
- However, in StAX the application is in control rather than the parser.
- The application tells the parser when it wants to receive the next data chunk, rather than the parser telling the application.
- StAX exceeds SAX by allowing programs both to read existing XML documents and to create new ones.
- Unlike SAX, StAX is a bidirectional (read and write) API.
COMPARING JAVA XML PARSERS – PARSING XML
Reading XML with StAX:
- XMLStreamReader is the key interface in StAX.
- This interface represents a cursor that is moved across an XML document from beginning to end.
- At any given time, this cursor points at one event: text node, start-tag, comment, etc.
- The cursor always moves forward, never backward, and normally only moves one item at a time.
- Methods like getName and getText can be invoked to retrieve information.
- A typical StAX program begins by using the XMLInputFactory class to load an implementation-dependent instance of XMLStreamReader.

    InputStream in = new FileInputStream(new File("src/com/xmlparsers/jaxb/xsd/CIMMarketerProfile.xml"));
    XMLInputFactory factory = XMLInputFactory.newInstance();
    XMLStreamReader staxParser = factory.createXMLStreamReader(in);
COMPARING JAVA XML PARSERS – PARSING XML

    while (staxParser.hasNext()) {
        int event = staxParser.next();
        if (event == XMLStreamConstants.END_DOCUMENT) {
            staxParser.close();
            break;
        }
        if (event == XMLStreamConstants.START_ELEMENT) {
            System.out.println(staxParser.getLocalName());
        }
    }

The advantage of StAX parsing over SAX parsing is that a parse event may be skipped by invoking the next() method, as shown in the following code. For example, if the parse event is of type START_ELEMENT, a developer may decide whether the event information is to be obtained or the next event is to be retrieved:

    if (event == XMLStreamConstants.START_ELEMENT) {
        System.out.println(staxParser.getLocalName());
    }
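To make the skipping idea concrete, here is a minimal sketch that pulls events until it reaches one particular element, reads its text, and stops without touching the rest of the document. The class StaxPullSketch and the method firstElementText are hypothetical names for illustration, and the sketch assumes the target element contains only text.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class, not part of the original slides.
final class StaxPullSketch {

    /** Returns the text of the first element with the given local name, or null if absent. */
    static String firstElementText(String xml, String localName) throws Exception {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                // Every event that is not the wanted start tag is simply skipped.
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && localName.equals(reader.getLocalName())) {
                    return reader.getElementText(); // assumes a text-only element
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<createCustomerProfileResponse><messages><resultCode>Ok</resultCode>"
                + "</messages><customerProfileId>1103042</customerProfileId>"
                + "</createCustomerProfileResponse>";
        System.out.println(firstElementText(xml, "customerProfileId")); // 1103042
    }
}
```

Since the application decides when to call next(), it can return as soon as it has what it needs, which a SAX handler cannot do without throwing an exception out of a callback.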
COMPARING JAVA XML PARSERS – PARSING XML
Writing with StAX

    // The XMLStreamWriter is obtained from an XMLOutputFactory
    XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
    XMLStreamWriter xmlStreamWriter = outputFactory.createXMLStreamWriter(System.out);

    // Start the document with the writeStartDocument() method
    xmlStreamWriter.writeStartDocument("UTF-8", "1.0");
    xmlStreamWriter.writeComment("Testing with StAX");

    // Output the start of the root element using the writeStartElement() method
    xmlStreamWriter.writeStartElement("createCustomerProfileResponse");
    xmlStreamWriter.writeNamespace("xsi", "http://www.w3.org/2001/XMLSchema-instance");

    xmlStreamWriter.writeStartElement("messages");
    xmlStreamWriter.writeStartElement("resultCode");
    xmlStreamWriter.writeCharacters("Ok");
    xmlStreamWriter.writeEndElement();  // </resultCode>
COMPARING JAVA XML PARSERS – PARSING XML
Writing with StAX (continued)

    xmlStreamWriter.writeStartElement("message");
    xmlStreamWriter.writeStartElement("code");
    xmlStreamWriter.writeCharacters("I00001");
    xmlStreamWriter.writeEndElement();  // </code>
    xmlStreamWriter.writeStartElement("text");
    xmlStreamWriter.writeCharacters("Successful");
    xmlStreamWriter.writeEndElement();  // </text>
    xmlStreamWriter.writeEndElement();  // </message>
    xmlStreamWriter.writeEndElement();  // </messages>

    xmlStreamWriter.writeStartElement("customerProfileId");
    xmlStreamWriter.writeCharacters("1103042");
    xmlStreamWriter.writeEndElement();  // </customerProfileId>

    // Close the root element and finish the document
    xmlStreamWriter.writeEndElement();  // </createCustomerProfileResponse>
    xmlStreamWriter.writeEndDocument();
    xmlStreamWriter.flush();
    xmlStreamWriter.close();
COMPARING JAVA XML PARSERS – PARSING XML
XPATH
- XPath (XML Path Language) is an expression language for addressing portions of an XML document or navigating within an XML document.
- XPath is really helpful for parsing XML-based configuration or properties files.
- XPath uses path expressions to select nodes or node-sets in an XML document.
- These path expressions look very much like URLs and traditional file system paths.
- XPath also supports several functions for string manipulation, comparison, and other tasks.
- XML documents are treated as trees of nodes, and the root is called the document or root node.
- There are seven kinds of nodes: element, attribute, text, namespace, processing-instruction, comment, and root nodes.
COMPARING JAVA XML PARSERS – PARSING XML
XPATH
Let us consider the following XML sample:

    <createCustomerProfileResponse xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
      <messages>
        <resultCode>Ok</resultCode>
        <message>
          <code>I00001</code>
          <text>Successful.</text>
        </message>
      </messages>
      <customerProfileId>1103042</customerProfileId>
    </createCustomerProfileResponse>

The root element is <createCustomerProfileResponse>. <messages> and <customerProfileId> are its two child elements. The <resultCode> node is a child of the <messages> element. The resultCode value 'Ok' is a text node.
COMPARING JAVA XML PARSERS – PARSING XML
XPATH – Path Expression syntax

    Expression   Description
    nodename     Selects all child elements of the current node that are named nodename
    /            Selects from the root node
    //           Selects nodes from the current node that match the selection, no matter where they are
    .            Selects the current node
    ..           Selects the parent of the current node
    @            Selects attributes
    *            Matches any element node
    @*           Matches any attribute node
    node()       Matches any node of any kind
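The expressions in the table can be tried out with the javax.xml.xpath API from the JDK. The following is a minimal sketch; the class XPathSketch and its helper methods are hypothetical names for illustration, and the document is the small address example from the XML overview rather than the presenter's file.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class, not part of the original slides.
final class XPathSketch {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Evaluates the expression and returns its string value. */
    static String string(Document doc, String expr) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expr, doc);
    }

    /** Evaluates the expression as a node-set and returns how many nodes it selected. */
    static int count(Document doc, String expr) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<address><name>Lars</name><street>Test</street>"
                + "<telephone number=\"0123\"/></address>");
        System.out.println(string(doc, "/address/name"));        // Lars
        System.out.println(string(doc, "//telephone/@number"));  // 0123
        System.out.println(count(doc, "/address/*"));            // 3
        System.out.println(count(doc, "//@*"));                  // 1
        System.out.println(string(doc, "name(//name/..)"));      // address
    }
}
```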
COMPARING JAVA XML PARSERS – PARSING XML
XPATH – Reading XML

    // Read the whole file into a string buffer, decoding it as UTF-8
    InputStream resultStream = new FileInputStream(new File("src/com/xmlparsers/jaxb/xsd/CIMMarketerProfile.xml"));
    java.io.BufferedReader aReader = new java.io.BufferedReader(
        new java.io.InputStreamReader(resultStream, "UTF-8"));
    StringBuffer aResponse = new StringBuffer();
    String aLine = aReader.readLine();
    while (aLine != null) {
        aResponse.append(aLine);
        aLine = aReader.readLine();
    }
    resultStream.close();

    // Strip a leading byte order mark (U+FEFF) if present
    if (aResponse.length() > 0 && (int) aResponse.charAt(0) == 0xFEFF) {
        aResponse.deleteCharAt(0);
    }
COMPARING JAVA XML PARSERS – PARSING XML
XPATH – Reading XML

    // Parse the buffered text into a DOM document
    javax.xml.parsers.DocumentBuilder docBuilder =
        javax.xml.parsers.DocumentBuilderFactory.newInstance().newDocumentBuilder();
    java.io.StringReader stringReader = new java.io.StringReader(aResponse.toString());
    org.w3c.dom.Document doc = docBuilder.parse(new org.xml.sax.InputSource(stringReader));

    // Evaluate an XPath expression against the document
    javax.xml.xpath.XPath xpath = javax.xml.xpath.XPathFactory.newInstance().newXPath();
    String customerProfileId = xpath.evaluate("/*/customerProfileId/text()", doc);