XML for E-commerce II
Helena Ahonen-Myka

XML processing model
  • an XML processor is used to read XML documents and provide access to their content and structure
  • the XML processor works on behalf of some application
  • the specification defines which information the processor should provide to the application

Parsing
  • input: an XML document
  • basic task: is the document well-formed?
  • validating parsers additionally: is the document valid?

Parsing (cont.)
  • parsers produce data structures, which other tools and applications can use
  • two kinds of APIs: tree-based and event-based

Tree-based API
  • compiles an XML document into an internal tree structure
  • allows an application to navigate the tree
  • Document Object Model (DOM) is a tree-based API for XML and HTML documents
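As an illustration of tree-based access, the following is a minimal sketch that uses the DOM implementation bundled with the JDK (JAXP) rather than any particular parser named in these slides. The class name DomWalk and its helper methods are made up for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class: parses a document into a DOM tree and walks it.
class DomWalk {
    // Builds the whole tree in memory before the application sees anything.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Recursively lists element names, indenting children under their parent.
    static String outline(Node node, String indent) {
        StringBuilder sb = new StringBuilder();
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            sb.append(indent).append(node.getNodeName()).append('\n');
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            sb.append(outline(child, indent + "  "));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<doc><para>Hello, world!</para></doc>");
        System.out.print(outline(doc.getDocumentElement(), ""));
    }
}
```

The application navigates freely (parent, child, sibling) because the complete tree is available after parsing.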

Event-based API
  • reports parsing events (such as start and end of elements) directly to the application through callbacks
  • the application implements handlers to deal with the different events
  • Simple API for XML (SAX)

Example
    <?xml version="1.0"?>
    <doc>
      <para>Hello, world!</para>
    </doc>
  • Events:
      start document
      start element: doc
      start element: para
      characters: Hello, world!
      end element: para
      end element: doc
      end document

Example (cont.)
  • an application handles these events just as it would handle events from a graphical user interface (mouse clicks, etc.) as the events occur
  • no need to cache the entire document in memory or secondary storage
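The event stream for the small example document can be reproduced with a short sketch like the one below. It assumes the SAX parser bundled with the JDK (obtained through JAXP) and a hypothetical handler class SaxEventLog; character data is buffered because a parser is allowed to deliver it in several pieces.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that records each parsing event as a line of text.
class SaxEventLog extends DefaultHandler {
    final List<String> events = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();

    // Emits buffered character data as a single "characters" event.
    private void flushText() {
        if (text.length() > 0) {
            events.add("characters: " + text);
            text.setLength(0);
        }
    }

    @Override public void startDocument() { events.add("start document"); }
    @Override public void endDocument() { events.add("end document"); }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        flushText();
        events.add("start element: " + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flushText();
        events.add("end element: " + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    static List<String> log(String xml) throws Exception {
        SaxEventLog handler = new SaxEventLog();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        log("<doc><para>Hello, world!</para></doc>").forEach(System.out::println);
    }
}
```

The input here is written without line breaks between elements; with the indented form of the document, the whitespace between tags would show up as additional character events.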

Tree-based vs. event-based
  • tree-based APIs are useful for a wide range of applications, but they may need a lot of resources (if the document is large)
  • some applications may need to build their own tree structures, and it is very inefficient to build a parse tree only to map it to another tree

Tree-based vs. event-based (cont.)
  • an event-based API provides simpler, lower-level access to an XML document
  • as the document is processed sequentially, one can parse documents much larger than the available system memory
  • the application's own data structures can be constructed using its own callback event handlers

We need a parser...
  • Apache Xerces: http://xml.apache.org
  • IBM XML4J: http://alphaworks.ibm.com
  • XP: http://www.jclark.com/xml/xp
  • … many others

… and the SAX classes
  • http://www.megginson.com/SAX/
  • often the SAX classes come bundled with the parser distribution
  • some parsers only support SAX 1.0; the latest version is 2.0

Starting a SAX parser
    import org.xml.sax.XMLReader;
    import org.apache.xerces.parsers.SAXParser;

    XMLReader parser = new SAXParser();
    parser.parse(uri);

Content handlers
  • in order to let the application do something useful with XML data as it is being parsed, we must register handlers with the SAX parser
  • a handler is a set of callbacks: application code can be run at important events within a document's parsing

Core handler interfaces in SAX
  • org.xml.sax.ContentHandler
  • org.xml.sax.ErrorHandler
  • org.xml.sax.DTDHandler
  • org.xml.sax.EntityResolver

Custom application classes
  • custom application classes that perform specific actions within the parsing process can implement each of the core interfaces
  • implementation classes can be registered with the parser with the methods setContentHandler(), etc.
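Registering handlers might look like the following sketch. It obtains the XMLReader through JAXP instead of instantiating the Xerces class directly (an assumption made so the example runs on a plain JDK), and the classes ElementCounter and StrictErrors are hypothetical application classes.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class HandlerRegistration {
    // Content handler: DefaultHandler supplies empty callbacks, so only
    // the events of interest need to be overridden.
    static class ElementCounter extends DefaultHandler {
        int elements;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            elements++;
        }
    }

    // Error handler: report warnings, stop on any error.
    static class StrictErrors implements ErrorHandler {
        public void warning(SAXParseException e) {
            System.err.println("warning: " + e.getMessage());
        }
        public void error(SAXParseException e) throws SAXException { throw e; }
        public void fatalError(SAXParseException e) throws SAXException { throw e; }
    }

    static int countElements(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        ElementCounter counter = new ElementCounter();
        reader.setContentHandler(counter);          // register the callbacks
        reader.setErrorHandler(new StrictErrors());
        reader.parse(new InputSource(new StringReader(xml)));
        return counter.elements;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countElements("<a><b/><b/></a>"));
    }
}
```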

Example: content handlers
    class MyContentHandler implements ContentHandler {
        public void startDocument() throws SAXException {
            System.out.println("Parsing begins…");
        }
        public void endDocument() throws SAXException {
            System.out.println("...Parsing ends.");
        }

Element handlers
    public void startElement(String namespaceURI, String localName,
                             String rawName, Attributes atts)
            throws SAXException {
        System.out.print("startElement: " + localName);
        if (!namespaceURI.equals("")) {
            System.out.println(" in namespace " + namespaceURI
                + " (" + rawName + ")");
        } else {
            System.out.println(" has no associated namespace");
        }
        // list every attribute of the element
        for (int i = 0; i < atts.getLength(); i++) {
            System.out.println(" Attribute: " + atts.getLocalName(i)
                + "=" + atts.getValue(i));
        }
    }

endElement
    public void endElement(String namespaceURI, String localName, String rawName)
            throws SAXException {
        System.out.println("endElement: " + localName + "\n");
    }

Character data
    // the third argument is the number of characters to read, not an end index
    public void characters(char[] ch, int start, int length)
            throws SAXException {
        String s = new String(ch, start, length);
        System.out.println("characters: " + s);
    }
  • the parser may return all contiguous character data at once, or split the data up into multiple method invocations
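Because character data may arrive in several pieces, a handler usually accumulates it until the element ends. The sketch below shows one way to do that; the class TextBuffering is hypothetical, the parser is the one bundled with the JDK, and the simple reset-on-start logic assumes elements that contain only text (no mixed content).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that joins split character data per element.
class TextBuffering extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    final List<String> texts = new ArrayList<>();
    int chunks; // how many times characters() was called

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        buffer.setLength(0); // start collecting text for the new element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        chunks++;
        buffer.append(ch, start, length); // may be only part of the text
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        texts.add(buffer.toString()); // the complete text is known only here
        buffer.setLength(0);
    }

    static TextBuffering run(String xml) throws Exception {
        TextBuffering handler = new TextBuffering();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        TextBuffering h = run("<para>a &amp; b</para>");
        System.out.println(h.texts + " delivered in " + h.chunks + " call(s)");
    }
}
```

The number of calls depends on the parser (entity references are a common place for a split), which is why the handler never relies on it.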

Processing instructions
  • XML documents may contain processing instructions (PIs)
  • a processing instruction tells an application to perform some specific task
  • form: <?target instructions?>

Handlers for PIs
    public void processingInstruction(String target, String data)
            throws SAXException {
        System.out.println("PI: Target: " + target + " and Data: " + data);
    }
  • the application could receive instructions and set variables or execute methods to perform application-specific processing
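A small sketch of how an application might react to a processing instruction follows. The PI target "cache" and its data are invented for this example, as is the class PiCollector; the parser is the JDK's bundled SAX parser.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that records every processing instruction it sees.
class PiCollector extends DefaultHandler {
    final List<String> instructions = new ArrayList<>();

    @Override
    public void processingInstruction(String target, String data) {
        // target is the first name after "<?", data is the rest of the PI
        instructions.add(target + " -> " + data);
    }

    static List<String> collect(String xml) throws Exception {
        PiCollector handler = new PiCollector();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.instructions;
    }

    public static void main(String[] args) throws Exception {
        // "cache" is a made-up, application-specific PI target
        System.out.println(collect("<?xml version=\"1.0\"?><?cache ttl=60?><doc/>"));
    }
}
```

Note that the XML declaration at the top of a document is not reported through this callback, only genuine processing instructions are.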

Validation
  • some parsers are validating, some non-validating
  • some parsers can do both
  • SAX method to turn validation on:
    parser.setFeature("http://xml.org/sax/features/validation", true);
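To see the validation feature in action, one can parse a document that carries its own DTD and collect the reported errors. The sketch below assumes the JDK's bundled parser; the class DtdValidation and the sample DTD are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

class DtdValidation {
    // Hypothetical DTD: a doc contains exactly one para with text content.
    static final String DTD =
        "<!DOCTYPE doc [<!ELEMENT doc (para)><!ELEMENT para (#PCDATA)>]>";

    // Returns the validity errors reported for the given document.
    static List<String> validate(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setFeature("http://xml.org/sax/features/validation", true);

        List<String> errors = new ArrayList<>();
        reader.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { /* ignored in this sketch */ }
            // validity violations are recoverable errors: record and continue
            public void error(SAXParseException e) { errors.add(e.getMessage()); }
            // well-formedness violations are fatal: stop parsing
            public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(validate(DTD + "<doc><para>hi</para></doc>"));
        System.out.println(validate(DTD + "<doc><note/></doc>"));
    }
}
```

Without an error handler the validity errors would go unnoticed, since a validating SAX parser reports them through the ErrorHandler rather than by aborting the parse.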

Ignorable whitespace
  • a validating parser can decide which whitespace can be ignored
  • for a non-validating parser, all whitespace is just characters
  • content handler:
    public void ignorableWhitespace(char[] ch, int start, int length) { … }
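The difference can be observed by counting how many characters arrive through each callback. This is a sketch under the assumption that the JDK's bundled parser is used; WhitespaceCount and the sample DTD are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that counts characters per callback type.
class WhitespaceCount extends DefaultHandler {
    int ignorable; // characters reported as ignorable whitespace
    int text;      // characters reported as ordinary character data

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) {
        ignorable += length;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text += length;
    }

    // Returns {ignorable, text} for the given document.
    static int[] count(String xml, boolean validating) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(validating);
        WhitespaceCount handler = new WhitespaceCount();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return new int[] { handler.ignorable, handler.text };
    }

    public static void main(String[] args) throws Exception {
        String body = "<doc>\n  <para>hi</para>\n</doc>";
        String dtd = "<!DOCTYPE doc [<!ELEMENT doc (para)><!ELEMENT para (#PCDATA)>]>";
        int[] plain = count(body, false);
        int[] valid = count(dtd + body, true);
        System.out.println("no DTD:   ignorable=" + plain[0] + " text=" + plain[1]);
        System.out.println("with DTD: ignorable=" + valid[0] + " text=" + valid[1]);
    }
}
```

With the DTD available, the parser knows that doc has element-only content, so the line breaks and indentation inside it cannot be meaningful text.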

XML Schema
  • DTDs have drawbacks:
      – They can only define the element structure and attributes
      – They cannot define any database-like constraints for elements:
          • Value (min, max, etc.)
          • Type (integer, string, etc.)
      – DTDs are not written in XML and thus cannot be processed with the same tools as XML documents, XSL(T), etc.
  • XML Schema:
      – Is written in XML
      – Avoids most of the DTD drawbacks

XML Schema (cont.)
  • XML Schema Part 1: Structures:
      – Element structure definition as with DTD: elements, attributes, also enhanced ways to control structures
  • XML Schema Part 2: Datatypes:
      – Primitive datatypes (string, boolean, float, etc.)
      – Derived datatypes from primitive datatypes (time, recurringDate)
      – Constraining facets for each datatype (minLength, maxLength, pattern, precision, etc.)
  • Information about Schemas:
      – http://www.w3c.org/XML/Schema/

Complex and simple types
  • complex types: allow elements in their content and may have attributes
  • simple types: cannot have element content and cannot have attributes

Reminder: DTD declarations
    <!ELEMENT name (fname+, lname)>
    <!ELEMENT address (name, street, ((city, state, zipcode) | (zipcode, city)))>
    <!ELEMENT contact (address, phone*, email?)>
    <!ELEMENT contact2 (address | phone | email)*>

Example: USAddress type
    <xsd:complexType name="USAddress">
      <xsd:sequence>
        <xsd:element name="name" type="xsd:string" />
        <xsd:element name="street" type="xsd:string" />
        <xsd:element name="city" type="xsd:string" />
        <xsd:element name="state" type="xsd:string" />
        <xsd:element name="zip" type="xsd:decimal" />
      </xsd:sequence>
      <xsd:attribute name="country" type="xsd:NMTOKEN" use="fixed" value="US" />
    </xsd:complexType>

Example: PurchaseOrderType
    <xsd:complexType name="PurchaseOrderType">
      <xsd:sequence>
        <xsd:element name="shipTo" type="USAddress" />
        <xsd:element name="billTo" type="USAddress" />
        <xsd:element ref="comment" minOccurs="0" />
        <xsd:element name="items" type="Items" />
      </xsd:sequence>
      <xsd:attribute name="orderDate" type="xsd:date" />
    </xsd:complexType>
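A schema of this kind can be checked against a document with the javax.xml.validation API of the JDK, as in the sketch below. Two assumptions apply: the schema is written in the syntax of the final XML Schema 1.0 Recommendation (where a fixed attribute value is declared as fixed="US" and the namespace is http://www.w3.org/2001/XMLSchema), which differs in places from the draft syntax shown on these slides, and the class AddressValidation and the sample address are made up for the example.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class AddressValidation {
    // USAddress in Recommendation syntax, plus a global shipTo element
    // (added here so that there is something to validate).
    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "<xsd:element name='shipTo' type='USAddress'/>"
        + "<xsd:complexType name='USAddress'>"
        + "<xsd:sequence>"
        + "<xsd:element name='name' type='xsd:string'/>"
        + "<xsd:element name='street' type='xsd:string'/>"
        + "<xsd:element name='city' type='xsd:string'/>"
        + "<xsd:element name='state' type='xsd:string'/>"
        + "<xsd:element name='zip' type='xsd:decimal'/>"
        + "</xsd:sequence>"
        + "<xsd:attribute name='country' type='xsd:NMTOKEN' fixed='US'/>"
        + "</xsd:complexType>"
        + "</xsd:schema>";

    static boolean isValid(String xml) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        try {
            // with no error handler set, the first validity error is thrown
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    // Builds a sample shipTo element with the given zip value.
    static String address(String zip) {
        return "<shipTo country='US'><name>Alice Smith</name>"
             + "<street>123 Maple Street</street><city>Mill Valley</city>"
             + "<state>CA</state><zip>" + zip + "</zip></shipTo>";
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid(address("90952")));
        System.out.println(isValid(address("not-a-number")));
    }
}
```

The second call fails on the datatype check for zip, which is exactly the kind of constraint a DTD cannot express.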

Notes
  • element declarations for shipTo and billTo associate different element names with the same complex type
  • attribute declarations must reference simple types
  • element comment is declared elsewhere in the schema (here reference only)

… continues
  • an element is optional if minOccurs = 0
  • maximum number of times an element may appear: maxOccurs
  • attributes may appear once or not at all
  • the use attribute is used in an attribute declaration to indicate whether the attribute is required or optional, and if optional, whether the value is fixed or whether there is a default

More examples
    …
    <items>
      <item partNum="872-AA">
        <productName>Lawnmower</productName>
        <quantity>1</quantity>
        <price>148.95</price>
        <comment>Confirm this is electric</comment>
      </item>
      <item partNum="926-AA">
        <productName>Baby Monitor</productName>
        <quantity>1</quantity>
        <price>39.98</price>
        <shipDate>1999-05-21</shipDate>
      </item>
    </items>
    …

    <xsd:complexType name="Items">
      <xsd:element name="item" minOccurs="0" maxOccurs="unbounded">
        <xsd:complexType>
          <xsd:element name="quantity">
            <xsd:simpleType base="xsd:positiveInteger">
              <xsd:maxExclusive value="100"/>
            </xsd:simpleType>
          </xsd:element>
          <xsd:element name="price" type="xsd:decimal"/>
          <xsd:element ref="comment" minOccurs="0"/>
          <xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>
          <xsd:attribute name="partNum" type="Sku"/>
        </xsd:complexType>
      </xsd:element>
    </xsd:complexType>

    <xsd:simpleType name="Sku">
      <xsd:pattern value="\d{3}-[A-Z]{2}"/>
    </xsd:simpleType>

Patterns
    <xsd:simpleType name="Sku">
      <xsd:restriction base="xsd:string">
        <xsd:pattern value="\d{3}-[A-Z]{2}"/>
      </xsd:restriction>
    </xsd:simpleType>
  • "three digits followed by a hyphen followed by two uppercase ASCII letters"
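The effect of the pattern facet can be approximated outside a schema processor with an ordinary regular expression. The sketch below uses java.util.regex as a stand-in, which is an assumption: the two regex dialects are not identical (for instance, \d in XML Schema also covers non-ASCII digits), but for this pattern the behaviour on ASCII input is the same. A schema pattern must match the whole value, which corresponds to matches() in Java.

```java
import java.util.regex.Pattern;

// Hypothetical helper approximating the Sku simple type with a Java regex.
class SkuPattern {
    // three digits, a hyphen, two uppercase ASCII letters
    private static final Pattern SKU = Pattern.compile("\\d{3}-[A-Z]{2}");

    static boolean isSku(String value) {
        // matches() requires the entire string to fit the pattern
        return SKU.matcher(value).matches();
    }

    public static void main(String[] args) {
        for (String candidate : new String[] { "872-AA", "87-AA", "872-aa", "872-AAx" }) {
            System.out.println(candidate + " : " + isSku(candidate));
        }
    }
}
```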

Building content models
  • <xsd:sequence>: fixed order
  • <xsd:choice>: (1) choice of alternatives
  • <xsd:group>: grouping (also named)
  • <xsd:all>: no order specified

Null values
  • a missing element may mean many things: unknown, not applicable…
  • an attribute to indicate that the element content is null
  • in schema:
    <xsd:element name="shipDate" type="xsd:date" nullable="true" />
  • in document:
    <shipDate xsi:null="true"></shipDate>
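In the final XML Schema 1.0 Recommendation this mechanism is spelled nillable in the schema and xsi:nil in the document, rather than the draft names shown on the slide. The sketch below uses the Recommendation spelling so that it runs with the JDK's javax.xml.validation API; the class NilDemo is hypothetical.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class NilDemo {
    // shipDate must be a date, but may be explicitly marked as nil.
    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "<xsd:element name='shipDate' type='xsd:date' nillable='true'/>"
        + "</xsd:schema>";

    static boolean isValid(String xml) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String xsi = "xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'";
        System.out.println(isValid("<shipDate>1999-05-21</shipDate>"));
        System.out.println(isValid("<shipDate " + xsi + " xsi:nil='true'></shipDate>"));
        System.out.println(isValid("<shipDate></shipDate>"));
    }
}
```

An empty shipDate without the nil marker is rejected, because the empty string is not a legal date value.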

Specifying uniqueness
  • XML Schema makes it possible to indicate that any attribute or element value must be unique within a certain scope
  • unique element: first "select" a set of elements, then identify the attribute or element "field" relative to each selected element that has to be unique within the scope of the set of selected elements

Defining keys and their references
  • keys and key references can also be defined:
    <key name="pNumKey">
      <selector>parts/part</selector>
      <field>@number</field>
    </key>
    <keyref name="dummy2" refer="pNumKey">
      <selector>regions/zip/part</selector>
      <field>@number</field>
    </keyref>

XML Query Languages
  • Currently:
      – There is no recommendation/standard available, only drafts
      – Different suggestions given in 1998, work in progress
  • XML Query Requirements:
      – Requirements draft 16.8.2000
      – Query language by the end of 2000
  • XML Query Data Model:
      – Draft 11.5.2000
  • More on XML Query Languages:
      – http://www.w3.org/XML/Query/

XML Query Languages
  • Required features of an XML query language:
      – Support operations (selection, projection, aggregation, sorting, etc.) on all data types:
          • Choose a part of the data based on content or structure
          • Also operations on hierarchy and sequence of document structures
      – Structural preservation and transformation:
          • Preserve the relative hierarchy and sequence of input document structures in the query results
          • Transform XML structures and create new XML structures
      – Combination and joining:
          • Combine related information from different parts of a given document or from multiple documents

XML Query Languages
  • Required features of an XML query language (cont'd):
      – Closure property:
          • The result of an XML document query is also an XML document (usually not valid but well-formed)
          • The results of a query can be used as input to another query
  • Notions:
      – HTML is layout-oriented, so queries cannot be carried out efficiently
      – XML is not layout-oriented but is based on representing structure; DTDs and structure information can be used in queries
      – XML query languages are still under construction, but prototype languages exist (e.g., XML-QL, XQL, Lorel…)

XML Query Languages
  • We want our query to collect elements from manufacturer documents (in temp.database.xml) listing manufacturer's name, year, models, vendors, price, etc. to create new <car> elements
      – The results should list their make, model, vendor, rank, and price (in this order)
  • Lorel:
    select xml(car: (select X.vehicle.make, X.vehicle.model,
                            X.vehicle.vendor, X.manufacturer.rank,
                            X.vehicle.price
                     from temp.database.xml X))

XML Query Languages
  • XML-QL:
    WHERE <manufacturer>
            <mn_name>$mn</mn_name>
            <vehiclemodel>
              <mo_name>$mon</mo_name>
              <rank>$r</rank>
            </model>
            <vehicle>
              <price>$y</price>
              <vendor>$mn</vendor>
            </vehiclemodel>
          </manufacturer> IN www.nhcstemp.database.xml
    CONSTRUCT <car>
                <make>$mn</make>
                <mo_name>$mon</mo_name>
                <vendor>$v</vendor>
                <rank>$r</rank>
                <price>$y</price>
              </car>
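Neither Lorel nor XML-QL ships with a standard Java runtime, so the select-and-construct idea can only be approximated here. The sketch below swaps in XPath (javax.xml.xpath) for selection and DOM for constructing the new car elements. The input document, its element layout, and the class CarQuerySketch are all invented for illustration and are only loosely modelled on the names used in the queries above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class CarQuerySketch {
    // Hypothetical manufacturer data (not taken from the slides).
    static final String INPUT =
          "<manufacturers>"
        + "<manufacturer><mn_name>Mercury</mn_name><rank>2</rank>"
        + "<vehicle><model>Sable</model><vendor>Vendor A</vendor><price>26800</price></vehicle>"
        + "<vehicle><model>Cougar</model><vendor>Vendor B</vendor><price>21500</price></vehicle>"
        + "</manufacturer>"
        + "</manufacturers>";

    // Appends <name>text</name> to parent.
    private static void add(Document doc, Element parent, String name, String text) {
        Element child = doc.createElement(name);
        child.setTextContent(text);
        parent.appendChild(child);
    }

    // Selects every vehicle and constructs a <car> element for it.
    static Document query(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        Document in = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        Document out = factory.newDocumentBuilder().newDocument();
        Element cars = out.createElement("cars");
        out.appendChild(cars);

        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList vehicles = (NodeList) xpath.evaluate(
            "//manufacturer/vehicle", in, XPathConstants.NODESET);
        for (int i = 0; i < vehicles.getLength(); i++) {
            Node vehicle = vehicles.item(i);
            Element car = out.createElement("car");
            // make and rank come from the enclosing manufacturer (the "join")
            add(out, car, "make", xpath.evaluate("../mn_name", vehicle));
            add(out, car, "model", xpath.evaluate("model", vehicle));
            add(out, car, "vendor", xpath.evaluate("vendor", vehicle));
            add(out, car, "rank", xpath.evaluate("../rank", vehicle));
            add(out, car, "price", xpath.evaluate("price", vehicle));
            cars.appendChild(car);
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        Document result = query(INPUT);
        System.out.println(result.getElementsByTagName("car").getLength() + " car elements built");
    }
}
```

The result is itself an XML document, so it could serve as input to a further query, which mirrors the closure property listed among the requirements.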