Data Interchange & XML
Henning Schulzrinne (with material by Suhit Gupta)
Advanced Programming, Spring 2002

Data interchange
§ Unix-style files
§ serialization (marshalling): convert structured data into a linear stream of bytes
  § for files and across networks (these used to be separate)
§ part of RPC:
  § Sun RPC
  § CORBA
  § Java
  § ASN.1
  § XML
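As a minimal sketch of what serialization means in practice, the following Python fragment flattens a small record into a byte stream and rebuilds it. The record layout (a 32-bit id followed by a length-prefixed UTF-8 name) and the helper names marshal/unmarshal are assumptions made for illustration; this is not the wire format of Sun RPC, CORBA, or ASN.1.

```python
import struct

HEADER = "!IH"  # network byte order: 4-byte unsigned id, 2-byte name length

def marshal(uid, name):
    """Flatten (uid, name) into a linear stream of bytes."""
    encoded = name.encode("utf-8")
    return struct.pack(HEADER, uid, len(encoded)) + encoded

def unmarshal(data):
    """Inverse of marshal(): rebuild the (uid, name) pair."""
    uid, length = struct.unpack_from(HEADER, data, 0)
    offset = struct.calcsize(HEADER)
    return uid, data[offset:offset + length].decode("utf-8")

stream = marshal(5815, "hgs")
print(stream.hex())
print(unmarshal(stream))
```

The same byte stream can be written to a file or sent over a socket, which is why the two cases no longer need separate mechanisms.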

Data interchange
§ Work across different platforms
  § byte order, floating point, character sets, ...
  § convert to destination platform
  § common intermediate representation
§ Efficient
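The byte-order problem can be illustrated with a short Python sketch. The value 1024 is an arbitrary choice; the point is only that the same integer has two different byte layouts, and that reading one as the other gives a wrong result without any error.

```python
value = 1024

big = value.to_bytes(4, byteorder="big")        # most significant byte first
little = value.to_bytes(4, byteorder="little")  # least significant byte first
print(big.hex(), little.hex())

# Reading little-endian bytes as big-endian silently yields a different number.
misread = int.from_bytes(little, byteorder="big")
print(misread)

# Agreeing on one intermediate representation (here: big-endian) avoids this.
restored = int.from_bytes(big, byteorder="big")
print(restored)
```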

Structured files
§ Older OS had records and punch card columns
§ Unix model: lines separated into columns (tabs, spaces, commas)
§ Also: CSV in Excel and kin
§ Sometimes # for comments

Structured files
§ Examples:
  § /etc/passwd
      hgs:8D6uxb.jefyxz:5815:92:Henning G. Schulzrinne:/home/hgs:/bin/tcsh
  § files for sort
  § /etc/services
      time        37/tcp   timeserver
      time        37/udp   timeserver
      rlp         39/tcp   resource    # resource location
      rlp         39/udp   resource    # resource location
      nameserver  42/tcp   name        # IEN 116
      nameserver  42/udp   name        # IEN 116
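A column-oriented file of this kind needs very little code to read. The sketch below parses passwd-style text; the function name parse_passwd and the field labels are assumptions for illustration (the labels follow the conventional meaning of the seven passwd columns).

```python
FIELDS = ("user", "password", "uid", "gid", "gecos", "home", "shell")

def parse_passwd(text):
    """Yield one dict per well-formed, non-comment line of passwd-style text."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        values = line.split(":")
        if len(values) != len(FIELDS):
            continue  # this sketch simply ignores malformed records
        record = dict(zip(FIELDS, values))
        record["uid"] = int(record["uid"])
        record["gid"] = int(record["gid"])
        yield record

sample = """# sample entry
hgs:8D6uxb.jefyxz:5815:92:Henning G. Schulzrinne:/home/hgs:/bin/tcsh
"""
records = list(parse_passwd(sample))
print(records[0]["user"], records[0]["uid"], records[0]["shell"])
```

Note that the meaning of each column lives only in the program, not in the file; that is the gap XML's labeled content is meant to close.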

XML
§ IBM: SGML (Standard Generalized Markup Language) → HTML (HyperText Markup Language) → XML → XHTML
§ idea: label content instead of presentation
§ subset of SGML
§ “documents”, but really structured (tree) data objects
§ human readable, but not necessarily terse

XML
§ entities = storage units containing
  § parsed data: characters + markup
  § unparsed data
§ markup = start tags, end tags, entity references, ...
§ starts with XML declaration and document type declaration:
    <?xml version="1.0"?>
    <!DOCTYPE greeting SYSTEM "hello.dtd">
§ comments: <!-- comment -->
§ verbatim: <![CDATA[<greeting>Hello, world!</greeting>]]>
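The effect of a CDATA section can be checked with Python's standard ElementTree parser. In this sketch the wrapper element name example is an assumption; the CDATA content is the one from the slide.

```python
import xml.etree.ElementTree as ET

document = (
    '<?xml version="1.0"?>'
    "<!-- comment: dropped by the default parser -->"
    "<example><![CDATA[<greeting>Hello, world!</greeting>]]></example>"
)

root = ET.fromstring(document)
# The CDATA content arrives as plain character data;
# no <greeting> child element is created.
print(root.tag, len(root), repr(root.text))
```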

XML
§ Document type definition (DTD) defines structure of XML
    <?xml version="1.0" encoding="UTF-8" ?>
    <!DOCTYPE greeting [
      <!ELEMENT greeting (#PCDATA)>
      <!ATTLIST poem xml:space (default|preserve) 'preserve'>
    ]>
    <greeting>Hello, world!</greeting>
§ Other mechanisms: XSD (later)

Tags
§ Tags and attributes organized into XML name spaces
§ e.g., language attribute:
    <p xml:lang="en">The quick brown fox...</p>
    <p xml:lang="en-GB">What colour is it?</p>
    <p xml:lang="en-US">What color is it?</p>
    <sp who="Faust" desc='leise' xml:lang="de">
      <l>Habe nun, ach! Philosophie,</l>
      <l>Juristerei, und Medizin</l>
      <l>und leider auch Theologie</l>
      <l>durchaus studiert mit heißem Bemüh'n.</l>
    </sp>
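From a program's point of view, a namespaced attribute such as xml:lang is identified by its namespace URI plus local name, not by the prefix. A small sketch with Python's ElementTree (the wrapper element text is an assumption added so the fragment has a single root):

```python
import xml.etree.ElementTree as ET

# The xml: prefix is permanently bound to this namespace URI.
XML_NS = "http://www.w3.org/XML/1998/namespace"

doc = ET.fromstring(
    "<text>"
    '<p xml:lang="en-GB">What colour is it?</p>'
    '<p xml:lang="en-US">What color is it?</p>'
    "</text>"
)

# ElementTree exposes the attribute as "{namespace-uri}lang".
by_lang = {p.get(f"{{{XML_NS}}}lang"): p.text for p in doc.iter("p")}
print(by_lang)
```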

XML – special characters and binary data
§ &# and &#x introduce ISO 10646 characters, e.g., &#x3C; for <
§ binary data is more painful:
  § base64 encoding (6 bits per character)
    § hello → aGVsbG8K (the input includes the trailing newline)
  § MIME multipart
  § external reference
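Both mechanisms can be tried out with the Python standard library. The element names expr and data below are assumptions for illustration only.

```python
import base64
import xml.etree.ElementTree as ET

# A numeric character reference is resolved by the parser.
element = ET.fromstring("<expr>a &#x3C; b</expr>")
print(element.text)

# Base64 maps every 3 input bytes to 4 characters (6 bits per character).
encoded = base64.b64encode(b"hello\n").decode("ascii")
print(encoded)

# Round trip: embed the encoded text in an element and decode it again.
wrapper = ET.fromstring(f"<data>{encoded}</data>")
decoded = base64.b64decode(wrapper.text)
print(decoded)
```

The 4-for-3 expansion is the price of keeping binary data inside a text document.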

XML binary: MIME
    Content-Type: multipart/related; boundary=--xxxxx;

    --xxxxx
    Content-Type: text/xml
    Content-ID: Contents

    <?xml version="1.0" ?>
    <objectDef uid="?">
      <property><name>Width</name>
        <value><i4>1024</i4></value></property>
      <property><name>Height</name>
        <value><i4>1024</i4></value></property>
      <property><name>Pixels</name>
        <value><stream href="cid:Pixels" /></value></property>
    </objectDef>
    --xxxxx
    Content-Type: application/binary
    Content-Transfer-Encoding: Little-Endian
    Content-ID: Pixels
    Content-Length: 524288

    ...binary data here...
    --xxxxx
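A message with this overall shape can be assembled with Python's standard email package. This is a sketch under assumptions: the 16-byte placeholder payload stands in for the 524288 bytes of pixel data, the XML part is abbreviated, and the library base64-encodes the binary part instead of using the slide's Little-Endian transfer encoding, which is not a standard MIME encoding.

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

xml_part = MIMEText(
    '<?xml version="1.0"?>\n'
    '<objectDef uid="?"><property><name>Pixels</name>'
    '<value><stream href="cid:Pixels"/></value></property></objectDef>\n',
    "xml",
)
xml_part["Content-ID"] = "<Contents>"

pixels = bytes(16)  # placeholder payload (assumption)
binary_part = MIMEApplication(pixels, "octet-stream")
binary_part["Content-ID"] = "<Pixels>"  # target of href="cid:Pixels"

message = MIMEMultipart("related")
message.attach(xml_part)
message.attach(binary_part)
print(message.as_string())
```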

XML schema definition (XSD)
§ Define semantic structure of documents, with typing (unlike DTD)
    <?xml version="1.0"?>
    <purchaseOrder orderDate="1999-10-20">
      <shipTo country="US">
        <name>Alice Smith</name><street>123 Maple Street</street>
        <city>Mill Valley</city><state>CA</state><zip>90952</zip>
      </shipTo>
      <billTo country="US">
        <name>Robert Smith</name>
        <street>8 Oak Avenue</street>
        <city>Old Town</city>
        <state>PA</state>
        <zip>95819</zip>
      </billTo>
      <comment>Hurry, my lawn is going wild!</comment>
      <items>
        <item partNum="872-AA">
          <productName>Lawnmower</productName>
          <quantity>1</quantity>
          <USPrice>148.95</USPrice>
          <comment>Confirm this is electric</comment>
        </item>
        <item partNum="926-AA">
          <productName>Baby Monitor</productName>
          <quantity>1</quantity>
          <USPrice>39.98</USPrice>
          <shipDate>1999-05-21</shipDate>
        </item>
      </items>
    </purchaseOrder>
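Without a schema, a consuming program sees only text and has to apply the types itself. The following sketch reads an abbreviated copy of the purchase order (addresses and comments omitted) and computes the order total; the choice of int and Decimal is an assumption standing in for what the schema would declare.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

order_xml = """<purchaseOrder orderDate="1999-10-20">
  <items>
    <item partNum="872-AA">
      <productName>Lawnmower</productName>
      <quantity>1</quantity>
      <USPrice>148.95</USPrice>
    </item>
    <item partNum="926-AA">
      <productName>Baby Monitor</productName>
      <quantity>1</quantity>
      <USPrice>39.98</USPrice>
    </item>
  </items>
</purchaseOrder>"""

order = ET.fromstring(order_xml)
total = Decimal("0")
for item in order.iter("item"):
    # Every value is just a string until the program converts it.
    quantity = int(item.findtext("quantity"))
    price = Decimal(item.findtext("USPrice"))
    total += quantity * price

print(order.get("orderDate"), total)
```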

XML schema
§ complex types contain other elements
§ simple types contain numbers, strings, dates, ... but no subelements
§ built-in simple types: string, token, byte, integer, float, double, boolean, time, dateTime, duration, anyURI, language, NMTOKEN, ...

XML schema example
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
      <xsd:annotation>
        <xsd:documentation xml:lang="en">
          Purchase order schema for Example.com.
          Copyright 2000 Example.com. All rights reserved.
        </xsd:documentation>
      </xsd:annotation>
      <xsd:element name="purchaseOrder" type="PurchaseOrderType"/>
      <xsd:element name="comment" type="xsd:string"/>
      <xsd:complexType name="PurchaseOrderType">
        <xsd:sequence>
          <xsd:element name="shipTo" type="USAddress"/>
          <xsd:element name="billTo" type="USAddress"/>
          <xsd:element ref="comment" minOccurs="0"/>
          <xsd:element name="items" type="Items"/>
        </xsd:sequence>
        <xsd:attribute name="orderDate" type="xsd:date"/>
      </xsd:complexType>
      <xsd:complexType name="USAddress">
        <xsd:sequence>
          <xsd:element name="name" type="xsd:string"/>
          <xsd:element name="street" type="xsd:string"/>
          <xsd:element name="city" type="xsd:string"/>
          <xsd:element name="state" type="xsd:string"/>
          <xsd:element name="zip" type="xsd:decimal"/>
        </xsd:sequence>
        <xsd:attribute name="country" type="xsd:NMTOKEN" fixed="US"/>
      </xsd:complexType>

XML schema, cont’d.
    <xsd:complexType name="Items">
      <xsd:sequence>
        <xsd:element name="item" minOccurs="0" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="productName" type="xsd:string"/>
              <xsd:element name="quantity">
                <xsd:simpleType>
                  <xsd:restriction base="xsd:positiveInteger">
                    <xsd:maxExclusive value="100"/>
                  </xsd:restriction>
                </xsd:simpleType>
              </xsd:element>
              <xsd:element name="USPrice" type="xsd:decimal"/>
              <xsd:element ref="comment" minOccurs="0"/>
              <xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>
            </xsd:sequence>
            <xsd:attribute name="partNum" type="SKU" use="required"/>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>

XML – values
§   <xsd:simpleType name="MyInteger">
      <xsd:restriction base="xsd:integer">
        <xsd:minInclusive value="1"/>
        <xsd:maxInclusive value="99"/>
      </xsd:restriction>
    </xsd:simpleType>
§   <xsd:simpleType name="Sku">
      <xsd:restriction base="xsd:string">
        <xsd:pattern value="\d{3}-[A-Z]{2}"/>
      </xsd:restriction>
    </xsd:simpleType>
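What these two facets accept can be approximated in a few lines of Python. This is a sketch, not a schema validator: the function names are hypothetical, and Python's int() is slightly more permissive than xsd:integer (it accepts digit-group underscores, for example).

```python
import re

# XSD patterns are implicitly anchored at both ends,
# hence fullmatch() rather than search().
SKU_PATTERN = re.compile(r"\d{3}-[A-Z]{2}")

def is_sku(value):
    """Pattern facet: three digits, a hyphen, two upper-case letters."""
    return SKU_PATTERN.fullmatch(value) is not None

def is_my_integer(value):
    """minInclusive=1 and maxInclusive=99 on an integer base type."""
    try:
        number = int(value)
    except ValueError:
        return False
    return 1 <= number <= 99

print(is_sku("872-AA"), is_sku("872-aa"))
print(is_my_integer("99"), is_my_integer("100"))
```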

XML – minOccurs/maxOccurs
Adding Attributes to the Inline Type Definition
    <xsd:element name="Item" minOccurs="0" maxOccurs="unbounded">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="productName" type="xsd:string"/>
          <xsd:element name="quantity">
            <xsd:simpleType>
              <xsd:restriction base="xsd:positiveInteger">
                <xsd:maxExclusive value="100"/>
              </xsd:restriction>
            </xsd:simpleType>
          </xsd:element>
          <xsd:element name="price" type="xsd:decimal"/>
          <xsd:element ref="comment" minOccurs="0"/>
          <xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>
        </xsd:sequence>
        <xsd:attribute name="partNum" type="Sku"/>
        <xsd:attribute name="weight" type="xsd:decimal"/>
        <xsd:attribute name="shipBy">
          <xsd:simpleType>
            <xsd:restriction base="xsd:string">
              <xsd:enumeration value="air"/>
              <xsd:enumeration value="land"/>
              <xsd:enumeration value="any"/>
            </xsd:restriction>
          </xsd:simpleType>
        </xsd:attribute>
      </xsd:complexType>
    </xsd:element>

XML – max/min
§ You can specify max and min values
§ By combining and nesting the various groups provided by XML Schema, and by setting the values of minOccurs and maxOccurs, it is possible to represent any content model expressible with an XML 1.0 DTD.

XML – Attribute groups
Adding Attributes Using an Attribute Group
    <xsd:element name="item" minOccurs="0" maxOccurs="unbounded">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="productName" type="xsd:string"/>
          <xsd:element name="quantity">
            <xsd:simpleType>
              <xsd:restriction base="xsd:positiveInteger">
    ...
    <xsd:attributeGroup name="ItemDelivery">
      <xsd:attribute name="partNum" type="Sku"/>
      <xsd:attribute name="weight" type="xsd:decimal"/>
      <xsd:attribute name="shipBy">
        <xsd:simpleType>
          <xsd:restriction base="xsd:string">
            <xsd:enumeration value="air"/>
            <xsd:enumeration value="land"/>
            <xsd:enumeration value="any"/>
          </xsd:restriction>
        </xsd:simpleType>
      </xsd:attribute>
    </xsd:attributeGroup>

XML attribute groups
§ Using an attribute group in this way can improve the readability of schemas and makes them easier to update, because an attribute group can be defined and edited in one place and referenced in multiple definitions and declarations.
§ These characteristics make attribute groups similar to parameter entities in XML 1.0.

XML – Choice and Sequence
Nested Choice and Sequence Groups
    <xsd:complexType name="PurchaseOrderType">
      <xsd:sequence>
        <xsd:choice>
          <xsd:group ref="shipAndBill" />
          <xsd:element name="singleAddress" type="Address" />
        </xsd:choice>
        <xsd:element ref="comment" minOccurs="0" />
        <xsd:element name="items" type="Items" />
      </xsd:sequence>
      <xsd:attribute name="orderDate" type="xsd:date" />
    </xsd:complexType>
    <xsd:group name="shipAndBill">
      <xsd:sequence>
        <xsd:element name="shipTo" type="Address" />
        <xsd:element name="billTo" type="Address" />
      </xsd:sequence>
    </xsd:group>

XML – Choice and Sequence
§ A choice group allows only one of its children to appear in an instance.
§ Un-named groups of elements can be constrained so that only one of their elements may appear in an instance (choice).
§ Alternatively, un-named groups, like elements in named groups, can be constrained to appear in the same order (sequence) in which they are declared.
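A content model built from choice and sequence is essentially a regular expression over child element names. The sketch below makes that analogy concrete for the purchase-order model (either shipTo followed by billTo, or singleAddress; then an optional comment; then items). Checking only child names with a Python regex is an illustrative shortcut, not how a schema processor works, and the function name is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# choice(sequence(shipTo, billTo) | singleAddress), comment?, items
CONTENT_MODEL = re.compile(r"(?:shipTo billTo|singleAddress)(?: comment)? items")

def matches_content_model(order):
    """Compare the space-joined child element names against the model."""
    child_names = " ".join(child.tag for child in order)
    return CONTENT_MODEL.fullmatch(child_names) is not None

ok = ET.fromstring("<purchaseOrder><singleAddress/><items/></purchaseOrder>")
bad = ET.fromstring(
    "<purchaseOrder><shipTo/><billTo/><singleAddress/><items/></purchaseOrder>"
)
print(matches_content_model(ok), matches_content_model(bad))
```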

XML – DOM
§ “The XML Document Object Model (DOM) is a programming interface for XML documents. It defines the way an XML document can be accessed and manipulated.”
§ DOM data structures used with XML are essentially trees.
§ http://www.w3schools.com/dom_intro.asp

Node properties
Name             Description
attributes       Returns a NamedNodeMap containing all attributes for this node
childNodes       Returns a NodeList containing all the child nodes for this node
firstChild       Returns the first child node for this node
lastChild        Returns the last child node for this node
nextSibling      Returns the next sibling node. Two nodes are siblings if they have the same parent node
nodeName         Returns the nodeName, depending on the type
nodeType         Returns the nodeType as a number
nodeValue        Returns, or sets, the value of this node, depending on the type
ownerDocument    Returns the root node of the document (the Document object)
parentNode       Returns the parent node for this node
previousSibling  Returns the previous sibling node. Two nodes are siblings if they have the same parent node

Node methods
Name                             Description
appendChild(newChild)            Appends the node newChild at the end of the child nodes for this node
cloneNode(boolean)               Returns an exact clone of this node. If the boolean value is set to true, the cloned node contains all the child nodes as well
hasChildNodes()                  Returns true if this node has any child nodes
insertBefore(newNode, refNode)   Inserts a new node, newNode, before the existing node, refNode
removeChild(nodeName)            Removes the specified node, nodeName
replaceChild(newNode, oldNode)   Replaces the oldNode with the newNode
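The properties and methods listed above are available under the same names in Python's xml.dom.minidom, which makes for a compact way to try them out. The sample document (a note with to and from children) is an assumption chosen for illustration.

```python
from xml.dom import minidom

doc = minidom.parseString("<note><to>Tove</to><from>Jani</from></note>")
root = doc.documentElement

# Properties: firstChild, nextSibling, parentNode, nodeName, nodeType, ...
first = root.firstChild
print(first.nodeName, first.nodeType == first.ELEMENT_NODE)
print(first.nextSibling.nodeName, first.parentNode.nodeName)
print(root.ownerDocument is doc)

# Methods: cloneNode(True) copies the subtree; appendChild and
# insertBefore place nodes; removeChild takes one out again.
copy = first.cloneNode(True)
root.appendChild(copy)
heading = doc.createElement("heading")
root.insertBefore(heading, first)
root.removeChild(root.lastChild)

names = [child.nodeName for child in root.childNodes]
print(names, root.hasChildNodes(), len(root.attributes))
```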

XML – DOM
§ Validating XML with the DOM
  § http://www.w3schools.com/dom_validate.asp
§ Some DOM resources
  § http://www.w3schools.com/dom_resources.asp

Programming with XML
§ Don’t want to write a new parser for each application
§ Two APIs for C++ and Java:
  § SAX – events as parsed
  § DOM – object model (build tree & query)
§ Both implemented in Xerces (Apache)
§ Also, more specific implementations for XML RPC
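The difference between the two styles shows up even in a task as small as counting elements. The sketch below uses Python's standard xml.sax and xml.dom.minidom modules rather than Xerces; the class name ElementCounter and the sample document are assumptions for illustration.

```python
import xml.sax
from xml.dom import minidom

DOCUMENT = (
    b"<poem><title>Roses are Red</title>"
    b"<l>Roses are red,</l><l>Violets are blue;</l></poem>"
)

class ElementCounter(xml.sax.ContentHandler):
    """SAX: the parser calls back per event; nothing is stored unless we store it."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        self.count += 1

counter = ElementCounter()
xml.sax.parseString(DOCUMENT, counter)

# DOM: the whole document is built as a tree first, then queried.
tree = minidom.parseString(DOCUMENT)
dom_count = len(tree.getElementsByTagName("*"))

print(counter.count, dom_count)
```

SAX keeps memory use independent of document size, while DOM allows random access and modification once the tree exists.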

Sample Code
    <html>
    <body>
    <script type="text/vbscript">
    txt="<h1>Traversing the node tree</h1>"
    document.write(txt)
    set xmlDoc=CreateObject("Microsoft.XMLDOM")
    xmlDoc.async="false"
    xmlDoc.load("note.xml")
    for each x in xmlDoc.documentElement.childNodes
      document.write("<b>" & x.nodename & "</b>")
      document.write(": ")
      document.write(x.text)
      document.write("<br>")
    next
    </script>
    </body>
    </html>

Sample code
    // The calls below appear to be the JDOM API (assumed package: org.jdom).
    // Parser and SourceTuple are classes of the surrounding project.
    import org.jdom.Document;
    import org.jdom.Element;

    public class OtherParser implements Parser {
        private Document doc;

        public OtherParser(Document arg) {
            doc = arg;
        }

        public SourceTuple parseDoc() {
            String protocol = null;
            String url = null;
            int size = -1;
            String type = null;
            long created = -1;
            long last_mod = -1;
            String src = null;
            String opt[] = null;
            Element root;

            root = doc.getRootElement();
            // Optional attribute: keep the default if it is missing or malformed.
            try {
                created = root.getAttribute("createDate").getLongValue();
            } catch (Exception e) {}
            protocol = root.getChild("Protocol").getText();
            url = root.getChild("Name").getText();
            type = root.getChild("Type").getText();
            // Optional numeric children: keep the defaults on any parse error.
            try {
                size = Integer.parseInt(root.getChild("Size").getText());
                last_mod = Long.parseLong(root.getChild("Last-Modified").getText());
            } catch (Exception e) {}

            System.err.println(url);
            return new SourceTuple(-1, protocol, url, size, type, created, last_mod, src, opt);
        }
    }

SAX Java example - main
    import java.io.FileReader;
    import org.xml.sax.XMLReader;
    import org.xml.sax.InputSource;
    import org.xml.sax.helpers.XMLReaderFactory;
    import org.xml.sax.helpers.DefaultHandler;

    public class MySAXApp extends DefaultHandler {
        public static void main (String args[]) throws Exception {
            XMLReader xr = XMLReaderFactory.createXMLReader();
            MySAXApp handler = new MySAXApp();
            xr.setContentHandler(handler);
            xr.setErrorHandler(handler);

            // Parse each file provided on the command line.
            for (int i = 0; i < args.length; i++) {
                FileReader r = new FileReader(args[i]);
                xr.parse(new InputSource(r));
            }
        }

        public MySAXApp () {
            super();
        }
    }

Java SAX example handlers
    public void startDocument () {
        System.out.println("Start document");
    }

    public void endDocument () {
        System.out.println("End document");
    }

    public void startElement (String uri, String name, String qName, Attributes atts) {
        if ("".equals (uri))
            System.out.println("Start element: " + qName);
        else
            System.out.println("Start element: {" + uri + "}" + name);
    }

    public void endElement (String uri, String name, String qName) {
        if ("".equals (uri))
            System.out.println("End element: " + qName);
        else
            System.out.println("End element: {" + uri + "}" + name);
    }

Java SAX example characters
    public void characters (char ch[], int start, int length) {
        System.out.print("Characters: \"");
        // Print special characters in escaped form so they stay visible.
        for (int i = start; i < start + length; i++) {
            switch (ch[i]) {
            case '\\': System.out.print("\\\\"); break;
            case '"':  System.out.print("\\\""); break;
            case '\n': System.out.print("\\n"); break;
            case '\r': System.out.print("\\r"); break;
            case '\t': System.out.print("\\t"); break;
            default:   System.out.print(ch[i]); break;
            }
        }
        System.out.print("\"\n");
    }

SAX for Java – output
    <?xml version="1.0"?>
    <poem xmlns="http://www.megginson.com/ns/exp/poetry">
    <title>Roses are Red</title>
    <l>Roses are red,</l>
    <l>Violets are blue;</l>
    </poem>

    java -Dorg.xml.sax.driver=com.example.xml.SAXDriver MySAXApp roses.xml

    Start document
    Start element: {http://www.megginson.com/ns/exp/poetry}poem
    Characters: "\n"
    Start element: {http://www.megginson.com/ns/exp/poetry}title
    Characters: "Roses are Red"
    End element: {http://www.megginson.com/ns/exp/poetry}title
    Characters: "\n"
    Start element: {http://www.megginson.com/ns/exp/poetry}l
    Characters: "Roses are red,"
    ...
    End element: {http://www.megginson.com/ns/exp/poetry}poem
    End document

SAX for C++ -- handler example
    void SAXPrintHandlers::startElement(const XMLCh* const name,
                                        AttributeList& attributes)
    {
        // The name has to be representable without any escapes
        fFormatter << XMLFormatter::NoEscapes << chOpenAngle << name;

        unsigned int len = attributes.getLength();
        for (unsigned int index = 0; index < len; index++)
        {
            fFormatter << XMLFormatter::NoEscapes << chSpace
                       << attributes.getName(index) << chEqual << chDoubleQuote
                       << XMLFormatter::AttrEscapes << attributes.getValue(index)
                       << XMLFormatter::NoEscapes << chDoubleQuote;
        }
        fFormatter << chCloseAngle;
    }

DOM counting example (C++)
    DOM_Document doc = parser->getDocument();
    unsigned int elementCount = doc.getElementsByTagName("*").getLength();

    // Print out stats collected and time taken.
    cout << xmlFile << ": " << duration << " ms ("
         << elementCount << " elems)." << endl;

    // delete the parser
    delete parser;

    // And call the termination method
    XMLPlatformUtils::Terminate();

SOAP
§ RPC mechanism:
  § XML + schema for request, response
  § HTTP and other transports

SOAP example
    <?xml version='1.0' ?>
    <env:Envelope xmlns:env="http://www.w3.org/2001/12/soap-envelope">
      <env:Header>
        <m:reservation xmlns:m="http://travelcompany.example.org/reservation"
            env:actor="http://www.w3.org/2001/12/soap-envelope/actor/next"
            env:mustUnderstand="true">
          <m:reference>uuid:093a2da1-q345-739r-ba5d-pqff98fe8j7d</m:reference>
          <m:dateAndTime>2001-11-29T13:35:00.000-05:00</m:dateAndTime>
        </m:reservation>
        <n:passenger xmlns:n="http://mycompany.example.com/employees"
            env:actor="http://www.w3.org/2001/12/soap-envelope/actor/next"
            env:mustUnderstand="true">
          <n:name>John Q. Public</n:name>
        </n:passenger>
      </env:Header>
      <env:Body>
        <p:itinerary xmlns:p="http://travelcompany.example.org/reservation/travel">
          <p:airportChoices> JFK LGA EWR </p:airportChoices>
        </p:itinerary>
      </env:Body>
    </env:Envelope>
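On the receiving side, a SOAP node first has to look at the header blocks, in particular those marked mustUnderstand, before touching the body. The sketch below does this with Python's ElementTree on an abbreviated copy of the envelope (the reservation header is omitted); it shows only the XML handling and is not a SOAP implementation.

```python
import xml.etree.ElementTree as ET

ENV = "http://www.w3.org/2001/12/soap-envelope"
NS = {
    "env": ENV,
    "p": "http://travelcompany.example.org/reservation/travel",
}

envelope = ET.fromstring(f"""
<env:Envelope xmlns:env="{ENV}">
  <env:Header>
    <n:passenger xmlns:n="http://mycompany.example.com/employees"
                 env:mustUnderstand="true">
      <n:name>John Q. Public</n:name>
    </n:passenger>
  </env:Header>
  <env:Body>
    <p:itinerary xmlns:p="http://travelcompany.example.org/reservation/travel">
      <p:airportChoices> JFK LGA EWR </p:airportChoices>
    </p:itinerary>
  </env:Body>
</env:Envelope>
""")

# Header blocks that the receiver is required to understand.
header = envelope.find("env:Header", NS)
must_understand = [
    block.tag for block in header
    if block.get(f"{{{ENV}}}mustUnderstand") == "true"
]
print(must_understand)

# The payload of the request lives in the body.
choices = envelope.find("env:Body/p:itinerary/p:airportChoices", NS)
airports = choices.text.split()
print(airports)
```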

XML examples
§ XHTML for hypertext markup
§ MathML for mathematics in web pages
  § x² + 4x + 4 = 0 (left-hand side shown)
  § <apply> <plus/>
      <apply> <power/> <ci>x</ci> <cn>2</cn> </apply>
      <apply> <times/> <cn>4</cn> <ci>x</ci> </apply>
      <cn>4</cn>
    </apply>
§ SVG for line graphics
§ VoiceXML for voice browsers
§ RDF for describing resources

XML: pros & cons
Pros:
§ Rich set of related languages:
  § XSL for transformation
  § XML Query for queries on XML documents
  § XSD for structure definition
§ Lots of tools:
  § parser for C/C++, Java: Xerces
  § Tcl, Python, etc.
§ Can be generated easily with text editors and printf
§ Buzzword compliant
Cons:
§ Not space efficient
§ Not well-suited for binary data
§ Untyped data except with XSD