- Slides: 64
 
XML & Related Technologies
www.hndit.com
 
Learning Outcomes
• Use current technology for developing distributed systems and applications
IT 4101 - Multitiered Application Development - XML
 
XML Fundamentals
• XML stands for eXtensible Markup Language.
• XML is a W3C standard that complements HTML.
• XML was designed to carry data, not to display data.
• Two facets of XML: document-centric and data-centric.
• XML tags are not predefined; you must define your own tags.
• XML is designed to be self-descriptive.
 
XML Fundamentals, cont'd
• Ideal as a "data interchange" format.
• Key technology for "distributed" applications.
• All major database products have been retrofitted with facilities to store and construct XML documents.
• XML is closely related to object-oriented and so-called semi-structured data.
 
XML Document Example
    <note>
        <to>Tove</to>
        <from>Jani</from>
        <heading>Reminder</heading>
        <body>Don't forget me this weekend!</body>
    </note>
 
XML Data Model: Example – XML Tree
    note
        to: Tove
        from: Jani
        heading: Reminder
        body: Don't forget me this weekend!
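The tree view above can be reproduced programmatically. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the helper name describe is hypothetical and only illustrates walking the node-labelled tree.

```python
import xml.etree.ElementTree as ET

# The note document from the example slide.
NOTE_XML = """<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>"""


def describe(element, depth=0):
    """Return one indented line per element: tag name plus its text, if any."""
    text = (element.text or "").strip()
    lines = ["  " * depth + element.tag + (": " + text if text else "")]
    for child in element:  # iterating an element yields its child elements
        lines.extend(describe(child, depth + 1))
    return lines


root = ET.fromstring(NOTE_XML)
tree_lines = describe(root)
print("\n".join(tree_lines))
```

The root element note has four children, and each child holds only text, which matches the tree drawn on the slide.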
 
The Difference Between XML and HTML
• XML is used to describe content rather than presentation.
• New tags may be defined at will by the author of the document (extensible).
• No semantics behind tags. For instance, HTML's <table>…</table> means "render the contents as a table"; in XML it doesn't mean anything special.
 
• Structures may be nested arbitrarily.
• An XML document may contain an optional schema that describes its structure.
• Intolerant of errors: browsers will render buggy HTML pages, but XML processors will reject ill-formed XML documents.
 
The structure of an XML Document
• An XML document starts with an XML declaration and contains elements and text.
• An element can contain text, child elements, or both (mixed content).
• Elements can have attributes. Use attributes to describe how to interpret the element content.
 
The structure of an XML Document
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <root attribute="…">
        <child>
            <subchild>...</subchild>
        </child>
        <child attribute="…">…</child>
    </root>
 
The structure of an XML Document, example
    <?xml version="1.0" encoding="UTF-8"?>
    <Email>
        <Header>
            <From>nethu@gmail.com</From>
            <To>che@yahoo.com</To>
            <CC>deep@gmail.com</CC>
            <Subject>Hello from nanga</Subject>
        </Header>
        <Body>Hi… How are u my dear Ayya and Amma</Body>
    </Email>
• XML declaration: <?xml version="1.0" encoding="UTF-8"?>
• Root element: <Email>
• Child elements: <Header>, <Body>
• Sub-child elements: <From>, <To>, <CC>, <Subject>
 
XML Components
• The main components of an XML document are:
  – Elements
  – Content
  – Attributes
  – Comments
 
XML Syntax
 
XML Components - Elements
• An element is the basic building block of an XML document.
• Elements are used to describe the content of an XML document.
• Elements are the main markup components.
• Each element represents a piece of data, identified by a tag.
 
XML Elements
• An element is composed of a start tag, content, and an end tag:
    <start_tag> element content </start_tag>
• Element content can include other elements or strings.
• An empty element looks like this: <empty_element/>
• Every XML document ("XML instance") contains exactly one top-level element (the root element).
 
XML Elements, attributes
• Elements can carry attributes in their start tags:
  – <start_tag attribute_name="att_value">
  – An element can have as many attributes as are declared for it, and they can be required or optional.
• Note: element and attribute names are case-sensitive.
 
XML Syntax
    <person>
        <name>Alan</name>
        <age>42</age>
        <email>agb@abc.com</email>
    </person>
 
Note:
• An element includes its start and end tags.
• No quotation marks are placed around strings; XML treats all data as text. This is referred to as PCDATA (Parsed Character Data).
• Empty elements: <married></married> can be abbreviated to <married/>
 
Elements, cont'd
Collections are expressed using repeated structures.
Ex. The collection of all persons on the 4th floor:
    <table>
        <description>People on the 4th floor</description>
        <people>
            <person>
                <name>Alan</name>
                <age>42</age>
                <email>agb@abc.com</email>
            </person>
            <person>
                <name>Patsy</name>
                <age>36</age>
                <email>ptn@abc.com</email>
            </person>
            <person>
                <name>Ryan</name><age>58</age><email>rgz@abc.com</email>
            </person>
        </people>
    </table>
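A repeated structure maps naturally onto a list of records. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree module and assuming each record is wrapped in its own <person> element; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# The 4th-floor collection, with one <person> element per record.
PEOPLE_XML = """<table>
  <description>People on the 4th floor</description>
  <people>
    <person><name>Alan</name><age>42</age><email>agb@abc.com</email></person>
    <person><name>Patsy</name><age>36</age><email>ptn@abc.com</email></person>
    <person><name>Ryan</name><age>58</age><email>rgz@abc.com</email></person>
  </people>
</table>"""

root = ET.fromstring(PEOPLE_XML)

# findall() returns every element matching the path, in document order.
people = [
    {
        "name": person.findtext("name"),
        "age": int(person.findtext("age")),  # XML stores text; convert explicitly
        "email": person.findtext("email"),
    }
    for person in root.findall("./people/person")
]

for record in people:
    print(record["name"], record["age"], record["email"])
```

Because XML treats all content as text, the age is converted to an integer by the application, not by the parser.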
 
Attributes
• Attributes define some properties of elements.
• They are expressed as name-value pairs.
• As with tags, the user may define any number of attributes.
• Attribute values must be enclosed within quotation marks.
  – E.g.: <note date="12/11/2007">
 
Attributes, cont'd
    <product>
        <name language="French">trompette</name>
        <price currency="Euro">420.12</price>
        <address format="XLB56" language="French">
            <street>31 rue Croix-Bosset</street>
            <zip>92310</zip>
            <city>Sevres</city>
            <country>France</country>
        </address>
    </product>
 
Attributes vs Elements
• A given attribute can occur only once within a tag; its value is always a string.
• On the other hand, tags defining elements/sub-elements can repeat any number of times, and their values may be string data or sub-elements.
• The same data may be encoded using attributes, elements, or a combination of the two:
    <person name="Alan" age="42">
        <email>agb@abc.com</email>
    </person>
  or
    <person name="Alan">
        <age>42</age>
        <email>agb@abc.com</email>
    </person>
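The two encodings differ only in how an application reads the value back. A small sketch, assuming Python's standard xml.etree.ElementTree module; variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Age stored as an attribute of <person>.
as_attributes = ET.fromstring(
    '<person name="Alan" age="42"><email>agb@abc.com</email></person>'
)
# Age stored as a sub-element of <person>.
as_elements = ET.fromstring(
    '<person name="Alan"><age>42</age><email>agb@abc.com</email></person>'
)

age_from_attribute = as_attributes.get("age")    # attribute lookup
age_from_element = as_elements.findtext("age")   # child element lookup

print(age_from_attribute, age_from_element)
```

Both lookups return the string "42"; a missing attribute or element would return None instead.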
 
Mixing Elements and Text
• XML allows us to mix PCDATA and sub-elements within an element.
    <person>
        This is my best friend
        <name>Alan</name>
        <age>42</age>
        I am not sure of the following email
        <email>agb@abc.com</email>
    </person>
• This seems unnatural from a database perspective, but from a document perspective it is quite natural!
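Mixed content is visible to applications as text that sits before, between, and after child elements. The sketch below assumes Python's standard xml.etree.ElementTree module, which exposes such text through the text and tail attributes; whitespace is removed from the sample so the strings compare exactly.

```python
import xml.etree.ElementTree as ET

PERSON_XML = (
    "<person>This is my best friend"
    "<name>Alan</name><age>42</age>"
    "I am not sure of the following email"
    "<email>agb@abc.com</email></person>"
)

person = ET.fromstring(PERSON_XML)

leading_text = person.text          # text before the first child element
age = person.find("age")
text_after_age = age.tail           # text that follows </age> inside <person>
all_text = "".join(person.itertext())  # every text fragment in document order

print(leading_text)
print(text_after_age)
```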
 
Entity References
• Some characters have a special meaning in XML.
• If you place a character like "<" inside an XML element, it will generate an error because the parser interprets it as the start of a new element.
• This will generate an XML error:
    <message>if salary < 1000 then</message>
• To avoid this error, replace the "<" character with an entity reference:
    <message>if salary &lt; 1000 then</message>
 
Five predefined entity references in XML:
    Entity reference    Character    Meaning
    &lt;                <            less than
    &gt;                >            greater than
    &amp;               &            ampersand
    &apos;              '            apostrophe
    &quot;              "            quotation mark
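In practice the replacement is usually done by a library rather than by hand. A minimal sketch, assuming Python's standard xml.sax.saxutils helpers (escape handles &, < and > by default) and xml.etree.ElementTree for the round trip:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

raw = "if salary < 1000 then"

# escape() replaces &, < and > with their entity references.
escaped = escape(raw)
print(escaped)

# The escaped text can now be embedded safely in an element ...
message = ET.fromstring("<message>" + escaped + "</message>")

# ... and the parser hands the original characters back to the application.
print(message.text)
```

Serializers such as ElementTree's tostring apply the same escaping automatically when writing text content.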
 
Other XML Constructs
• Comments:
  – Used to add notes to an XML document.
  – Browsers and XML processors ignore comments.
    <!-- this is a comment -->
 
Other XML Constructs
• Processing Instruction (PI):
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="classes.xsl"?>
  Such instructions are passed on to applications that process XML files.
• CDATA (Character Data): used to write escape blocks containing text that would otherwise be considered markup:
    <![CDATA[<start>this is not an element</start>]]>
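Both constructs can be observed from an application. This sketch assumes Python's standard xml.etree.ElementTree and xml.dom.minidom modules; the wrapper element names example and classes are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# CDATA: the markup-like text arrives as plain character data.
cdata_doc = ET.fromstring(
    "<example><![CDATA[<start>this is not an element</start>]]></example>"
)
cdata_text = cdata_doc.text
child_count = len(cdata_doc)  # no <start> child element was created

# Processing instruction: minidom keeps it as a node ahead of the root element.
pi_doc = minidom.parseString(
    '<?xml version="1.0"?>'
    '<?xml-stylesheet type="text/xsl" href="classes.xsl"?>'
    "<classes/>"
)
pi = pi_doc.firstChild

print(cdata_text)
print(pi.target, pi.data)
```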
 
Well Formed XML Documents
• A "well-formed" XML document has correct XML syntax:
  – XML documents must have a root element
  – XML elements must have a closing tag
  – XML tags are case sensitive
  – XML elements must be properly nested
  – XML attribute values must be quoted
 
Well Formed XML Documents, cont'd
• An XML document must be well-formed before it can be processed.
• A well-formed XML document will parse into a node-labeled tree.
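The well-formedness rules can be checked by simply attempting to parse. A minimal sketch, assuming Python's standard xml.etree.ElementTree module; the helper name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text, False if it raises a parse error."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


samples = {
    "valid document": "<note><to>Tove</to></note>",
    "missing closing tag": "<note><to>Tove</note>",
    "case mismatch": "<note></Note>",
    "improper nesting": "<a><b></a></b>",
    "unquoted attribute": "<note date=12/11/2007/>",
    "two root elements": "<a/><b/>",
}

for label, text in samples.items():
    print(label, "->", is_well_formed(text))
```

Only the first sample parses; each of the others breaks one of the rules listed above and is rejected outright.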
 
Terminology
    <?xml version="1.0"?>
    <PersonList Type="Student" Date="2002-02-02">
        <Title Value="Student List"/>
        <Person> … </Person>
        <Person> … </Person>
    </PersonList>
• Root element: <PersonList>
• Attributes: Type, Date, Value
• Empty element: <Title Value="Student List"/>
• Element (or tag) names: PersonList, Title, Person
• Elements are nested
• Root element contains all others
 
The structure of the XML document
• Three segments:
  – A declaration that announces that the file is an XML file.
  – An optional definition of the type of XML data and the DTD it follows.
  – The content, marked up using XML tags and comments.
 
Structure of the XML document: Declaration
• An XML file begins with an XML declaration.
• It states that the file is an XML file.
• The XML declaration is written as:
    <?xml version="…" encoding="…" standalone="…"?>
  E.g.: <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
 
DTD – Document Type Definition
• Defines the structure of the content of an XML document, and hence allows data to be stored in a consistent format.
• The DTD is a set of rules that defines the elements, their attributes and attribute values, and the relationships between elements in the document.
 
DTD, cont'd
• When an XML document is processed, it is compared to its associated DTD to ensure it is structured correctly and all tags are used correctly.
• Two types:
  – Internal DTD
  – External DTD
 
Internal DTD
• A DTD that is included as part of an XML document.
• The syntax for creating an internal DTD:
    <!DOCTYPE rootelement [
        element and attribute declarations
    ]>
  – <!DOCTYPE … > is the document type declaration.
  – rootelement is the name of the root element.
 
External DTD
• Stored as a separate file containing the declarations of all elements and attributes that can be used in an XML document.
• Syntax:
    <!DOCTYPE rootelement [PUBLIC | SYSTEM] "nameoffile">
  – <!DOCTYPE … > is the document type declaration.
  – rootelement is the name of the root element.
  – "nameoffile" is the name of the DTD.
  – PUBLIC: the DTD is a publicly shared one, named by a public identifier (which is followed by a system identifier giving its location).
  – SYSTEM: the DTD is located by a system identifier, typically a file on the local machine or a private URI.
 
DTD Symbols
    Symbol  Meaning                                 Example                     Description
    ,       "and", in specific order                Fname, Lname                Fname and Lname, in that order
    |       "or"                                    Fname | Lname               Fname or Lname
    ()      Used for grouping elements              (Fname | Lname), Address    An Fname element or Lname element must be present and must precede the Address element.
    *       Zero or multiple occurrences            (Fname | Lname)*            Any number of Fname or Lname elements can be present, in any order.
    +       At least one occurrence; there can      (Fname+)                    There can be multiple Fname elements.
            be multiple occurrences
    ?       "optional"; can occur only once         Lname?                      Lname need not be present, but if it is present, it can occur only once.
 
Declaring elements in a DTD
• XML allows you to create your own set of tags.
• Syntax: <!ELEMENT elementname content>
• Element names consist of letters, digits, hyphens, underscores, and periods. Spaces and tabs are not allowed.
 
Declaring elements in a DTD
• Types of elements:
    Element type    Description                                                             Syntax
    Empty           Empty elements have no content and are marked up as <emptyelement/>     <!ELEMENT elt EMPTY>
    Unrestricted    Can contain any element declared elsewhere in the DTD                   <!ELEMENT elt ANY>
    Container       Elements can contain character data                                     <!ELEMENT elt (#PCDATA)>
 
Declaring elements in a DTD - Example
• If both Fname and Lname have to be specified and Fname must be followed by Lname, the DTD would look as follows:
    <!ELEMENT student (Fname, Lname)>
    <!ELEMENT Fname (#PCDATA)>
    <!ELEMENT Lname (#PCDATA)>
Note: #PCDATA means parsed character data. CDATA (character data that is not parsed by the parser) is an attribute type and cannot be used as an element's content model.
 
Attributes
• Syntax: <!ATTLIST element attribute type default>
• Common attribute types:
    Attribute type      Description
    CDATA               Any character data
    (V1 | V2 | …)       One of V1, V2, …
 
Categories of attributes
 
Example DTD (internal DTD)
    <!DOCTYPE transfers [
        <!ELEMENT transfers (fundsTransfer)+ >
        <!ELEMENT fundsTransfer (from, to) >
        <!ATTLIST fundsTransfer date CDATA #REQUIRED>
        <!ELEMENT from (amount, transitID?, accountID, acknowledgeReceipt) >
        <!ATTLIST from type (intrabank|internal|other) #REQUIRED>
        <!ELEMENT amount (#PCDATA) >
        ... Omitted DTD content ...
        <!ELEMENT to EMPTY >
        <!ATTLIST to account CDATA #REQUIRED>
    ]>
    <transfers>
        <fundsTransfer date="20010923T12:34Z">
        ... As with previous example ...
 
Parsing XML Documents
• To read and analyse the content of an XML document, you need an XML parser.
• A parser is a program that reads a document, checks whether it is syntactically correct, and takes some actions as it processes the document.
 
XML Parser Processing Model
[Figure: XML data and its DTD are read by the parser, which passes the result through the parser interface to the XML-based application.]
• The parser must verify that the XML data is syntactically correct.
• Such data is said to be well-formed.
  – This is the minimal requirement to "be" XML.
• A parser MUST stop processing if the data isn't well-formed.
  – E.g., stop processing and "throw an exception" to the XML-based application.
 
XML Parsers
• Validating/Non-Validating
• Tree-based
• Event-based
• SAX-compliance
• Not technically parsers
  – XSL
  – XPath
 
Some Java XML Parsers
• DOM
  – Sun JAXP
  – IBM XML4J
  – Apache Xerces
  – Resin (Caucho)
  – DXP (DataChannel)
• SAX
  – Sun JAXP
  – SAXON
• JDOM
 
XML Parsers, DTDs, and Internal Entities
• The parser processes the DTD content, identifies the internal entities, and checks that each entity is well-formed.
• There are explicit syntax rules for DTD content; well-formed XML must be correct here also.
• The parser then replaces every occurrence of an entity reference by the referenced entity (and does so recursively within entities).
• The "resolved" data object is then made available to the XML application.
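The recursive replacement of internal entities can be observed directly. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree module (built on the expat parser, which expands internal entities); the element name memo and the entity names company and signature are hypothetical.

```python
import xml.etree.ElementTree as ET

# Two internal entities are declared in the internal DTD subset;
# "signature" itself refers to "company".
DOC = """<?xml version="1.0"?>
<!DOCTYPE memo [
  <!ENTITY company "ACME Ltd">
  <!ENTITY signature "Regards, &company;">
]>
<memo>&signature;</memo>"""

memo = ET.fromstring(DOC)

# The application only ever sees the fully resolved text.
print(memo.text)
```

Because expansion is recursive, documents from untrusted sources can abuse nested entities to exhaust memory, so entity processing is often restricted in production parsers.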
 
XML Parsers and External Entities
• The parser processes the DTD content, identifies the external entities, and "tries" to resolve them.
• The parser then replaces every occurrence of an entity reference by the referenced entity, and does so recursively within all those entities (as with internal entities).
• What happens when an entity cannot be found depends on the application / parser type.
  – There are two types of XML parsers: one that MUST retrieve all entities, and one that can ignore them (if it can't find them).
 
Two types of XML parsers
• Validating parser
  – Must retrieve all entities and must process all DTD content. Will stop processing and indicate a failure if it cannot.
  – There is also the implication that it will test for compatibility with other things in the DTD: instructions that define syntactic rules for the document (allowed elements, attributes, etc.). We'll talk about these parts in the next section.
• Non-validating parser
  – Will try to retrieve all entities defined in the DTD, but will cease processing the DTD content at the first entity it can't find. This is not an error: the parser simply makes the XML data (and the names of any unresolved entities) available to the application.
• Application behavior will depend on the parser type.
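The difference shows up when a document is well-formed but breaks its own DTD. The sketch below assumes Python's standard xml.dom.minidom module, whose underlying parser is non-validating; the surname value "Silva" is a hypothetical placeholder.

```python
from xml.dom import minidom

# The DTD requires <Fname> followed by <Lname>, but <Fname> is missing.
DOC = """<!DOCTYPE student [
  <!ELEMENT student (Fname, Lname)>
  <!ELEMENT Fname (#PCDATA)>
  <!ELEMENT Lname (#PCDATA)>
]>
<student><Lname>Silva</Lname></student>"""

# A non-validating parser reads the DTD but does not enforce the content model,
# so the document is accepted because it is well-formed.
dom = minidom.parseString(DOC)

doctype_name = dom.doctype.name
fname_count = len(dom.getElementsByTagName("Fname"))
lname_text = dom.getElementsByTagName("Lname")[0].firstChild.data

print(doctype_name, fname_count, lname_text)
```

A validating parser given the same input would report the missing Fname element as a validity error.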
 
XML Parser Processing Model
[Figure: XML data and its DTD are read by the parser, which passes the result through the parser interface to the XML-based application. The relationship/behavior between parser and DTD depends on the nature of the parser.]
• Many parsers can operate in either validating or non-validating mode (parameter-dependent).
 
How do you define language dialects?
• Two ways of doing so:
  – XML Document Type Definition (DTD): part of the core XML spec.
  – XML Schema: a newer XML specification, which allows for stronger constraints on XML documents.
• Adding dialect specifications implies two classes of XML data:
  – Well-formed: an XML document that is syntactically correct.
  – Valid: an XML document that is both well-formed and consistent with a specific DTD (or Schema).
• What DTDs and/or schemas specify:
  – Allowed element and attribute names, hierarchical nesting rules, and element content/type restrictions.
• Schemas are more powerful than DTDs. They are often used for type validation, or for relating database schemas to XML models.
 
XML Namespaces
• In XML, element names are defined by the developer. This often results in a conflict when trying to mix XML documents from different XML applications.
• Namespaces are a mechanism for identifying different "spaces" for XML names.
  – That is, element or attribute names.
• This is a way of identifying different language dialects, consisting of names that have specific semantic (and processing) meanings.
 
XML Namespaces
• Thus <key/> in one language (where it might mean a security key) can be distinguished from <key/> in another language (a database key).
• The mechanism uses a special xmlns attribute to define the namespace. The namespace is given as a URL string.
  – But the URL does not reference anything in particular (there may be nothing there).
 
Mixing language dialects together
Namespaces let you do this relatively easily:
    <?xml version="1.0" encoding="utf-8"?>
    <html xmlns="http://www.w3.org/1999/xhtml1"
          xmlns:mt="http://www.w3.org/1998/mathml">
        <head>
            <title> Title of XHTML Document </title>
        </head>
        <body>
            <div class="myDiv">
                <h1> Heading of Page </h1>
                <mt:mathml>
                    <mt:title> ... </mt:title>
                    ... MathML markup ...
                </mt:mathml>
                <p> more html stuff goes here </p>
            </div>
        </body>
    </html>
• The default 'space' is xhtml.
• The mt: prefix indicates the 'space' mathml (a different language).
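A namespace-aware parser keeps the two dialects apart by attaching the namespace URL to every name. A minimal sketch, assuming Python's standard xml.etree.ElementTree module; the prefix "h" in the lookup table is a local choice and need not match the document, and the namespace URLs are copied from the slide rather than from the official specifications.

```python
import xml.etree.ElementTree as ET

PAGE = """<html xmlns="http://www.w3.org/1999/xhtml1"
      xmlns:mt="http://www.w3.org/1998/mathml">
  <head><title>Title of XHTML Document</title></head>
  <body>
    <div class="myDiv">
      <h1>Heading of Page</h1>
      <mt:mathml><mt:title>...</mt:title></mt:mathml>
      <p>more html stuff goes here</p>
    </div>
  </body>
</html>"""

root = ET.fromstring(PAGE)

# Prefix-to-URL table used only for searching; the URLs are what matter.
NS = {
    "h": "http://www.w3.org/1999/xhtml1",
    "mt": "http://www.w3.org/1998/mathml",
}

heading = root.find(".//h:h1", NS)
math = root.find(".//mt:mathml", NS)
html_title = root.find("./h:head/h:title", NS)
math_title = math.find("mt:title", NS)

# Expanded names have the form {namespace-URL}local-name.
print(root.tag)
print(math.tag)
```

The two title elements share a local name but have different expanded names, so neither search can return the other by mistake.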
 
XML Software
• XML parser: reads in XML data, checks for syntactic (and possibly DTD/Schema) constraints, and makes data available to an application. There are three 'generic' parser APIs:
  – SAX: Simple API for XML (event-based)
  – DOM: Document Object Model (object/tree based)
  – JDOM: Java Document Object Model (object/tree based)
• Lots of XML parsers and interface software are available (Unix, Windows, OS/390 or z/OS, etc.).
• SAX-based parsers are fast (often as fast as you can stream data).
• DOM is slower and more memory intensive (it creates an in-memory version of the entire document).
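The event-based and tree-based styles can be compared on the same input. This sketch assumes Python's standard xml.sax and xml.dom.minidom modules rather than the Java parsers named elsewhere in the deck; the class name NameCollector and the sample data are illustrative only.

```python
import xml.sax
from xml.dom import minidom

PEOPLE_XML = (
    "<people>"
    "<person><name>Alan</name><age>42</age></person>"
    "<person><name>Patsy</name><age>36</age></person>"
    "<person><name>Ryan</name><age>58</age></person>"
    "</people>"
)


class NameCollector(xml.sax.ContentHandler):
    """SAX style: react to events as the parser streams through the document."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._in_name = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "name":
            self._in_name = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect it until the end tag.
        if self._in_name:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "name":
            self.names.append("".join(self._buffer))
            self._in_name = False


handler = NameCollector()
xml.sax.parseString(PEOPLE_XML.encode("utf-8"), handler)
sax_names = handler.names

# DOM style: build the whole tree in memory first, then query it.
dom = minidom.parseString(PEOPLE_XML)
dom_names = [node.firstChild.data for node in dom.getElementsByTagName("name")]

print(sax_names)
print(dom_names)
```

The SAX handler never holds more than one name in memory, while the DOM version keeps the entire document, which is the trade-off described in the last two bullets.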
 
XML Processing: XSLT
• XSLT: eXtensible Stylesheet Language - Transformations
  – An XML language for processing XML.
  – Does tree transformations: takes XML and an XSLT style sheet as input, and produces a new XML document with a different structure.
• Advantages
  – Very useful for tree transformations; much easier than DOM or SAX for this purpose.
  – Can be used to query a document (XSLT pulls out the part you want).
• Disadvantages
  – Can be slow for large documents or stylesheets.
  – Can be difficult to debug stylesheets (poor error detection; much better if you use schemas).
 
XSLT processing model
[Figure: XSLT processing model. The XML data and the XSLT style sheet are each read by an XML parser into document "objects"; the XSLT processor, with optional schemas for input and output, produces the output data (XML). Example trees in the figure use node names such as order, partorders, desc, part, quantity and delivery-date.]
 
XML Messaging
• Use XML as the format for sending messages between systems.
• Advantages:
  – Common syntax; self-describing (easier to parse).
  – Can use common/existing transport mechanisms to "move" the XML data (HTTP, HTTPS, SMTP (email), MQ, IIOP (CORBA), JMS, …).
• Requirements:
  – Shared understanding of dialects for transport (requires a registry [namespace!] for identifying dialects).
  – Shared acceptance of the messaging contract.
• Disadvantages:
  – Asynchronous transport; no guarantee of delivery, and no guarantee that the (external) partner shares acceptance of the contract.
  – Messages will be much larger than binary (10x or more) [can compress].
 
Common messaging model
• XML over HTTP
  – Use HTTP to transport XML messages:
    POST /path/to/interface.pl HTTP/1.1
    Referer: http://www.foo.org/myClient.html
    User-agent: db-server-olk
    Accept-encoding: gzip
    Accept-charset: iso-8859-1, utf-8, ucs
    Content-type: application/xml; charset=utf-8
    Content-length: 13221
    . . .
    <?xml version="1.0" encoding="utf-8"?>
    <message> . . . Markup in message . . . </message>
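A request of this shape can be assembled with any HTTP client. The sketch below assumes Python's standard urllib.request module and only builds the request object without sending it; the host www.foo.org is an assumption borrowed from the Referer header, and the message body is a placeholder.

```python
import urllib.request

# Placeholder body; a real message would carry application markup.
body = '<?xml version="1.0" encoding="utf-8"?><message>...</message>'.encode("utf-8")

request = urllib.request.Request(
    url="http://www.foo.org/path/to/interface.pl",  # assumed endpoint
    data=body,
    method="POST",
    headers={
        "Content-Type": "application/xml; charset=utf-8",
        "Accept-Encoding": "gzip",
    },
)

# Nothing is sent until urllib.request.urlopen(request) is called;
# Content-Length is filled in automatically at that point.
print(request.get_method(), request.full_url)
print(request.get_header("Content-type"))
```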
 
Some standards for message format
• Define dialects designed to "wrap" remote invocation messages.
• XML-RPC
  – A very simple way of encoding a function/method call name, and the passed parameters, in an XML message.
• SOAP (Simple Object Access Protocol)
  – A more complex wrapper, which lets you specify schemas for interfaces, more complex rules for handling/proxying messages, etc. This is a core component of Microsoft's .NET strategy, and is integrated into more recent versions of WebSphere and other commercial packages.
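The XML-RPC encoding is small enough to inspect by hand. A minimal sketch, assuming Python's standard xmlrpc.client module; the method name add and its two integer parameters are hypothetical.

```python
import xmlrpc.client

# Encode a call to a hypothetical remote method add(2, 3) as an XML message.
payload = xmlrpc.client.dumps((2, 3), methodname="add")
print(payload)

# The receiving side decodes the same message back into parameters and a name.
params, method = xmlrpc.client.loads(payload)
print(method, params)
```

The printed payload is a methodCall element containing the method name and one param element per argument, which is the "wrapping" the slide refers to.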
 
XML Messaging + Processing
• XML as a universal format for data exchange
[Figure: a factory application with a SOAP API places an order (XML/EDI) with a supplier's SOAP interface using SOAP over HTTP; the supplier returns its response (XML/EDI) using SOAP over HTTP. Transport: HTTP(S), SMTP, other.]
 
XML (and related) Specifications
[Figure: map of XML and related specifications, colour-coded as W3C rec, W3C draft, industry std and 'Open' std, and grouped under XML Core, APIs, Style, Protocols, Web Services and Application areas. Specifications named: XML 1.0, XML names, Infoset, XML base, Xfragment, Canonical, XPath, XPointer, XLink, XML schema, XML query, XML signature, RDF, XSL, XSLT, CSS 1, CSS 2, CSS 3, DOM 1, DOM 2, DOM 3, SAX 1, SAX 2, JDOM, JAXP, SOAP, XML-RPC, WSDL, UDDI, BizTalk, ebXML, WDDX, XMI, MathML, SMIL 1 & 2, SVG, XHTML 1.0, XHTML basic, Modularized XHTML, XHTML events, XForms, FinXML, IFX, FpML, dirXML, and hundreds more.]
 
References
• W3Schools
