XML
• What is XML?
• XML vs. HTML
• XML Components
• Well-formed and Valid
• Document Type Definition (DTD)
• Extensible Style Language (XSL)
• SAX and DOM

What is XML?
• Extensible Markup Language (XML) is a meta-language that describes the content of a document (self-describing data).
• It derives from SGML and is interoperable with both HTML and SGML.

XML vs. HTML
• Markup languages generally combine two distinct functions in representing text (a document): the 'look' and the 'structure'.
• HTML and XML have different goals. HTML was designed to display data and so focuses on how the data looks; XML was designed to describe and carry data and so focuses on what the data is.

XML vs. HTML
• HTML is about displaying data; XML is about describing data.
• HTML and XML are complementary.
• HTML explicitly defines a set of legal tags:
    <TABLE>…</TABLE>
• XML allows any tags to be used; you can create new tags:
    <BOOK>…</BOOK>

XML Components
• Prolog
    • Defines the XML version, entity definitions, and DOCTYPE
• Components of the document
    • Tags and attributes
    • CDATA (character data)
    • Entities
    • Processing instructions
    • Comments

XML Prolog
• XML files should start with a prolog:
    <?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
• The version of XML is required.
• The encoding identifies the character set (default UTF-8).
• The standalone value indicates whether an external document is referenced for the DTD or for entity definitions.
• The prolog can contain entity and DTD definitions.

Prolog Example
    <?xml version="1.0" standalone="yes"?>
    <!DOCTYPE authors [
      <!ELEMENT authors (name)*>
      <!ELEMENT name (firstname, lastname)>
      <!ELEMENT firstname (#PCDATA)>
      <!ELEMENT lastname (#PCDATA)>
    ]>
    <authors>
      <name>
        <firstname>James</firstname>
        <lastname>Gosling</lastname>
      </name>
      …
    </authors>

XML DOCTYPE
Document Type Declarations
• Specifies the location of the DTD defining the syntax and structure of elements in the document
• Common forms:
    <!DOCTYPE root [DTD]>
    <!DOCTYPE root SYSTEM URL>
    <!DOCTYPE root PUBLIC FPI-identifier URL>
• The root identifies the starting element (root element) of the document
• The DTD can be external to the XML document, referenced by a SYSTEM or PUBLIC URL
    • A PUBLIC URL refers to a DTD intended for public use
    • A SYSTEM URL refers to a private DTD (located on the local file system or an HTTP server)

DOCTYPE Examples
    <!DOCTYPE book SYSTEM "book.dtd">
• book must be the root element
• DTD located in the same directory as the XML document

    <!DOCTYPE book SYSTEM "http://vishnu.cs.lamar.edu/~jingw/book.dtd">
• DTD located on the HTTP server vishnu.cs.lamar.edu

XML DOCTYPE
Specifying a PUBLIC DTD
    <!DOCTYPE root PUBLIC FPI-identifier URL>
• The Formal Public Identifier (FPI) has four parts:
    1. Connection of the DTD to a formal standard
        -    if you are defining the DTD yourself
        +    if a nonstandards body has approved the DTD
        ISO  if approved by a formal standards committee
    2. Group responsible for the DTD
    3. Description and type of document
    4. Language used in the DTD

PUBLIC DOCTYPE Example
    <!DOCTYPE Book
      PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
      "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

    <!DOCTYPE CWP
      PUBLIC "-//Prentice Hall//DTD Core Series 1.0//EN"
      "http://www.prenticehall.com/DTD/Core.dtd">

XML Root Element
• Required for XML-aware applications to recognize the beginning and end of the document; it is the first element.
• All other elements must be nested within the root element.
• Example:
    <?xml version="1.0"?>
    <book>
      <title>123</title>
      …
    </book>

XML Tags
• Tag names:
    • Are case sensitive
    • Start with a letter or underscore
    • After the first character, numbers, - and . are allowed
    • Cannot contain whitespace
    • Should avoid colons except for indicating namespaces
• Tags can have attributes:
    <message to="Gates@microsoft.com" from="Gosling@sun.com">
      <priority/>
      <text>what did you do?</text>
    </message>
• All XML elements must have close tags.
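
The tag and attribute rules above can be tried out with a small Python sketch. It uses the standard library's xml.etree.ElementTree on the message element from the slide; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# The message element from the slide, written with straight quotes.
doc = """<message to="Gates@microsoft.com" from="Gosling@sun.com">
  <priority/>
  <text>what did you do?</text>
</message>"""

root = ET.fromstring(doc)

# Attributes are read by name from the element that carries them.
recipient = root.get("to")
sender = root.get("from")

# Child elements appear in document order.
child_tags = [child.tag for child in root]

# <priority/> is an empty element: it is closed, but has no text.
priority_is_empty = root.find("priority").text is None

# Tag names are case sensitive, so "Text" does not match <text>.
wrong_case_lookup = root.find("Text")

print(recipient, sender, child_tags, priority_is_empty, wrong_case_lookup)
```

The lookup with the wrong case returns None rather than the text element, which is the case-sensitivity rule in practice.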

Document CDATA
• CDATA (character data) is not parsed
    <?xml version="1.0" encoding="UTF-8"?>
    <server>
      <port status="accept">
        <![CDATA[8001 <= port < 9000]]>
      </port>
    </server>
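
A minimal sketch of what "not parsed" means in practice, using Python's standard xml.etree.ElementTree. The document is the slide's server example with the whitespace around the CDATA section removed; the bare version without CDATA is added here only for contrast.

```python
import xml.etree.ElementTree as ET

# Inside a CDATA section the < characters are plain text, not markup.
with_cdata = b"""<?xml version="1.0" encoding="UTF-8"?>
<server>
  <port status="accept"><![CDATA[8001 <= port < 9000]]></port>
</server>"""

port_text = ET.fromstring(with_cdata).find("port").text

# The same text without CDATA (and without entity references)
# is not well-formed, so the parser rejects it.
try:
    ET.fromstring("<port>8001 <= port < 9000</port>")
    bare_text_accepted = True
except ET.ParseError:
    bare_text_accepted = False

print(repr(port_text), bare_text_accepted)
```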

Document Entities
• Entities refer to a data item, typically text
• General entity references start with & and end with ;
• The entity reference is replaced by its true value when parsed
• The characters < > & " ' require entity references to avoid conflicts with the XML application:
    &lt; &gt; &amp; &quot; &apos;
• Entities are user definable:
    <?xml version="1.0" standalone="yes"?>
    <!DOCTYPE book [
      <!ELEMENT book (title)>
      <!ELEMENT title (#PCDATA)>
      <!ENTITY copyright "2001, Prentice Hall">
    ]>
    <book>
      <title>web programming, &copyright;</title>
    </book>
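
Entity replacement can be observed with a short Python sketch. It assumes the standard library parser (xml.etree.ElementTree, built on expat), which expands general entities declared in the internal DTD subset. The extra predefined references in the title are added here for illustration and are not part of the slide's example.

```python
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0" standalone="yes"?>
<!DOCTYPE book [
  <!ELEMENT book (title)>
  <!ELEMENT title (#PCDATA)>
  <!ENTITY copyright "2001, Prentice Hall">
]>
<book>
  <title>web programming, &copyright; &lt;draft&gt; &amp; notes</title>
</book>"""

# After parsing, the application sees replacement text, not references.
title = ET.fromstring(doc).find("title").text
print(title)
```

Expanding entities from untrusted input can be abused (deeply nested entity definitions), so this sketch is meant for documents you control.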

Processing Instructions
• Application-specific instruction to the XML processor:
    <?processor-instruction?>
• Example:
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xml" href="orders.xsl"?>
    <orders>
      <order>
        <count>37</count>
        <price>49.99</price>
        <book>
          <isbn>0130896789</isbn>
          <author>Marty Hall</author>
        </book>
      </order>
    </orders>
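
As a hedged illustration of how an application might read a processing instruction, the sketch below uses Python's standard xml.dom.minidom on a shortened version of the orders document. The parser hands the instruction over as a target plus an uninterpreted data string; what to do with it is left to the application.

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<?xml version="1.0"?>'
    '<?xml-stylesheet type="text/xml" href="orders.xsl"?>'
    "<orders><order><count>37</count></order></orders>"
)

# Collect the processing instructions that sit before the root element.
# The XML declaration is not reported as a processing-instruction node.
instructions = [
    (node.target, node.data)
    for node in doc.childNodes
    if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
]

print(instructions)
```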

XML Comments
• Comments are the same as HTML comments:
    <!-- This is an XML and HTML comment -->

Well-formed versus Valid
• An XML document is well-formed if it follows the basic syntax rules.
• An XML document is valid if it is well-formed and its structure matches a Document Type Definition (DTD).
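
The well-formedness half of this distinction can be checked with a non-validating parser. The sketch below uses Python's xml.etree.ElementTree; is_well_formed is a hypothetical helper name. It does not check validity: the standard library parser does not validate against a DTD, so that step would need a validating parser.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the text parses as XML, False on a syntax error."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

good = "<book><title>123</title></book>"
mismatched = "<book><title>123</tilte></book>"   # close tag does not match
two_roots = "<book/><book/>"                     # more than one root element

results = [is_well_formed(s) for s in (good, mismatched, two_roots)]
print(results)
```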

Document Type Definition (DTD)
• Defines the structure of the document:
    • Allowable tags and their attributes
    • Attribute value constraints
    • Nesting of tags
    • Number of occurrences for tags
    • Entity definitions

DTD Example
    <?xml version="1.0" encoding="UTF-8"?>
    <!ELEMENT TVSCHEDULE (CHANNEL+)>
    <!ELEMENT CHANNEL (BANNER, DAY+)>
    <!ELEMENT BANNER (#PCDATA)>
    <!ELEMENT DAY ((DATE, HOLIDAY) | (DATE, PROGRAMSLOT+))+>
    <!ELEMENT HOLIDAY (#PCDATA)>
    <!ELEMENT DATE (#PCDATA)>
    <!ELEMENT PROGRAMSLOT (TIME, TITLE, DESCRIPTION?)>
    <!ELEMENT TIME (#PCDATA)>
    <!ELEMENT TITLE (#PCDATA)>
    <!ELEMENT DESCRIPTION (#PCDATA)>
    <!ATTLIST TVSCHEDULE NAME CDATA #REQUIRED>
    <!ATTLIST CHANNEL CHAN CDATA #REQUIRED>
    <!ATTLIST PROGRAMSLOT VTR CDATA #IMPLIED>
    <!ATTLIST TITLE RATING CDATA #IMPLIED>
    <!ATTLIST TITLE LANGUAGE CDATA #IMPLIED>

Defining Elements
    <!ELEMENT name definition/type>

    <!ELEMENT CHANNEL (BANNER, DAY+)>
    <!ELEMENT BANNER (#PCDATA)>
    <!ELEMENT DAY ((DATE, HOLIDAY) | (DATE, PROGRAMSLOT+))+>
• Types:
    ANY       Any well-formed XML data
    EMPTY     Element cannot contain any text or child elements
    PCDATA    Character data only (should not contain markup)
    Elements  List of legal child elements (no character data)
    Mixed     May contain character data and/or child elements (cannot constrain order and number of child elements)

Defining Elements
• Cardinality:
    [none]  Default (one and only one instance)
    ?       0, 1
    *       0, 1, …, n
    +       1, 2, …, n
• List operators:
    ,   Sequence (in order)
            <!ELEMENT book (title, price, author)>
    |   Choice (one of several)
            <!ELEMENT classroom (teacher | student)>
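
One way to build intuition for cardinality and list operators is to read a content model as a regular expression over child element names. The Python sketch below is an analogy only, not a DTD parser or validator: the CONTENT_MODELS table is translated by hand from two declarations in the TV schedule DTD, and children_match is a hypothetical helper.

```python
import re
import xml.etree.ElementTree as ET

# Hand-translated content models (each child name is followed by a space):
#   <!ELEMENT CHANNEL (BANNER, DAY+)>
#   <!ELEMENT PROGRAMSLOT (TIME, TITLE, DESCRIPTION?)>
CONTENT_MODELS = {
    "CHANNEL": r"BANNER (DAY )+",
    "PROGRAMSLOT": r"TIME TITLE (DESCRIPTION )?",
}

def children_match(element):
    """Check an element's direct children against its content model."""
    child_names = "".join(child.tag + " " for child in element)
    return re.fullmatch(CONTENT_MODELS[element.tag], child_names) is not None

slot_ok = ET.fromstring("<PROGRAMSLOT><TIME/><TITLE/></PROGRAMSLOT>")
slot_bad = ET.fromstring("<PROGRAMSLOT><TITLE/><TIME/></PROGRAMSLOT>")
channel_ok = ET.fromstring("<CHANNEL><BANNER/><DAY/><DAY/></CHANNEL>")
channel_bad = ET.fromstring("<CHANNEL><BANNER/></CHANNEL>")

checks = [children_match(e) for e in (slot_ok, slot_bad, channel_ok, channel_bad)]
print(checks)
```

The second slot fails because the sequence operator fixes the order, and the second channel fails because + requires at least one DAY.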

Defining Attributes
    <!ATTLIST element attrName type modifier>
• Example:
    <!ELEMENT Customer (#PCDATA)>
    <!ATTLIST Customer id CDATA #IMPLIED>

    <!ELEMENT Product (#PCDATA)>
    <!ATTLIST Product
        cost CDATA #FIXED "200"
        id   CDATA #REQUIRED>

Attribute Type
• CDATA
    Essentially anything; simply unparsed data
    <!ATTLIST Customer id CDATA #IMPLIED>
• Enumeration
    attribute (value1|value2|value3) [Modifier]
• Eight other attribute types
    ID, IDREF, IDREFS, NMTOKEN, NMTOKENS, ENTITY, ENTITIES, NOTATION

Attribute Modifiers
• #IMPLIED
    Attribute is not required
    <!ATTLIST Customer id CDATA #IMPLIED>
• #REQUIRED
    Attribute must be present
    <!ATTLIST Customer id CDATA #REQUIRED>
• #FIXED "value"
    Attribute is present and always has this value
    <!ATTLIST Product cost CDATA #FIXED "200">
• Default value (applies to enumeration)
    <!ATTLIST car color (red|white|blue) "white">

Defining Entities
• Specify entity reference resolution in a DTD using the ENTITY keyword:
    <!ENTITY name "replacement">
    <!ENTITY copyright "Copyright 2001">

Limitations of DTDs
• A DTD itself is not in XML format, which means more work for parsers
• Does not express data types (weak data typing)
• No namespace support
• A document can override external DTD definitions
• No DOM support
• XML Schema is intended to resolve these issues, but … DTDs are going to be around for a while

Namespace
• Namespaces identify collections of element type declarations so that they do not conflict with other element type declarations of the same name created by other programmers
• The xml prefix is predefined; xsl is the conventional prefix for XSL, but like any other prefix it must be declared in the document
• You can create your own namespaces
• Example:
    <subject>English</subject>
    <subject>Thrombosis</subject>
  can be differentiated by using namespaces, as in
    <school:subject>English</school:subject>
    <medical:subject>Thrombosis</medical:subject>
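
The following Python sketch shows how a namespace-aware parser keeps the two subject elements apart. The wrapper element record and the two namespace URIs are assumptions made for this illustration; the slide only shows the prefixed elements, and prefixes must be bound to a URI with xmlns declarations before a parser will accept them.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs; any unique URI strings would do.
SCHOOL = "http://example.org/school"
MEDICAL = "http://example.org/medical"

doc = f"""<record xmlns:school="{SCHOOL}" xmlns:medical="{MEDICAL}">
  <school:subject>English</school:subject>
  <medical:subject>Thrombosis</medical:subject>
</record>"""

root = ET.fromstring(doc)

# The parser replaces each prefix with its URI, so the two elements
# end up with different qualified names.
tags = [child.tag for child in root]

# Lookups use a prefix-to-URI mapping supplied by the application.
ns = {"school": SCHOOL, "medical": MEDICAL}
school_subject = root.find("school:subject", ns).text
medical_subject = root.find("medical:subject", ns).text

print(tags, school_subject, medical_subject)
```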

XSL - Extensible Style Language
• Defines the layout of an XML document; an XSL style sheet provides the rules for displaying an XML document.
• XSLT is XSL Transformations.
• XML -> XSLT -> HTML
• In the XML document, include:
    <?xml-stylesheet type="text/xsl" href="myXSL.xsl"?>

XSL Example
    <?xml version="1.0" encoding="big5"?>
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/TR/WD-xsl">
      <xsl:template match="/">
        .... HTML ..
      </xsl:template>
    </xsl:stylesheet>

What is SAX?
• SAX is the Simple API for XML, originally a Java-only API.
• SAX was the first widely adopted API for XML in Java and is a "de facto" standard.
• SAX is an event-based API. The application implements handlers to deal with the different events, much like handling events in a graphical user interface.
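
The event-based style can be sketched with Python's standard xml.sax module, which follows the same handler model as the original Java API. TitleCollector is a hypothetical handler written for this illustration; it records every start and end event and gathers the text of title elements.

```python
import xml.sax

class TitleCollector(xml.sax.ContentHandler):
    """Records element events and collects the text of <title> elements."""

    def __init__(self):
        super().__init__()
        self.events = []
        self.titles = []
        self._in_title = False

    def startElement(self, name, attrs):
        self.events.append(("start", name))
        if name == "title":
            self._in_title = True
            self.titles.append("")

    def characters(self, content):
        # Text may arrive in several chunks, so append rather than assign.
        if self._in_title:
            self.titles[-1] += content

    def endElement(self, name):
        self.events.append(("end", name))
        if name == "title":
            self._in_title = False

handler = TitleCollector()
xml.sax.parseString(b"<book><title>123</title></book>", handler)
print(handler.events, handler.titles)
```

The parser never builds a tree; the handler keeps only the state it chooses to keep, which is why SAX suits large inputs.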

What is the Document Object Model (DOM)?
• A platform- and language-neutral interface that allows programs and scripts to dynamically access and update the content, structure and style of documents.
• Provides APIs that let you create nodes, modify them, delete and rearrange them, so it is relatively easy to create a DOM.
• Provides a recommended tree-based API for XML and HTML documents.
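
The create, modify, delete and rearrange operations can be sketched with xml.dom.minidom, a small DOM implementation in Python's standard library. The book document and the author name are reused from earlier slides purely as sample data.

```python
from xml.dom import minidom

doc = minidom.parseString("<book><title>123</title><price>49.99</price></book>")
book = doc.documentElement

# Create: build a new element with a text child and attach it.
author = doc.createElement("author")
author.appendChild(doc.createTextNode("Marty Hall"))
book.appendChild(author)

# Modify: change the text node inside <title>.
title = book.getElementsByTagName("title")[0]
title.firstChild.data = "Web Programming"

# Rearrange: move <author> in front of <title>.
book.insertBefore(author, title)

# Delete: remove <price> from the tree.
price = book.getElementsByTagName("price")[0]
book.removeChild(price)

serialized = book.toxml()
print(serialized)
```

Unlike SAX, the whole document is held in memory as a tree, so any node can be revisited and changed at any time.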

DOM/SAX Processing
• DOM is a standard. It yields a tree in memory.
• SAX yields a sequence of events corresponding to the XML input.
• Both generally destroy attribute ordering, insignificant white space, insignificant namespace aspects, …
• Verifying a signature based on DOM/SAX requires serializing the DOM tree or the SAX event stream to a byte stream.
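
To make the serialization point concrete, here is a hedged Python sketch. It assumes Python 3.8 or later, where xml.etree.ElementTree.canonicalize produces a canonical (C14N 2.0) serialization. It uses a plain SHA-256 digest as a stand-in for a signature; a real XML signature scheme names its own canonicalization and digest algorithms.

```python
import hashlib
import xml.etree.ElementTree as ET

# Two spellings of the same document: attribute order, quote style and
# whitespace inside the start tag differ.
a = '<order id="7" status="open"><count>37</count></order>'
b = "<order  status='open' id='7' ><count>37</count></order>"

# A digest over the raw text would differ, although the parsed content
# is the same.
raw_equal = a == b

# Canonical serialization gives both documents the same byte stream.
canon_a = ET.canonicalize(a)
canon_b = ET.canonicalize(b)
digest_a = hashlib.sha256(canon_a.encode("utf-8")).hexdigest()
digest_b = hashlib.sha256(canon_b.encode("utf-8")).hexdigest()

print(raw_equal, canon_a == canon_b, digest_a == digest_b)
```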

Summary
• XML is self-describing metadata
• DOCTYPE defines the root element and the location of the DTD
• Document Type Definition (DTD) defines the grammar of the document
    • Required to validate the document
    • Constrains grouping and cardinality of elements
• XSL is defined as a language for expressing stylesheets
    • Is a language for transforming XML documents
    • Is an XML vocabulary for specifying the formatting of XML documents
• DOM and SAX are the two most common low-level APIs; both are standardized in some form (SAX as a de facto standard, DOM by the W3C)

XML Resources
• XML 1.0 Specification
    http://www.w3.org/TR/REC-xml
• WWW Consortium's Home Page on XML
    http://www.w3.org/XML/
• Sun Page on XML and Java
    http://java.sun.com/xml/
• Apache XML Project
    http://xml.apache.org/
• XML Resource Collection
    http://xml.coverpages.org/
• O'Reilly XML Resource Center
    http://www.xml.com/