XML: Extensible Markup Language
Chapter 17
Randy Connolly and Ricardo Hoar, Fundamentals of Web Development
Textbook to be published by Pearson Ed in early 2014. © 2015 Pearson
http://www.funwebdev.com

XML Overview: Introduction
• XML is a text-based markup language, but unlike HTML, XML can be used to mark up any type of data.
• XML is derived from Standard Generalized Markup Language (SGML).
• A key benefit of XML is that, as plain text, it can be read by humans and transferred between applications and operating systems.
• XML is used on the web server and for asynchronous communication with the browser, and it also serves as a data interchange format for moving information between systems.

XML Overview: XML in the web context
Used in many systems

Well Formed XML: Sample Document
XML declaration is analogous to HTML DOCTYPE

Well Formed XML: Syntax Rules
For a document to be well-formed XML, it must follow the XML syntax rules:
• Element names may contain only valid XML name characters; spaces and most punctuation symbols are not allowed.
• Element names can't start with a number.
• There must be a single root element. A root element is one that contains all the other elements; for instance, in an HTML document, the root element is <html>.
• All elements must have a closing tag (or be self-closing).
• Elements must be properly nested.
• Elements can contain attributes.
• Attribute values must always be within quotes.
• Element and attribute names are case sensitive.
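
The rules above can be tried out with a short Python sketch. It assumes the standard-library `xml.etree.ElementTree` parser as the checker; the helper name `is_well_formed` and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Hypothetical samples: one good document and one violation per rule.
samples = {
    "ok": '<contact-info><person id="1"><name>Ann</name></person></contact-info>',
    "two roots": "<a/><b/>",
    "bad nesting": "<a><b></a></b>",
    "unquoted attribute": "<a id=1/>",
    "case mismatch": "<name></Name>",
    "name starts with digit": "<1name/>",
    "unclosed element": "<a><b></a>",
}

results = {label: is_well_formed(xml) for label, xml in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'well-formed' if ok else 'rejected'}")
```

Only the first sample should be accepted; each of the others breaks exactly one rule from the list.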

Valid XML: Requires a DTD
• Validation is the process of checking an XML document against a declared set of rules for its structure.
• An XML document is said to be valid if its content matches the elements, attributes, and other pieces of an associated document type declaration, and if it complies with the constraints expressed there.
• A valid XML document is one that is well formed and whose elements and content conform to the rules of either its Document Type Definition (DTD) or its schema.
• A DTD tells the XML parser which elements and attributes to expect in the document, as well as the order and nesting of those elements.
• A DTD can be defined within an XML document or within an external file.
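
As a rough illustration of a DTD defined inside the document itself, the sketch below embeds a small internal DTD and parses the document with Python's `xml.dom.minidom`. The DTD rules are an assumption written for this example, not taken from the textbook. Note that the standard-library parser is non-validating: it reads the DTD but does not enforce it, so a separate validating parser would be needed to check validity.

```python
from xml.dom import minidom

# Hypothetical document with an internal DTD (the rules are illustrative only).
DOC = """<?xml version="1.0"?>
<!DOCTYPE contact-info [
  <!ELEMENT contact-info (person+)>
  <!ELEMENT person (name, company, phone)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT company (#PCDATA)>
  <!ELEMENT phone (#PCDATA)>
  <!ATTLIST name lang CDATA #IMPLIED>
]>
<contact-info>
  <person>
    <name lang="en">Tanmay Patil</name>
    <company>TutorialsPoint</company>
    <phone>(011) 123-4567</phone>
  </person>
</contact-info>
"""

doc = minidom.parseString(DOC)

# The name in the DOCTYPE must match the root element for the document to be valid.
doctype_name = doc.doctype.name
root_name = doc.documentElement.tagName
print(doctype_name, root_name)
```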

XSLT: Extensible Stylesheet Language Transformations
XSLT is an XML-based programming language that is used for transforming XML into other document formats

XSLT: Another usage
XSLT is also used on the server side and within JavaScript

XSLT: Example
XSLT document that converts the XML from Listing 17.1 into an HTML list

XSLT
An XML parser is still needed to perform the actual transformation
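
Python's standard library does not include an XSLT processor, so the sketch below is not XSLT. It performs by hand the same kind of XML-to-HTML-list transformation, to show what such a stylesheet produces. Listing 17.1 is not reproduced here, so the element names and data are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical input; stands in for the XML of Listing 17.1.
SOURCE = """<contact-info>
  <person><name>Tanmay Patil</name></person>
  <person><name>Ahmed Ali</name></person>
</contact-info>"""

def to_html_list(xml_text: str) -> str:
    """Parse the XML, then emit one <li> per <name> element inside a <ul>."""
    root = ET.fromstring(xml_text)
    ul = ET.Element("ul")
    for name in root.iter("name"):
        li = ET.SubElement(ul, "li")
        li.text = name.text
    return ET.tostring(ul, encoding="unicode")

html = to_html_list(SOURCE)
print(html)
```

An XSLT stylesheet would express the same mapping declaratively, with a template that matches each name element and outputs a list item.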

XPath: Another XML Technology
• XPath is a standardized syntax for searching an XML document and for navigating to elements within it.
• XPath is typically used as part of the programmatic manipulation of an XML document in PHP and other languages.
• XPath uses a syntax similar to the one used in most operating systems to access directories.

XPath
Learn through example
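
A few path expressions can be tried with Python's `xml.etree.ElementTree`, which supports a limited subset of XPath. This is a minimal sketch using the contact data from the syntax sample in this deck; a full XPath engine accepts many more expressions than the ones shown.

```python
import xml.etree.ElementTree as ET

DOC = """<contact-info>
  <person>
    <name lang="en">Tanmay Patil</name>
    <company>TutorialsPoint</company>
    <phone>(011) 123-4567</phone>
  </person>
  <person>
    <name lang="ar">Ahmed Ali</name>
    <company>Batelco</company>
    <phone>(00973)17448888</phone>
  </person>
</contact-info>"""

root = ET.fromstring(DOC)

# Directory-like path: every <name> under every <person> child of the root.
all_names = [n.text for n in root.findall("./person/name")]

# Attribute predicate: <name> elements anywhere whose lang attribute is "ar".
arabic_names = [n.text for n in root.findall(".//name[@lang='ar']")]

# Child-text predicate: the phone of the person whose company is Batelco.
batelco_phone = root.find("./person[company='Batelco']/phone").text

# Positional predicate: positions start at 1, so this is the second person.
second_person = root.find("./person[2]/name").text

print(all_names, arabic_names, batelco_phone, second_person)
```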

XML Basics
• XML tags identify the data and are used to store and organize it, rather than specifying how to display it as HTML tags do.
• XML characteristics:
  • XML is extensible: XML allows you to create your own self-descriptive tags, or language, to suit your application.
  • XML carries the data but does not present it: XML allows you to store the data irrespective of how it will be presented.
  • XML is a public standard: XML was developed by the World Wide Web Consortium (W3C) and is available as an open standard.

XML Usage
XML can:
• Work behind the scenes to simplify the creation of HTML for large web sites.
• Be used to exchange information between organizations and systems.
• Be used for offloading and reloading databases.
• Be used to store and arrange data.
• Be easily merged with style sheets to create almost any desired output.

What is a Markup?
• XML is less a single markup language than a set of rules for building markup languages.
• XML is not a programming language, but it:
  • has a defined syntax
  • can be processed by special programs (parsers)

XML Syntax
    <?xml version="1.0"?>
    <contact-info>
      <person>
        <name lang="en">Tanmay Patil</name>
        <company>TutorialsPoint</company>
        <phone>(011) 123-4567</phone>
      </person>
      <person>
        <name lang="ar">Ahmed Ali</name>
        <company>Batelco</company>
        <phone>(00973)17448888</phone>
      </person>
      <!-- This is a comment -->
    </contact-info>
Parts labeled on the slide: declaration (first line), root element (contact-info), elements (person and its children), attribute (lang), text (the content between tags).

XML Syntax Rules
• XML Declaration
• Tags and Elements
• Attributes
• References
• Text

Declaration
    <?xml version="1.0"?>
• Optional, but must be the first statement if used.
• Case sensitive.
• Attributes:
  • version: required whenever the declaration is present; in practice almost always 1.0
  • encoding: optional; the default is UTF-8
  • standalone: optional; yes or no (the default is no). It informs the parser whether the document relies on information from an external source, such as an external document type definition, for its content.

Tags: Syntax Rules
• Tag names are enclosed in angle brackets, <element>...</element>, or, for empty elements, written as a self-closing tag, <element/>.
• Nesting of elements: an element can contain multiple XML elements as its children, but the child elements must not overlap. Tags must be closed in order, i.e. a tag opened inside another element must be closed before the outer element is closed.
• Only one root element.
• Case sensitive: <contact-info> ≠ <Contact-Info>

Attributes
• An attribute gives more information about an XML element or, more precisely, defines a property of the element.
• An XML attribute is always a name-value pair.
• The syntax for attributes is name="value"
• An element can have multiple attributes.
• There are three types of attributes:
  • StringType
  • TokenizedType
  • EnumeratedType
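
Reading name-value pairs from an element might look like the following sketch, which uses Python's `xml.etree.ElementTree`. The `script` attribute and the `region` lookup are hypothetical, added only to show multiple attributes and a missing one.

```python
import xml.etree.ElementTree as ET

# One element with two attributes; "script" is a made-up second attribute.
name = ET.fromstring('<name lang="en" script="Latn">Tanmay Patil</name>')

lang = name.get("lang")                  # value of an attribute that is present
region = name.get("region", "unknown")   # fallback for an attribute that is absent
all_attrs = dict(name.attrib)            # every name-value pair on the element

print(lang, region, all_attrs)
```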

Attributes: Rules
• An attribute name must not appear more than once in the same start-tag or empty-element tag.
• When the document is validated against a DTD, the attribute must have been declared, and its value must be of the type declared for it.
• Attribute values must not contain direct or indirect entity references to external entities.
• The replacement text of any entity referred to directly or indirectly in an attribute value must not contain a less-than sign (<).
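
Two of these rules are easy to observe with a parser. The sketch below assumes Python's `xml.etree.ElementTree`; the helper `parse_error` and the `rule` element are invented for the example.

```python
import xml.etree.ElementTree as ET

def parse_error(text):
    """Return the parser's error message, or None if the text parses cleanly."""
    try:
        ET.fromstring(text)
    except ET.ParseError as exc:
        return str(exc)
    return None

# The same attribute name twice in one start-tag is rejected.
duplicate = parse_error('<name lang="en" lang="ar">Ahmed Ali</name>')

# A raw less-than sign inside an attribute value is rejected.
raw_less_than = parse_error('<rule when="a < b"/>')

# The escaped form is accepted, and so is a raw greater-than sign.
escaped = parse_error('<rule when="a &lt; b"/>')
greater_than = parse_error('<rule when="a > b"/>')

print(duplicate, raw_less_than, escaped, greater_than, sep="\n")
```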

Character Entities
• Entities are placeholders in XML.
• There are three types of character entities:
  1. Predefined character entities: used to avoid ambiguity when writing symbols that are also markup, namely <, >, &, ' and ".
  2. Numbered character entities: a numeric reference, in either decimal or hexadecimal, refers to a character by its code.
  3. Named character entities: names for special characters, such as A-acute (Á). Apart from the five predefined entities, named entities must be declared in a DTD before they can be used.
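
The three kinds of entity can be seen side by side in a short Python sketch. It assumes the standard-library `xml.etree.ElementTree` parser and `xml.sax.saxutils.escape`; the `Aacute` entity is declared in a small internal DTD written for this example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# 1. Predefined entities stand in for characters that are also markup.
predefined = ET.fromstring("<t>&lt;b&gt; &amp; &quot;q&quot; &apos;a&apos;</t>").text

# 2. Numeric references: decimal 193 and hexadecimal C1 are both A-acute.
numeric = ET.fromstring("<t>&#193; &#xC1;</t>").text

# 3. A named entity must be declared first; here in an internal DTD.
named = ET.fromstring(
    '<!DOCTYPE t [<!ENTITY Aacute "&#193;">]><t>&Aacute;</t>'
).text

# Going the other way: escape text before placing it inside an element.
escaped = escape("1 < 2 & 3 > 2")

print(predefined, numeric, named, escaped, sep="\n")
```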

Character Data (CDATA) & White Space
• A CDATA section escapes a block of text that would otherwise be recognized as markup; the parser does not parse its contents.
    <![CDATA[                                          (CDATA start)
      <message>Welcome to TutorialsPoint</message>     (treated as text)
    ]]>                                                (CDATA end)
• Whitespace in XML is either significant or insignificant:
  • Significant whitespace occurs within elements that contain text and markup mixed together.
  • Insignificant whitespace occurs where only element content is allowed.
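
The effect of a CDATA section can be checked by parsing the same content with and without it. This is a minimal sketch using Python's `xml.etree.ElementTree`; the `doc` wrapper element is an assumption added so the sample has a root.

```python
import xml.etree.ElementTree as ET

with_cdata = ET.fromstring(
    "<doc><![CDATA[<message>Welcome to TutorialsPoint</message>]]></doc>"
)
without_cdata = ET.fromstring(
    "<doc><message>Welcome to TutorialsPoint</message></doc>"
)

# Inside CDATA the tags are plain characters, so <doc> has text and no children.
cdata_text = with_cdata.text
cdata_children = len(with_cdata)

# Without CDATA the same characters are markup, so <doc> has one child element.
plain_children = len(without_cdata)

print(repr(cdata_text), cdata_children, plain_children)
```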

Processing Instructions (PI)
• Processing instructions (PIs) pass information to applications; their content is exempt from most XML syntax rules.
• PIs can appear anywhere in the document outside of other markup.
• PIs are rarely used. Their best-known use is to link an XML document to a stylesheet:
    <?xml-stylesheet href="tutorialspointstyle.css" type="text/css"?>
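
An application can pick up a processing instruction after parsing. The sketch below assumes Python's `xml.dom.minidom`, which keeps PIs as nodes in the document tree; the empty `contact-info` root is added only so the sample is a complete document.

```python
from xml.dom import Node, minidom

doc = minidom.parseString(
    '<?xml version="1.0"?>'
    '<?xml-stylesheet href="tutorialspointstyle.css" type="text/css"?>'
    "<contact-info/>"
)

# Collect the processing-instruction nodes that sit before the root element.
# The XML declaration on the first line is not reported as a PI node.
pis = [n for n in doc.childNodes if n.nodeType == Node.PROCESSING_INSTRUCTION_NODE]

target = pis[0].target   # the name right after "<?"
data = pis[0].data       # everything after the target, passed through unparsed

print(target, data)
```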

Encoding
• Encoding is the process of converting characters into their equivalent binary representation.
• When an XML processor reads an XML document, it decodes the document according to its encoding. Hence the encoding should be specified in the XML declaration.
• The two main encodings are UTF-8 and UTF-16. UTF stands for UCS Transformation Format, and UCS itself means Universal Character Set.
• UTF-8 is considered the default encoding when the declaration or its encoding attribute is missing.
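
The following Python sketch shows one character in two encodings and how the encoding attribute guides the parser. The name "René" and the ISO-8859-1 document are assumptions chosen so that a non-ASCII byte appears in the input.

```python
import xml.etree.ElementTree as ET

char = "\u00e9"  # the character e-acute

# The same character has different binary representations in each encoding.
utf8_bytes = char.encode("utf-8")       # two bytes
utf16_bytes = char.encode("utf-16-le")  # two bytes, different values

# A document in another encoding must say so in its declaration.
latin1_doc = b'<?xml version="1.0" encoding="ISO-8859-1"?><name>Ren\xe9</name>'

# With no declaration, the bytes are read as UTF-8.
utf8_doc = "<name>Ren\u00e9</name>".encode("utf-8")

latin1_name = ET.fromstring(latin1_doc).text
utf8_name = ET.fromstring(utf8_doc).text

print(utf8_bytes, utf16_bytes, latin1_name, utf8_name)
```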

Validation
• Validation is the process of checking an XML document against a declared set of rules for its structure.
• An XML document is said to be valid if its content matches the elements, attributes, and other pieces of an associated document type declaration, and if it complies with the constraints expressed there.
• An XML parser checks a document at two levels:
  • Well-formed XML document
  • Valid XML document

Valid XML document
• If an XML document is well-formed and conforms to its associated Document Type Definition (DTD), then it is said to be a valid XML document.
• The main drawback of DTDs is that they mostly validate the existence and ordering of elements. They offer little way to validate the values of attributes or the textual content of elements.
• For this type of validation, one must instead use XML schemas, which have the added advantage of using XML syntax.
• Unfortunately, schemas have the corresponding disadvantage of being long-winded and harder for humans to read and comprehend; for this reason, they are typically created with tools.

Document Type Definition: Example

XML Schema
Just one example

XML Viewers and Editors
• An XML document can be viewed using a simple text editor or any browser.
• Most of the major browsers support XML.
• XML files are saved with a ".xml" extension.
• An XML editor is a markup language editor. XML documents can be created or edited using existing editors such as Notepad, WordPad, or any simple text editor.
• Most IDEs support XML editing and validation.

XML Parser
• An XML parser is a software library or package that provides methods for client applications to work with XML documents. It checks that the XML document is properly formatted and may also validate it.
• An XML parser:
  • Verifies that an XML document is well formed.
  • Checks the XML document for syntax errors.
  • Converts the XML document into some type of internal memory structure.
• All contemporary browsers have built-in parsers, as do most web development environments such as PHP and ASP.NET.
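
How a parser reports a syntax error can be sketched with Python's built-in `xml.etree.ElementTree`. The broken document is made up; the helper name `first_error_line` is hypothetical.

```python
import xml.etree.ElementTree as ET

# <b> is never closed, so the closing </a> on line 3 does not match.
BROKEN = "<a>\n<b>\n</a>"

def first_error_line(text):
    """Return the line number of the first syntax error, or None if well formed."""
    try:
        ET.fromstring(text)
    except ET.ParseError as exc:
        line, _column = exc.position
        return line
    return None

error_line = first_error_line(BROKEN)
clean = first_error_line("<a>\n<b/>\n</a>")
print(error_line, clean)
```

Parsing stops at the first error, which is why a well-formedness check is a precondition for any further processing.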

XML Processor
• When a software program reads an XML document and does something with it, this is called processing the XML.
• Therefore, any program that can read and process XML documents is known as an XML processor.
• An XML processor reads an XML file and turns it into in-memory structures that the rest of the program can use however it likes.
• The most fundamental XML processor reads XML documents and converts them into an internal representation for other programs or subroutines to use.
• This is called a parser, and it is an important component of every XML processing program.
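
A very small XML processor might look like the sketch below: the parser turns text into an in-memory tree, and the rest of the program turns that tree into whatever structure it needs. It assumes Python's `xml.etree.ElementTree`; the function name `load_contacts` is hypothetical and the data comes from the syntax sample in this deck.

```python
import xml.etree.ElementTree as ET

DOC = """<contact-info>
  <person>
    <name lang="en">Tanmay Patil</name>
    <company>TutorialsPoint</company>
    <phone>(011) 123-4567</phone>
  </person>
  <person>
    <name lang="ar">Ahmed Ali</name>
    <company>Batelco</company>
    <phone>(00973)17448888</phone>
  </person>
</contact-info>"""

def load_contacts(xml_text):
    """Parse the document, then turn each <person> into a plain dict."""
    root = ET.fromstring(xml_text)  # parser step: text to in-memory tree
    return [
        {child.tag: child.text for child in person}  # processing step
        for person in root.findall("person")
    ]

contacts = load_contacts(DOC)
companies = sorted(c["company"] for c in contacts)
print(contacts[0]["name"], companies)
```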