Intro to XML
• Lecture overview:
  – What is XML?
  – Mark-up languages
  – XML vs. HTML
  – Style Sheets (CSS and XSL)
  – Introduction to XML
• See text Chapters 7 and 3

What is XML?
• EXtensible Markup Language
  – http://www.w3.org/XML/
  – http://en.wikipedia.org/wiki/XML
• Metalanguage: language and tools for creating new markup languages
• Designed to transport and store data
  – Showing data in a formatted way is possible with XML, but is not the purpose of the language

What is XML?
• Consists of tags (just like HTML)
  – But now we define the tags ourselves
  – Thus, technically speaking, documents that we claim to be “XML” are actually “documents in XML-generated languages”
  – Thus, with XML, we can define markup elements based on a particular domain
    • Ex: MathML, which is the Mathematical Markup Language
      – http://www.w3.org/Math/

What is XML?
• Ex: XHTML – a version of HTML generated using XML
• Ex: RSS – formatting for "really simple syndication"
• XML is an open standard and can be edited with a plain text editor
  – Being text-based enables it to be portable across platforms
• Includes
  – syntax - the rules of the language
  – structure - organizing and storing information

Markup Languages
• A set of rules that define the layout, format, or structure of text within a document
• Markup elements are added to the document, then processed by a program that can interpret the elements
  – See: http://en.wikipedia.org/wiki/Markup_language

Markup Languages
• “Marking up” has been used by typesetters for hundreds of years
• “Markup Languages” were first proposed in the late 1960s
• Ex: LaTeX is a markup language with elements for describing the format of documents
  – Used a lot in Math and CS research papers
  – Based on TeX, developed by Donald Knuth
  – Leslie Lamport added some features to TeX (thus the La)

Markup Languages
  – See: http://www.latex-project.org/
  – Side note: Knuth and Lamport are both very famous CS researchers
    • Text formatting was done on the side!
• Ex: Microsoft Word provides an interface for “marking up” elements in a document, such as bolding a string
  – In Word 2007+ it actually uses XML
    • http://en.wikipedia.org/wiki/Office_Open_XML
    • Let’s look at an example:
      – wordXML.docx

Markup Languages
• SGML - Standard Generalized Markup Language
  – Established a standard for markup
  – International standard for large document projects
  – But extremely complex
    • Ex: parsing is difficult
  – http://en.wikipedia.org/wiki/SGML
• HTML is derived from SGML (mostly)

HTML
• Advantages:
  – Fairly small, easy to learn language
  – It’s an open standard, widely supported
  – Fast interpretation - fast web browsers
  – Portability was vitally important to its adoption as the standard for web markup
• Disadvantages:
  – Limited capabilities, fixed specification
  – It’s not extensible for new domains

Motivation for XML
• Motivation for XML:
  – HTML elements are primarily for defining presentation and formatting
    • Show the data in a browser
    • Allow interaction with the user
  – HTML does not provide semantic information about the data itself
    • What does the data mean?
    • How is the data on one page different from / similar to data on another page?

XML
• Official release 1.0 in 1998
  – Fifth edition recommendation in November 2008
  – Also Version 1.1 released in 2004
    • Second edition in 2006
  – Version 1.0 is still most widely used
• Has simplicity of HTML and extensibility of SGML
  – It’s a subset of SGML
  – Easier to parse than SGML

XML Features
• Allows data to be self-describing
  – Tag names allow information about the content to be inferred
  – This has been debated
    • Some believe it is not a valid feature
    • Tags can be ambiguous
    • Meaning is to humans, not computers
  – Google "XML self-describing"

XML Features
• Provides rules for XML elements to limit the type of data in an element
  – This can be done via Document Type Definitions or Schema
• Allows custom data structures
  – Tags can be nested to form arbitrary tree configurations for data representation

XML Features
• Can be used for data storage and interchange
  – As long as the data specification is known, any party retrieving / receiving the data can parse it correctly
• Separates the data from its format (presentation)
  – Allows different presentation styles for the same data

XML Features
  – Can create custom elements / tags
    • Tags can describe data
        <smart-phone-type> Android </smart-phone-type>
  – Elements do not map to formatting styles
    • Unlike HTML
  – Style sheets allow data to be formatted in different ways

Example: HTML
<html>
  <head><title>Job Posting: Web master</title></head>
  <body>
    <h1>Job Posting</h1>
    <h2>Job title: <i>Web master</i></h2>
    <p><b>Job Description: </b> We are looking for a Web master to create and oversee our company’s web pages.</p>
    <p><b>Skills needed: </b> Basic writing skills, good communication, HTML.</p>
  </body>
</html>

Example: HTML
• In this example, the tags tell how the data is to be formatted
• However, they tell us nothing about the type of information that is being presented
  – Just looking at the tags, this could contain anything
  – We must try to infer that it is a job posting by reading the document

Example: XML
<?xml version="1.0"?>
<job-posting>
  <title> Job Title: <emphasis> Web Master </emphasis> </title>
  <description> We are looking for a Web master to create and oversee our company&apos;s web pages. </description>
  <skill-list>
    <skill> Basic writing skills </skill>
    <skill> Good communication skills </skill>
    <skill> Programming experience in web languages </skill>
  </skill-list>
</job-posting>

Example: XML
• In this example, the tags tell us information about the data that is stored
  – By looking at the tags (without even seeing the values) we can infer a lot about the nature of the data
  – Even if a computer is "looking at the tags" we can still program specific behaviors for specific tags
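
As a rough illustration of programming behaviors against specific tags, here is a minimal Python sketch that reads the job posting with the standard library's xml.etree.ElementTree. The variable names (JOB_XML, job_title, skills) are made up for this example and are not part of the lecture materials.

```python
import xml.etree.ElementTree as ET

# The job posting from the slide, embedded as a string for the example.
JOB_XML = """<?xml version="1.0"?>
<job-posting>
  <title> Job Title: <emphasis> Web Master </emphasis> </title>
  <description> We are looking for a Web master to create and oversee our company&apos;s web pages. </description>
  <skill-list>
    <skill> Basic writing skills </skill>
    <skill> Good communication skills </skill>
    <skill> Programming experience in web languages </skill>
  </skill-list>
</job-posting>"""

root = ET.fromstring(JOB_XML)

# The tag names tell the program what each piece of data is.
job_title = root.find("title/emphasis").text.strip()
skills = [skill.text.strip() for skill in root.iter("skill")]

print(root.tag)    # name of the root element
print(job_title)
print(skills)
```

Nothing here depends on how the posting will be displayed; the program only relies on the tag names.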

Example XML
• However, the tags tell us nothing about how the data will be formatted
  – In some cases we may not even care about this
    • Data may not need to be presented visually
  – If needed we can use style sheets for this

XML Data Hierarchy
• Hierarchy of data in XML - defined by function and relationship to other elements
• Root: Element that encompasses all other elements
  – In effect defines what the document is
• Children: Elements in other elements
• Parent: The containing element

Hierarchy
job-posting
  title
    emphasis
  description
  skill-list
    skill
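
The root / parent / child relationships can be made visible by walking the element tree. The sketch below assumes Python's xml.etree.ElementTree; the helper name outline and the shortened document are invented for the illustration.

```python
import xml.etree.ElementTree as ET

# A shortened job posting; only the element structure matters here.
doc = ET.fromstring(
    "<job-posting>"
    "<title>Job Title: <emphasis>Web Master</emphasis></title>"
    "<description>...</description>"
    "<skill-list><skill>a</skill><skill>b</skill></skill-list>"
    "</job-posting>"
)

def outline(element, depth=0):
    """Return one indented line per element, parents before children."""
    lines = ["  " * depth + element.tag]
    for child in element:  # iterating an element yields its direct children
        lines.extend(outline(child, depth + 1))
    return lines

print("\n".join(outline(doc)))
```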

Displaying XML files in browser
• Relating XML document and style sheet:
  – We can use either a cascading style sheet (.css) or an XSLT style sheet (.xsl)
  – http://www.w3.org/Style/CSS/learning
      <?xml-stylesheet type="text/css" href="job.css"?>

Style Sheets
• Cascading style sheets (CSS)
• A means for presenting a document
  – We locate the style sheet in a file
    • Ex: job.css
• Has rules and declarations to tell the browser how to display the document
• In the XML document add a line to show where the stylesheet is located
      <?xml-stylesheet type="text/css" href="job.css"?>

Style Sheets
• Two parts in style sheet
  – Element selector
  – Property declarations
      address { font-size: 12pt; font-family: arial }
  – Element: address (selectors may be a comma separated list)
  – Properties: font-size: 12pt; font-family: arial (property and value pairs separated by semicolons)

Selected Formatting Properties
A wide variety of properties

Property      Description         Values
font          Font properties     font: italic small-caps bold 12px arial
font-family   Typeface            font-family: arial
font-size     Size of font        font-size: small
font-style    Style of font       font-style: italic
text-align    Alignment of text   text-align: center
text-indent   Indent first line   text-indent: 10px (# pixels)
color         Text color          color: red

and many, many more... see: http://www.w3schools.com/css

Cascading Style Sheets
to, from { font-weight: bold; text-align: left; border-style: solid }
subject { text-decoration: underline; background-color: green; color: yellow }
* { color: green }
• To, from elements are bold, left aligned, with solid border
• Subject element is underlined, green background color (yuck), text is yellow
• Default properties to use: text color is green
• See job.css

CSS Inheritance
• Hierarchy of elements in XML docs
• Hierarchy is applied to style sheet with property inheritance
• Properties defined for parents are passed to child elements
  – E.g., parent is 18pt -> child is 18pt unless property is redefined

Example
<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="job.css"?>
<job-posting>
  <title> Job Title: <emphasis> Web Master </emphasis> </title>
  <description> We are looking for a Web master to create and oversee our company&apos;s web pages. </description>
  <skill-list>
    <skill> Basic writing skills </skill>
    <skill> Good oral skills </skill>
    <skill> Programming experience in web languages </skill>
  </skill-list>
</job-posting>

Style sheet (.css)
title { font-size: 28pt; color: red; }
emphasis { font-weight: bold; }
description { display: block; margin-top: 15px; font-size: 18pt; }
skill-list { background-color: yellow; color: green; }
skill { display: block; margin-left: 30px; margin-top: 5px; font-size: 14pt; font-family: 'Comic Sans MS'; }

CSS Inheritance
  – Consider job.xml and job.css example
    • <emphasis> tag is within both the <title> tag and the <skill> tag
    • In both cases it changes the font to bold, but does not affect any other formatting
    • The other properties are inherited from the parent tag
  – See also my home page and CS 1520 page
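
To make the inheritance idea concrete, the following Python sketch models it in a deliberately simplified way. It is not how a browser implements CSS: the INHERITED set is an assumption listing only a few properties that CSS does inherit, RULES copies part of job.css, and computed_style is a hypothetical helper.

```python
# Properties that a child picks up from its parent in this toy model.
# (In real CSS, properties such as display and background-color do not inherit.)
INHERITED = {"font-size", "color", "font-weight", "font-family"}

# A few rules taken from job.css, keyed by element name.
RULES = {
    "title": {"font-size": "28pt", "color": "red"},
    "emphasis": {"font-weight": "bold"},
    "skill-list": {"background-color": "yellow", "color": "green"},
    "skill": {"display": "block", "font-size": "14pt"},
}

def computed_style(path):
    """Compute the style of the last element in path (tags from root down)."""
    style = {}
    for tag in path:
        # Carry over only the inheritable properties from the parent ...
        style = {prop: value for prop, value in style.items() if prop in INHERITED}
        # ... then apply (or redefine) whatever this element's own rule sets.
        style.update(RULES.get(tag, {}))
    return style

print(computed_style(["job-posting", "title", "emphasis"]))
print(computed_style(["job-posting", "skill-list", "skill"]))
```

In this model <emphasis> inside <title> ends up bold while keeping the 28pt red text of its parent, and <skill> keeps the green text color of <skill-list> but redefines the font size.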

CSS with HTML
• CSS can also be used effectively with HTML files
  – We can define style classes to be used with our documents
  – We can define style for given tags
• Syntax for linking is different with HTML than with XML
  – Use the <link> tag
• See CDpoll-style.php and CDstyle.css

CSS Limitations
• CSS is not a general way of expressing presentation; it provides a “static” formatting
• E.g., we can’t make on-the-fly decisions about whether to include a header or footer, whether to color something green when it has two children, etc.
  – Formatting is based on the tags / attributes, not on the organization

CSS Limitations
• We can get around these limitations using JavaScript / DOM
  – Allows dynamic updating of the style through events
  – See CS 1520 Home page
  – See CDpoll-style.php and CDstyle.css
• We can also use XSL for style
  – XSL is more flexible than CSS – we will briefly look at XSL

Displaying XML files in browser
• Without the style sheet, the document will appear with the tags (elements) intact
  – In Firefox or IE, at least
• However, we can elide elements by clicking on the "–" that appears before them
  – Similar to opening subfolders in a folder hierarchy
  – See job-no-css.xml

XSL
• XSL (eXtensible Stylesheet Language) is a very powerful combination of 3 different languages:
  – XSLT – XSL Transformation
    • XML language used to transform an XML document in various ways (perhaps into a different type of document)
    • Ex: Transform XML into HTML for display
    • Ex: Transform from one XML language into a different one

XSL
  – XPath – XML Path language
    • Used to access / parse / traverse an XML document
    • Enables a user to query / access parts of the document tree in a regular way
    • Not itself an XML language
  – XSL-FO – XSL Formatting Objects
    • XML language designed to format / present documents
      – Ex: Can be used to generate PDFs from XML documents

XSL
• Big Picture Example:
  – We have an XML document which we would like to format for display in a browser
  – Perhaps we would like it to be displayed in an HTML table, or within some other HTML elements
  – We can use XPath to select the XML elements that we want to present, and XSLT to transform them into HTML
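
The select-then-transform idea can be sketched without XSLT itself. The Python example below is a stand-in, not an XSLT style sheet: it uses the limited XPath subset supported by xml.etree.ElementTree to select elements and plain string formatting to produce HTML. The function name skills_table and the sample data are assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

posting = ET.fromstring(
    "<job-posting>"
    "<title>Web Master</title>"
    "<skill-list>"
    "<skill>Basic writing skills</skill>"
    "<skill>HTML &amp; CSS</skill>"
    "</skill-list>"
    "</job-posting>"
)

def skills_table(root):
    """Select every <skill> under <skill-list> and emit one HTML table row each."""
    rows = "".join(
        f"<tr><td>{escape(skill.text)}</td></tr>"
        for skill in root.findall("./skill-list/skill")  # XPath-style selection
    )
    return f"<table>{rows}</table>"

print(skills_table(posting))
```

The original XML is left untouched; the HTML is generated from it, which mirrors what an XSLT transformation does in the browser.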

XSL
  – If we want to add style to our newly generated HTML document we can easily add some CSS as it is generated
    • Alternatively, we could transform the original document into XSL-FO and impart the style there
      – This is actually very powerful but in many cases using XSLT + CSS will be sufficient
    • Result: We still use CSS for style, but XSLT gives us much more flexibility than CSS alone

XSL
• Ex: Consider a document that is storing XML emails
  – We would like to display this document in a nicely formatted way in the browser
  – If we use CSS alone, we can add style to the XML elements, but that is basically it
    • We are seeing an XML document, with style applied to the elements
    • See emails2.xml and emails2.css

XSL
  – If we use XSLT + CSS, we can create a new HTML document that includes HTML tags, our XML data AND CSS
    • Now we are seeing an HTML document which has data from our XML document within it
    • Note that we are not changing our original XML document
      – The HTML that we see is dynamically generated via XSLT and XPath
    • See emails2-xsl.xml, emails2.xsl and emails2-xsl.css

XML Syntax
• Components of XML documents
  – Declaration - says it’s an XML document
  – Elements - describe data in document
    • Attributes - info. clarifying element
  – Entities - placeholders for content
  – Comments - useful notes & documentation
• These components can be specified and regulated using DTDs (Document Type Definitions) or Schema

XML Declaration
• Indicates document is an XML document
      <?xml version="1.0"?>
      <?xml version="1.0" encoding="UTF-8"?>
      <?xml version="1.0" encoding="UTF-16"?>
• Encoding attribute deals with the character set that will be used
  – Ex: if non-ASCII characters will be used

XML Elements
• Core components of XML document
• Consist of
  – Start tag: <element>
  – Content: data or other elements or both
  – End tag: </element>
• Elements are like English nouns – definable objects

Element Examples
• Start tag: <book>
• Content: Here is Edward Bear, coming downstairs now, bump, bump on the back of his head, behind Christopher Robin. It is, as far as he knows, the only way of coming downstairs, but sometimes he feels that there really is another way, if only he could stop bumping for a moment and think of it. …
• End tag: </book>

Element Examples
<email_message> Dear CS 1520: Web programming is sure fun! </email_message>
<plane> F117 Nighthawk </plane>

Root element
• All XML documents contain
  – Outermost element – root element
  – All other elements and data within document further describe root
<book>
  <title> Programming the World Wide Web </title>
  <author> Robert Sebesta </author>
  <publisher> Addison-Wesley </publisher>
</book>

Elements are containers
• Elements contain
  – Elements and contents (data)
  – Elements can nest within each other
  – May contain child elements, e.g., <title> contained within <book>
• Empty elements
  – In HTML, <p>
  – In XML (or XHTML), <br/> <p/>

XML Attributes
• Information that describes elements
  – Similar to an adjective - adding more to the definition
• Defined in the start tag of elements
• Attributes are name-value pairs
  – Value must be in quotes
  – Same idea as with HTML attributes

XML Attributes
• Examples:
      <movie source="http://www.starwars.com/the-force-awakens/">Star Wars: The Force Awakens</movie>
      <band genre="Post-punk">Joy Division</band>
• We can either use elements or attributes to modify tags – up to programmer and situation
  – See p. 281-282 of Sebesta (8th edition)
  – One approach is to use attributes only for items that are not content-related
    • Ex: an id for an element

XML Attributes
<band name="Joy Division" genre="Post-Punk"></band>
Vs
<band>
  <name>Joy Division</name>
  <genre>Post-Punk</genre>
</band>
• XML purists prefer the second approach (minimal attributes)
• Generally more flexible / extensible
• But a bit more wordy as well
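
From a program's point of view the two styles differ mainly in how the value is read. A small sketch, assuming Python's xml.etree.ElementTree (the variable names are illustrative only):

```python
import xml.etree.ElementTree as ET

# Same information, first as attributes, then as child elements.
as_attributes = ET.fromstring('<band name="Joy Division" genre="Post-Punk"></band>')
as_elements = ET.fromstring(
    "<band><name>Joy Division</name><genre>Post-Punk</genre></band>"
)

# Attributes are read from the element itself ...
genre_from_attribute = as_attributes.get("genre")
# ... while element content is read from a child element.
genre_from_element = as_elements.findtext("genre")

print(genre_from_attribute, genre_from_element)
```

One practical difference: a child element such as <genre> can later be repeated or given children of its own, whereas an attribute can hold only a single text value and may appear only once per start tag.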

XML Entities
• Entities are placeholders for content
• Can contain different types of data, including:
  – text,
  – special characters,
  – XML markup and
  – binary data

XML Entities
• 2 types: General and Parameter
  – Parameter entities can only be referenced in Document Type Definitions (DTDs)
    • We will not cover DTDs in detail but you can look them up
      – See: http://www.xml.com/pub/a/98/10/guide0.html?page=3
• General entities
  – Character – used for special characters
  – Content – mark content that is used often
  – Unparsed – used for binary or other non-text data

Character Entities
• Certain characters are reserved
  – E.g., "<", ">", etc.
• Character entities are entity names with predefined values

Character   Name   Entity
'           apos   &apos;
"           quot   &quot;
>           gt     &gt;
<           lt     &lt;

Character Entities
• Example:
      <?xml version="1.0"?>
      <equation>50 &lt; 100</equation>
• There are also numbered character entities:
  – &#39; is the ' character, because decimal 39 is the code of the single quote in Unicode (and in its UTF-8 encoding)
  – See: http://en.wikipedia.org/wiki/UTF-8
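
A parser turns character entities back into the characters they stand for. The sketch below assumes Python's standard library (xml.etree.ElementTree for parsing, xml.sax.saxutils.escape for writing); the variable names are made up for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Reading: the parser replaces &lt; with the < character.
equation_text = ET.fromstring("<equation>50 &lt; 100</equation>").text

# A numbered character entity: decimal 39 is the single quote.
quote_text = ET.fromstring("<q>&#39;</q>").text

# Writing: escape() converts reserved characters (&, <, >) back to entities.
escaped = escape("50 < 100")

# A raw < inside content is not well-formed, so the parser rejects it.
try:
    ET.fromstring("<equation>50 < 100</equation>")
    raw_less_than_accepted = True
except ET.ParseError:
    raw_less_than_accepted = False

print(equation_text, quote_text, escaped, raw_less_than_accepted)
```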

Content Entities
• We can define entities and then use them in document – like a function
• Example: <!ENTITY address "123 Main Street">
• To use this entity, write &address; in the document at the place or places where you want the address

Example of Content entities
<?xml version="1.0"?>
<!DOCTYPE business [
  <!ENTITY name "ACME Shipping Company">
  <!ENTITY address "123 Main Street">
  <!ENTITY city "Boston">
  <!ENTITY state "MA">
]>
<business>
  <business_name> Business Name: &name; </business_name>
  <business_address> Business Address: &address; </business_address>
</business>
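
To see the substitution happen, the document can be run through a parser. This is a sketch assuming Python's xml.etree.ElementTree, whose underlying parser expands internal entities declared in the DOCTYPE; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# The business example, with its entities declared in the internal DTD subset.
BUSINESS_XML = """<?xml version="1.0"?>
<!DOCTYPE business [
  <!ENTITY name "ACME Shipping Company">
  <!ENTITY address "123 Main Street">
]>
<business>
  <business_name>Business Name: &name;</business_name>
  <business_address>Business Address: &address;</business_address>
</business>"""

business = ET.fromstring(BUSINESS_XML)

# By the time we read the text, &name; and &address; have been replaced.
name_line = business.findtext("business_name")
address_line = business.findtext("business_address")

print(name_line)
print(address_line)
```

Changing the entity declaration once changes every place in the document where the entity is referenced.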

External Entities
• The entities we have looked at so far are internal entities
  – Refer to content in the same document
• Content entities can be created to refer to external content from files and URLs
      <!ENTITY description SYSTEM "description.xml">
• In the file description.xml:
      <business_description>Description: ACME Shipping supplies widgets and gadgets all over the world.</business_description>

External Entities
  – &description; can be used any place in the document to refer to the description by way of the file (taken from the same URL as the original document)
  – Also can have an absolute URL
      <!ENTITY description SYSTEM "http://www.acme.com/description.xml">
    refers to the file on the acme.com site

XML Comments
• Comments and documentation of your documents are always desirable!
• Comments same as HTML:
      <!-- hey! i made a comment!! -->
  – Note that "--" may not appear inside the body of an XML comment

Well-Formed XML
• Document must adhere to syntax rules:
  – All documents must contain one and only one root element
  – Elements must have a start and end tag
    • E.g., <article>...</article>
    • Except: <image/> is same as <image></image>

Well-Formed XML
  – Elements must be nested properly and cannot overlap; each element must be contained completely inside its parent
    • Right: <book><chapter>...</chapter></book>
    • Wrong: <book><chapter>...</book></chapter>
  – Attributes must have a value, and the value must be enclosed in quotes
    • <greeting value="Hello!">

Well-Formed XML
  – Attributes must be placed in the start tag, and a particular attribute may only appear once in the start tag
  – Element names are case-SENSITIVE
  – Element names start with a letter or an underscore; can have letters, numbers, hyphens, periods, & underscores
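
The well-formedness rules can be tried out with any XML parser, since a parser must reject documents that break them. A minimal sketch, assuming Python's xml.etree.ElementTree; the helper name is_well_formed is invented for the example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

checks = {
    "properly nested":        "<book><chapter>...</chapter></book>",
    "overlapping elements":   "<book><chapter>...</book></chapter>",
    "two root elements":      "<a/><b/>",
    "unquoted attribute":     "<greeting value=Hello/>",
    "duplicate attribute":    '<a x="1" x="2"/>',
    "case mismatch in tags":  "<Article></article>",
    "empty-element shortcut": "<image/>",
}

for label, text in checks.items():
    print(f"{label}: {is_well_formed(text)}")
```

Only the first and last documents in the table should be accepted; the others each break one of the rules listed above.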

Well-Formed XML
• We can use Document Type Definitions (DTDs) or Schemas to impose structural rules on our XML
  – Can list the elements, attributes and entities
  – Can also impose some rules on frequency and proper use
  – We will just discuss these superficially but you can read the text for more information on Schemas

Why Document Models?
• A non-validating parser lets us check that the correct syntax is followed
  – Checks to make sure that the document is well-formed XML
    • Is there a closing tag for each opening tag?
  – Does not check what elements can be in the document, or how they must appear
    • Ex: What if an element should only appear within another element?
      – Perhaps a <skill> must appear within a <skill-list>
        > To appear anywhere else is an error

Why Document Models?
    • Ex: What if an element should not be in the document at all (for example, a typo)?
    • Ex: What if a nested element can appear at most one time?
• Thus we may also want to impose restrictions to enforce a particular structure in our document
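
The kind of checking a document model adds can be approximated by hand. The Python sketch below is not a DTD or Schema validator; it is a hand-written check under the assumption that the job-posting vocabulary is exactly the one listed in ALLOWED_CHILDREN, and structure_errors is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Assumed vocabulary and grammar: which children each element may contain.
ALLOWED_CHILDREN = {
    "job-posting": {"title", "description", "skill-list"},
    "title": {"emphasis"},
    "emphasis": set(),
    "description": set(),
    "skill-list": {"skill"},
    "skill": set(),
}

def structure_errors(element):
    """Return a list of messages for unknown or misplaced elements."""
    errors = []
    if element.tag not in ALLOWED_CHILDREN:
        errors.append(f"unknown element <{element.tag}>")
    allowed = ALLOWED_CHILDREN.get(element.tag, set())
    for child in element:
        # A known element in the wrong place is a grammar error.
        if child.tag in ALLOWED_CHILDREN and child.tag not in allowed:
            errors.append(f"<{child.tag}> not allowed inside <{element.tag}>")
        errors.extend(structure_errors(child))
    return errors

misplaced = ET.fromstring("<job-posting><skill>HTML</skill></job-posting>")
typo = ET.fromstring("<job-posting><titel>Web Master</titel></job-posting>")

print(structure_errors(misplaced))
print(structure_errors(typo))
```

Both documents are well-formed, so a non-validating parser accepts them; the errors only show up once the expected structure is written down.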

Document Model
• Document model defines:
  – Vocabulary for a markup language
  – Grammar rules for a markup language
• Documents that are well-formed and obey both the vocabulary and grammar are considered to be valid
  – Validating parser: Checks both syntax and structure

Example Capabilities
• With document model:
  – Define elements document can contain
  – Define order that elements appear in
  – Require that certain elements appear
  – Define number of elements allowed
  – Define type of data in an element
  – Define child elements for an element
  – Define attributes of elements
  – Assign constraints to attribute values

Document Model
• In some sense, it’s defining a “protocol” or “rules” for constructing and understanding documents
• It is not required, but when trying to establish a document sharing standard, it should be included
  – So documents can be parsed in a consistent way