Introduction to XML
Valérie Bellynck, Grenoble INP – Pagora
mailto:Valerie.Bellynck@grenoble-inp.fr
What is XML?
From "XML in Micro-Application", e-Poche collection
• means: eXtensible Markup Language (in French « langage à balises extensible » or « langage à balises extensibles »; in Spanish?)
• 1996: first definition by the XML Working Group, under World Wide Web Consortium (W3C) supervision
• XML ~ generalisation of HTML: instead of a fixed set of predefined tags with fixed semantics, authors « invent » their own tags
• 1998: XML 1.0 becomes an official W3C Recommendation
→ http://www.w3c.org/XML/
HTML ? XML ? SGML XML comes from SGML, not from HTML From XML in Micro-Application e-Poche collection
SGML: Standard Generalized Markup Language
• defined in 1986 by the ISO 8879 standard
• completely separates, within a document: content / presentation / structure description
• used in
  - industry, for technical documents
  - electronic document management (GED)
• problems:
  - not aimed at Internet use
  - complex and heavy description to follow
→ http://www.sgmlsource.com/Goldfarb/history/index/htm
HTML: HyperText Markup Language
• is an application of SGML (a markup language defined using SGML)
• is a document description language: section titles, bookmarks, anchors, elements to format text, to describe tables...
• is interpreted by a browser (a client application for Internet requests)
• the display is intended to be browser-independent
• problems:
  - content and presentation are mixed
→ http://www.w3c.org/HTML/
Targets for XML: it must be...
• usable without difficulty on the Internet
• quick to define
• described in a formal and concise way
• self-describing
• able to extend itself
• able to deal with tree-structured data descriptions
• processable by any application equipped with a text parser
• able to support Unicode and other character encodings, for linguistic universality
• able to support a wide range of applications
• compatible with SGML
• designed to make it easier to write document-processing software
• a way of representing data as human-readable documents
• easy to use for creating documents
Markup Languages?
• Markups are pairs of expressions (tags) which surround a block of text to indicate some of its characteristics
  ex: in HTML, the tag <B> starts bold display and </B> ends it
      <B> Text in Bold </B>  →  Text in Bold
• Tags can be parametrised by attributes
  ex: in HTML,
  - the tag <a> defines a hypertext link
  - the URL of the link is given by the attribute href
  - the clickable text is surrounded by the tags <a> and </a>
      <a href="http://www.3ie.org/xml"> click here </a>  →  click here
HTML code
    <HTML>
    <HEAD>
      <TITLE>Lime Jello Marshmallow Cottage Cheese Surprise</TITLE>
    </HEAD>
    <BODY>
      <H3>Lime Jello Marshmallow Cottage Cheese Surprise</H3>
      My grandma's favorite (may she rest in peace).
      <H4>Ingredients</H4>
      <TABLE BORDER="1">
        <TR BGCOLOR="#308030">
          <TH>Qty</TH><TH>Units</TH><TH>Item</TH>
        </TR><TR>
          <TD>1</TD><TD>box</TD><TD>lime gelatin</TD>
        </TR><TR>
          <TD>500</TD><TD>g</TD><TD>multicolored tiny marshmallows</TD>
        </TR><TR>
          <TD>500</TD><TD>ml</TD><TD>cottage cheese</TD>
        </TR><TR>
          <TD></TD><TD>dash</TD><TD>Tabasco sauce (optional)</TD>
        </TR>
      </TABLE>
      <P>
      <H4>Instructions</H4>
      <OL>
        <LI>Prepare lime gelatin according to package instructions...</LI>
        <!-- and so on -->
      </OL>
    </BODY>
    </HTML>
HTML example code in browser
XML example code
    <?xml version="1.0"?>
    <Recipe>
      <Name>Lime Jello Marshmallow Cottage Cheese Surprise</Name>
      <Description>My grandma's favorite (may she rest in peace).</Description>
      <Ingredients>
        <Ingredient>
          <Qty unit="box">1</Qty>
          <Item>lime gelatin</Item>
        </Ingredient>
        <Ingredient>
          <Qty unit="g">500</Qty>
          <Item>multicolored tiny marshmallows</Item>
        </Ingredient>
        <Ingredient>
          <Qty unit="ml">500</Qty>
          <Item>Cottage cheese</Item>
        </Ingredient>
        <Ingredient>
          <Qty unit="dash"/>
          <Item optional="1">Tabasco sauce</Item>
        </Ingredient>
      </Ingredients>
      <Instructions>
        <Step>Prepare lime gelatin according to package instructions</Step>
        <!-- And so on... -->
      </Instructions>
    </Recipe>
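Because the XML version names each piece of data, a program can pick the data out directly. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module and a shortened copy of the recipe; the function name list_ingredients is hypothetical and only serves the illustration.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the recipe from the slide (three ingredients only).
RECIPE_XML = """<?xml version="1.0"?>
<Recipe>
  <Name>Lime Jello Marshmallow Cottage Cheese Surprise</Name>
  <Ingredients>
    <Ingredient><Qty unit="box">1</Qty><Item>lime gelatin</Item></Ingredient>
    <Ingredient><Qty unit="g">500</Qty><Item>multicolored tiny marshmallows</Item></Ingredient>
    <Ingredient><Qty unit="dash"/><Item optional="1">Tabasco sauce</Item></Ingredient>
  </Ingredients>
</Recipe>"""


def list_ingredients(xml_text):
    """Return (quantity, unit, item, is_optional) for every <Ingredient>."""
    root = ET.fromstring(xml_text)
    rows = []
    for ingredient in root.iter("Ingredient"):
        qty = ingredient.find("Qty")
        item = ingredient.find("Item")
        # An empty element such as <Qty unit="dash"/> has no text at all.
        rows.append((qty.text or "", qty.get("unit"), item.text,
                     item.get("optional") == "1"))
    return rows


for row in list_ingredients(RECIPE_XML):
    print(row)
```

Doing the same with the HTML table would require guessing which column holds which kind of information.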
XML header information
Every XML file should begin with a header (the XML declaration) stating which version of XML is used in the document:
    <?xml version="1.0"?>
This is done through the version attribute. Other attributes can define global properties, such as the encoding attribute, which declares the character encoding:
    <?xml version="1.0" encoding="ISO-8859-1"?>
ISO-8859-1 (Latin-1) covers the characters of French and other Western European languages.
UTF-8 is a universal encoding able to represent all Unicode characters.
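The effect of the encoding attribute can be observed with a small experiment. This is a sketch under the assumption that Python's standard xml.etree.ElementTree parser is used; the element name prenom and its content are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# The same document, stored on disk with two different encodings.
text = '<?xml version="1.0" encoding="ISO-8859-1"?><prenom>Valérie</prenom>'
latin1_bytes = text.encode("iso-8859-1")
utf8_bytes = text.replace("ISO-8859-1", "UTF-8").encode("utf-8")

# The accented letter is one byte in ISO-8859-1 and two bytes in UTF-8 ...
assert len("é".encode("iso-8859-1")) == 1
assert len("é".encode("utf-8")) == 2

# ... but the parser reads the declaration and decodes both to the same text.
name_from_latin1 = ET.fromstring(latin1_bytes).text
name_from_utf8 = ET.fromstring(utf8_bytes).text
print(name_from_latin1, name_from_utf8)
```

The declaration therefore has to match the encoding in which the file was actually saved, otherwise accented characters come out wrong or parsing fails.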
Well-formed XML means « parsable »
• A well-formed XML document follows all the notational and structural rules of XML; otherwise a parser cannot make sense of it
  By analogy, the expression 2 ( + + 5 (=) 9 > 7 is meaningless even if it looks (sort of) like math
• The most important rules are:
  – No unclosed tags: a block can't be "opened" with a tag <TAG> without being "closed" afterwards with </TAG>
  – Empty elements must be closed: either with a closing tag
        <EMPTY type="example"></EMPTY>
    or as a single tag with a slash "/" before the closing ">":
        <EMPTY type="example" />
  – No overlapping tags: a tag that opens inside another tag must close before the containing tag closes:
        <CONTAINING-TAG> <INCLUDED-TAG> </INCLUDED-TAG> </CONTAINING-TAG>
  – Attribute values must be enclosed in quotes:
        <TAG type="example">
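Since well-formed means parsable, the rules above can be tested simply by handing a document to a parser. The sketch below assumes Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the sample strings are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True when the parser accepts the text, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


samples = {
    "closed tag": '<TAG type="example">text</TAG>',
    "closed empty element": '<EMPTY type="example" />',
    "unclosed tag": "<TAG>text",
    "overlapping tags": "<A><B></A></B>",
    "unquoted attribute": "<TAG type=example></TAG>",
}

results = {name: is_well_formed(text) for name, text in samples.items()}
for name, ok in results.items():
    print(name, "->", "well-formed" if ok else "rejected")
```

Only the first two samples are accepted; each of the other three breaks one of the rules listed on the slide.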
Valid XML
A document is valid when it matches its Document Type Definition (DTD)
• A DTD is a grammar for a class of documents using a markup language, that is, a set of rules describing the authorized sequences and nestings of tags
• DTDs are written in a special language that is not XML; there is also a richer way to define document types in XML syntax (schemas)
• A DTD specifies
  – what elements may exist,
  – which attributes the elements may have,
  – what structural organisation of elements is expected: which elements may or must be found inside other elements, and in what order.
→ thanks to DTDs, XML is eXtensible
Power of DTD
• Writing a DTD is how you actually define a new markup language, often called a dialect of XML.
• At present, DTDs are being written for an enormous number of different problem domains, and each DTD defines a new markup language.
• New markup languages now exist, or are being designed,
  – to mark up specific domains such as the plays of Shakespeare or business data in the footwear industry (FDX)...
  – to define general data resources (RDF);
  – to model information in the health care industry (HL7 SGML/XML);
  – to typeset, display, and actively use mathematical equations (MathML);
  – and to perform electronic data interchange (XML/EDI).
Modelling information structure in XML
DTD for the example
    <!-- This is the example DTD for the example XML -->
    <!ELEMENT Recipe (Name, Description?, Ingredients?, Instructions?)>
    <!ELEMENT Name (#PCDATA)>
    <!ELEMENT Description (#PCDATA)>
    <!ELEMENT Ingredients (Ingredient)*>
    <!ELEMENT Ingredient (Qty, Item)>
    <!ELEMENT Qty (#PCDATA)>
    <!ATTLIST Qty unit CDATA #REQUIRED>
    <!ELEMENT Item (#PCDATA)>
    <!ATTLIST Item optional CDATA "0"
                   isVegetarian CDATA "true">
    <!ELEMENT Instructions (Step)+>
    <!ELEMENT Step (#PCDATA)>
DTD: defining tags
    <!ELEMENT Recipe (Name, Description?, Ingredients?, Instructions?)>
The <!ELEMENT ...> statement defines a tag in the document. This statement defines a <Recipe> tag, stating that it contains, in this order,
  - a <Name>,
  - an optional <Description> (the question mark [?] denotes optionality),
  - an optional <Ingredients> tag,
  - and an optional <Instructions> tag.
    <!ELEMENT Name (#PCDATA)>
This simply states that a <Name> tag can contain character data and nothing else.
    <!ATTLIST Item optional CDATA "0"
                   isVegetarian CDATA "true">
This statement says that the <Item> tag has two possible attributes:
  - optional, whose default value is 0; and
  - isVegetarian, whose default value is true.
    <!-- This is a comment -->
The text « This is a comment » won't be interpreted.
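A content model such as (Name, Description?, Ingredients?, Instructions?) behaves like a regular expression over the sequence of child tags. The sketch below illustrates that idea only: it is not a DTD validator (Python's standard library does not include one), it checks nothing but the direct children of <Recipe>, and the names RECIPE_MODEL and matches_recipe_model are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# (Name, Description?, Ingredients?, Instructions?) rewritten as a regular
# expression over the comma-terminated names of the child elements.
RECIPE_MODEL = re.compile(r"Name,(Description,)?(Ingredients,)?(Instructions,)?")


def matches_recipe_model(xml_text):
    """Check only the order and presence of the children of <Recipe>."""
    root = ET.fromstring(xml_text)
    if root.tag != "Recipe":
        return False
    child_names = "".join(child.tag + "," for child in root)
    return RECIPE_MODEL.fullmatch(child_names) is not None


ok = matches_recipe_model("<Recipe><Name>x</Name><Ingredients/></Recipe>")
wrong_order = matches_recipe_model(
    "<Recipe><Name>x</Name><Instructions/><Ingredients/></Recipe>")
missing_name = matches_recipe_model("<Recipe><Description>x</Description></Recipe>")
print(ok, wrong_order, missing_name)
```

A real validating parser applies the same kind of check to every element of the document, and to attributes as well.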
DTD: other definitions
    <!ENTITY Utterance "example of sentence or value">
This defines an internal entity. It associates a value with a name, which can then be referenced in the document. The parser will replace the entity reference &Utterance; by the text: example of sentence or value
There are also external entities, whose content (XML or not) is stored outside the document and referenced through a declaration.
    <!ENTITY TextPresentation SYSTEM "http://foo.com/presentation/text.xml">
This allows the document to reference the content of the file located at the URL. The parser will replace the entity reference &TextPresentation; by the content of the file located at http://foo.com/presentation/text.xml
    <!NOTATION gif SYSTEM "usr/local/bin/display">
    <!ENTITY ImagePresentation SYSTEM "http://foo.com/img/lion.gif" NDATA gif>
For non-XML content, such as GIF files, the notation declaration specifies the application in charge of handling it. An element such as
    <imagePres src="ImagePresentation"/>
then refers to the image by its entity name, so that the application can include it in the document.
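The replacement of an internal entity can be observed directly. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree parser, which expands internal entities declared in the document itself; the <note> element is made up for the illustration. External and unparsed (NDATA) entities are not covered here.

```python
import xml.etree.ElementTree as ET

# The DTD is embedded in the document so the example is self-contained.
DOC = """<!DOCTYPE note [
  <!ENTITY Utterance "example of sentence or value">
]>
<note>Here is an &Utterance;.</note>"""

expanded = ET.fromstring(DOC).text
print(expanded)
```

The application never sees &Utterance; itself: by the time it receives the text, the parser has already substituted the declared value.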
Referencing a DTD file from an XML file
In the XML file, a document type declaration tells the parser
  - to expect a <Recipe> tag as the top-level tag (root) of the document
  - that the DTD is in the system file example.dtd
    <!DOCTYPE Recipe SYSTEM "example.dtd">
Another example, where the root is <personne> and the DTD is in the system file personne.dtd:
    <?xml version="1.0" encoding="ISO-8859-1" ?>
    <!DOCTYPE personne SYSTEM "personne.dtd">
    <personne>
      <prenom>Alain</prenom>
      <nom>Connu</nom>
    </personne>
DTD directly included in file
    <!DOCTYPE personne [direct DTD content]>

    <?xml version="1.0" encoding="ISO-8859-1" ?>
    <!-- DTD declaration and definition -->
    <!DOCTYPE personne [
      <!ELEMENT personne (prenom, nom)>
      <!ELEMENT prenom (#PCDATA)>
      <!ELEMENT nom (#PCDATA)>
    ]>
    <!-- end of DTD declaration and definition -->
    <personne>
      <prenom>Alain</prenom>
      <nom>Connu</nom>
    </personne>
What is a « Namespace »?
• It allows authors of XML documents to share tags
• It allows mixing one's own tags with tags defined by someone else, without name conflicts
• It applies to element names and to attribute names
• Some namespaces identify W3C standard vocabularies:
  - XML Schema
  - XLink (XML Linking Language)
  - XSL (eXtensible Stylesheet Language)
  - XHTML
  - versions of HTML (3.0, 4.0...)
Example of HTML Namespace
    <?xml version="1.0"?>
    <!-- All elements are in the HTML namespace -->
    <html:html xmlns:html="http://www.w3.org/TR/REC-html40">
      <html:head>
        <html:title>Namespace Example use</html:title>
      </html:head>
      <html:body>
        <html:p> Text and Links
          <html:a href="http://foo.com">here</html:a>
        </html:p>
      </html:body>
    </html:html>
This example uses the XML namespace of HTML defined in the W3C recommendation REC-html40 for HTML version 4.0
Example of using 2 Namespaces
    <?xml version="1.0"?>
    <lv:livre xmlns:lv="urn:loc.gov:livres"
              xmlns:isbn="urn:ISBN:0-395-36341-6">
      <lv:titre>Harry Potter et la coupe de feu</lv:titre>
      <isbn:number>0747554420</isbn:number>
    </lv:livre>
This example declares 2 namespaces, using lv and isbn respectively as prefixes
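To a parser, a prefix is only a local abbreviation: what identifies an element is the pair (namespace URI, local name). The sketch below assumes Python's standard xml.etree.ElementTree module, and it assumes that the two-namespace example is meant to use the prefixes lv and isbn with URIs in the urn: scheme. The prefixes book and id chosen on the Python side are deliberately different, to show that only the URIs matter.

```python
import xml.etree.ElementTree as ET

BOOK = """<lv:livre xmlns:lv="urn:loc.gov:livres"
          xmlns:isbn="urn:ISBN:0-395-36341-6">
  <lv:titre>Harry Potter et la coupe de feu</lv:titre>
  <isbn:number>0747554420</isbn:number>
</lv:livre>"""

root = ET.fromstring(BOOK)

# ElementTree stores each tag as "{namespace URI}local-name".
print(root.tag)

# Prefixes used for searching are our own choice; they map to the same URIs.
ns = {"book": "urn:loc.gov:livres", "id": "urn:ISBN:0-395-36341-6"}
title = root.find("book:titre", ns).text
number = root.find("id:number", ns).text
print(title, number)
```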
Schema structure representation in XML
XML Schema
• is an XML-based alternative to DTDs
• supports data types (more than only PCDATA)
• uses XML syntax (=> editable with an XML editor, parseable by any XML parser, manipulable with the XML DOM, transformable with XSLT)
• is extensible, just like XML (=> reusability, derivation of one's own data types from standard types, reference to multiple schemas from the same document)
• secures data communication (sender and receiver can both have the same « expectation » about the content by sharing its structural representation: a link to interoperability)
→ http://www.w3schools.com/default.asp/
Schema example
    <?xml version="1.0" encoding="iso-8859-1" ?>
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                elementFormDefault="qualified">
      <xsd:element name="film" type="typeFilm" />
      <xsd:complexType name="typeFilm">
        <xsd:sequence>
          <xsd:element name="titre" type="xsd:string" />
          <xsd:element name="acteurs" type="typeActeur" />
          <xsd:element name="realisateur" type="xsd:string" />
          <xsd:element name="annee" type="xsd:decimal" />
          <xsd:element name="texte" type="xsd:string" />
          <xsd:element name="note" type="xsd:string" minOccurs="0" maxOccurs="1" />
        </xsd:sequence>
      </xsd:complexType>
      <xsd:complexType name="typeActeur">
        <xsd:sequence>
          <xsd:element name="personne" type="xsd:string" minOccurs="0" maxOccurs="unbounded" />
        </xsd:sequence>
      </xsd:complexType>
    </xsd:schema>
Presentation: CSS and XSL
For general control over formatting, use
• Cascading Style Sheets (CSS)
• eXtensible Stylesheet Language (XSL)
Both are declarative languages
XSL is more recent than CSS
XSL is written in XML, and relies on namespaces
CSS for HTML and XML
• exists as a current recommendation from the W3C, usable with HTML or XML
• is simpler to use and less powerful than XSL
• is supported by most current-generation browsers (to varying degrees)
→ http://www.W3.org/TR/html401/present/styles
Cascading Style Sheets
In the small example below, <HTML> contains <BODY>, which contains <H1>, which contains text:
    <HTML>
    <HEAD> </HEAD>
    <BODY>
      <H1>A Theory About the Brontosaurus</H1>
      My theory about the brontosaurus is...
    </BODY>
    </HTML>
The whole idea of a style sheet is to use these structural relationships to indicate where changes in text style, spacing, and so on should occur.
    <STYLE TYPE="text/css">
    <!--
    H1 { color: red; font-size: 16pt; text-decoration: underline }
    -->
    </STYLE>
Cascading Style Sheets
In the small example below, <group> contains <person>, which contains <name>, which contains text:
    <group>
      <person>
        <name>bibi</name>
        <address>
          <name>Les oiseaux</name>
        </address>
      </person>
    </group>
For XML files, the browser has no built-in knowledge of how to display elements, and the same tag name can appear inside different elements.
=> use these structural relationships in selectors to indicate which element gets which kind of display.
    group>person>name { display: block; color: red; border: solid 2px green; }
Example of CSS file
    html:body { background-color: rgb(255, 230) }
    article   { display: block; font-family: helvetica, sans-serif; background-color: rgb(230, 255) }
    titre     { display: block; font-size: 200%; text-align: center; border-width: medium; border-style: groove }
    auteur    { display: block; font-size: 80%; font-weight: bold }
    date      { display: inline; font-size: 80%; font-style: italic }
    lieu      { display: inline; font-size: 80%; font-weight: bold }
    texte     { display: block }
    grand     { display: inline; font-variant: small-caps; font-size: 120%; font-weight: bold }
    image     { display: block; border-width: thin; text-align: center; border-style: solid; content: url(attr(site)); }
    legende   { display: block; text-align: center; padding-right: 2mm; padding-top: 2mm; padding-bottom: 2mm; padding-left: 2mm }
External CSS
The CSS to use can be defined
- using a <LINK> element (in the <HEAD>, for default use)
    <HTML>
    <HEAD>
      <LINK href="special.css" rel="stylesheet" type="text/css">
    </HEAD>
    <BODY>
      <H1>A Theory About the Brontosaurus</H1>
      My theory about the brontosaurus is...
    </BODY>
    </HTML>
- in the <META> declaration (only for default use)
    ...
    <HEAD>
      <META http-equiv="Content-Style-Type" content="text/css">
    </HEAD>
    ...
External CSS
In XML files, the CSS to use can be defined using the <?xml-stylesheet ...?> declaration (just after <?xml ...?>)
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <?xml-stylesheet type="text/css" href="style.css" ?>
    <group>
      <person>
        <name>bibi</name>
        <address>
          <name>Les oiseaux</name>
        </address>
      </person>
    </group>
The CSS file "style.css" can be:
    group>person>name { display: block; color: red; border: solid 2px green; }
    address name { display: inline; }
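The style sheet declaration is a processing instruction: it belongs to the document but is not an element, so an application has to look for it before the root element. The sketch below assumes Python's standard xml.dom.minidom module; the helper name stylesheet_instructions is hypothetical.

```python
from xml.dom import minidom

DOC = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>\n'
    '<?xml-stylesheet type="text/css" href="style.css" ?>\n'
    "<group><person><name>bibi</name></person></group>"
).encode("iso-8859-1")


def stylesheet_instructions(xml_bytes):
    """Return (target, data) for each top-level processing instruction."""
    document = minidom.parseString(xml_bytes)
    return [
        (node.target, node.data)
        for node in document.childNodes
        if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
    ]


instructions = stylesheet_instructions(DOC)
for target, data in instructions:
    print(target, "|", data)
```

The XML declaration on the first line is not reported as a processing instruction; only xml-stylesheet is.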
How do browsers apply CSS?
For style information that does not state its own language, the browser determines the default style sheet language as follows:
1. if <META> declarations specify Content-Style-Type, the last one is used
2. otherwise, if HTTP headers specify Content-Style-Type, the last one is used
3. otherwise, the default style sheet language is "text/css"
Why is CSS named CSS?
• These style sheets are called cascading because style rules coming from several sources (browser defaults, user settings, author style sheets) are combined in a defined order of priority to decide which rule applies to each element.
• In addition, many properties (like fonts and colors) are inherited: a value set on one markup element passes down to all of the element's contents.
• For example, if a paragraph tag (<P>) is set to show its text in red, all text and any other element inside that paragraph will be displayed in red, unless a sub-element of the paragraph specifies a color for its own contents.
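The way a color set on a paragraph passes down to its contents can be sketched as a walk over the document tree. This is a simplified illustration, not how a browser is implemented: it handles a single property, matches rules by tag name only, and the names computed_colors and rules are hypothetical.

```python
import xml.etree.ElementTree as ET


def computed_colors(element, rules, inherited="black"):
    """Return (tag, color) pairs: an element's own rule wins, else it inherits."""
    color = rules.get(element.tag, inherited)
    result = [(element.tag, color)]
    for child in element:
        # Children receive the color computed for their parent.
        result.extend(computed_colors(child, rules, color))
    return result


page = ET.fromstring(
    "<BODY><P>red text <B>bold</B> <EM>emphasis</EM></P><H1>title</H1></BODY>"
)
rules = {"P": "red", "EM": "blue"}  # like: P { color: red }  EM { color: blue }

colors = computed_colors(page, rules)
for tag, color in colors:
    print(tag, color)
```

Here <B> has no rule of its own and is shown in its parent's red, while <EM> overrides the inherited value with blue.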
XSL for XML and SGML
• used exclusively to format XML or SGML
• more complex and powerful than CSS
→ http://nwalsh.com/docs/tutorials/webtek2000/xsl/ie/frames.html
XSL: Why Stylesheets for XML?
From Norman Walsh, http://nwalsh.com/docs/tutorials/webtek2000/xsl/ie/frames.html
because:
• XML is not a fixed tag set (like HTML) and has no (application) semantics
• XML markup does not (usually) include formatting information
• Reuse: the same content can look different in different contexts
• Multiple output formats: different media (paper, online), different sizes (manuals, reports), different classes of output devices (workstations, hand-held devices)
• Styles tailored to the reader's preference (e.g., accessibility): print size, color, simplified layout for audio readers
Options for displaying XML
What does a stylesheet do?
It specifies the presentation of XML information using two basic categories of techniques:
• An optional transformation of the input document into another structure
  – generation of constant text
  – suppression of content
  – moving text (e.g., exchanging the order of the first and last name)
  – duplicating text (e.g., copying titles to make a table of contents)
  – executing more complex transformations that "compute" new information in terms of the existing information
• A description of how to present the transformed information
  – i.e., a specification of what properties to associate to each of the various parts of the transformed information
Needs to present information
The description of how to present the (possibly transformed) data includes three levels of formatting information:
1. Specification of the general screen or page (or even audio) layout
2. Assignment of the transformed content into basic "content container types" (e.g., lists, paragraphs, inline text)
3. Specification of formatting properties (spacing, margins, alignment, fonts, etc.) for each resulting "container"
Components of XSL
The full XSL language logically consists of three component languages, which are described in three W3C (World Wide Web Consortium) recommendations:
• XPath: XML Path Language
  a language for referencing specific parts of an XML document
• XSLT: XSL Transformations
  a language for describing how to transform one XML document (represented as a tree) into another
• XSL: Extensible Stylesheet Language
  XSLT plus a description of a set of Formatting Objects and Formatting Properties
XML to Result Tree An XSLT "stylesheet" transforms the input (source) document tree into a structure called a result tree consisting of result objects Transform to Another Vocabulary
What is an XSL Stylesheet?
• XSLT stylesheets are XML documents; namespaces are used to identify semantically significant elements.
• Most stylesheets are stand-alone documents rooted at <xsl:stylesheet> (or <xsl:transform>). It is possible to have "single template" stylesheet/documents.
Note that it is the mapping from namespace abbreviation to URI that is important, not the literal namespace abbreviation "xsl:" that is used most commonly
Understanding a template
Most templates have the following form:
    <xsl:template match="para">
      <p>
        <xsl:apply-templates/>
      </p>
    </xsl:template>
• The whole <xsl:template> element is a template
• The match pattern determines where this template applies
• Literal result elements come from non-XSL namespace(s)
• XSLT elements come from the XSL namespace
Style sheet example
A small, complete style sheet:
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
      <xsl:output method="html"/>
      <xsl:template match="doc">
        <html>
          <head><title><xsl:value-of select="title"/></title></head>
          <body><xsl:apply-templates/></body>
        </html>
      </xsl:template>
      <xsl:template match="title">
        <h1><xsl:apply-templates/></h1>
      </xsl:template>
      <xsl:template match="para">
        <p><xsl:apply-templates/></p>
      </xsl:template>
    </xsl:stylesheet>
Transformation is application of templates Templates transform portions of the source tree into portions of the result tree. The ordered accumulation of all the transformed portions forms the complete result tree. Individual templates are free to process elements from anywhere in the source tree.
Match Patterns (locating elements)
A critical capability of a stylesheet language: locating the source elements to be styled
For example,
- CSS does this with "selectors".
- FOSIs do it with "e-i-c's", elements in context.
- XSLT does it with "match patterns" defined in XPath.
XPath
XPath has an extensible string-based syntax inspired, in part, by the common "path/file" file system syntax:
    para
        matches all <para> children in the current context
    para/emphasis
        matches all <emphasis> elements that have a parent of <para>
    ancestor-or-self::*/@sepchar
        matches the sepchar attribute on the current element or any ancestor of the current element
    numberedlist/listitem[position() mod 2 = 0]
        matches even-numbered list items in a numbered list
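Some of these path expressions can be tried without an XSLT processor. The sketch below assumes Python's standard xml.etree.ElementTree module, which supports only a subset of XPath: child paths and integer positions work, but axes such as ancestor-or-self and functions such as position() mod 2 are not available, so the selection of every second item is done in plain Python instead. The sample document is made up for the illustration.

```python
import xml.etree.ElementTree as ET

DOC = ET.fromstring("""<doc>
  <para>plain <emphasis>first</emphasis> and <emphasis>second</emphasis></para>
  <numberedlist>
    <listitem>a</listitem><listitem>b</listitem>
    <listitem>c</listitem><listitem>d</listitem>
  </numberedlist>
</doc>""")

# para/emphasis: <emphasis> elements whose parent is a <para> child of <doc>.
emphasized = [e.text for e in DOC.findall("para/emphasis")]

# An integer predicate selects by position (counting from 1).
second = DOC.find("numberedlist/listitem[2]").text

# Equivalent of listitem[position() mod 2 = 0], written in Python.
items = DOC.findall("numberedlist/listitem")
even_items = [item.text for position, item in enumerate(items, start=1)
              if position % 2 == 0]

print(emphasized, second, even_items)
```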
Applying style recursively
The process is allowed to run recursively, driven primarily by the document. A series of templates is created such that there is a template to match each context; these templates are then applied recursively, starting at the root of the document.
• <xsl:template match="...">
• <xsl:apply-templates>
    <xsl:template match="section/title">
      <h2><xsl:apply-templates/></h2>
    </xsl:template>

    <xsl:apply-templates select="th|td"/>
Two obstacles appear when using the recursive model:
  – how to arbitrate between multiple patterns that match, and
  – how to process the same nodes in different contexts.
These are solved by conflict resolution and modes, respectively.
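The recursive model can be imitated in a few lines of ordinary code, which may help to see what <xsl:apply-templates/> does. This is an analogy written with Python's standard xml.etree.ElementTree module, not XSLT: each "template" is reduced to a mapping from a source tag to a result tag, there is no conflict resolution and no modes, and the names TEMPLATES and apply_templates are hypothetical.

```python
import xml.etree.ElementTree as ET

# One "template" per source element: source tag -> literal result element.
TEMPLATES = {"doc": "body", "title": "h1", "para": "p", "emphasis": "em"}


def apply_templates(source):
    """Build the result for one source element, then recurse into its children."""
    result = ET.Element(TEMPLATES[source.tag])
    result.text = source.text
    result.tail = source.tail
    for child in source:
        # Plays the role of <xsl:apply-templates/> inside a template body.
        result.append(apply_templates(child))
    return result


source = ET.fromstring(
    "<doc><title>XML</title><para>Some <emphasis>nested</emphasis> text</para></doc>"
)
html = ET.tostring(apply_templates(source), encoding="unicode")
print(html)
```

The traversal is driven by the shape of the source document: the code never says that a title comes before a paragraph, it only says what to produce for each kind of element.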
Applying style procedurally
This way of applying style selects each action procedurally. A series of templates is created such that each template explicitly selects and processes the necessary elements.
• <xsl:for-each>
• <xsl:template name="...">
    <xsl:for-each select="row">
      <tr>
        <xsl:for-each select="entry">
          <td><xsl:value-of select="."/></td>
        </xsl:for-each>
      </tr>
    </xsl:for-each>

    <xsl:template name="admonition">
      <xsl:param name="type">warning</xsl:param>
      ...
    </xsl:template>

    <xsl:call-template name="admonition">
      <xsl:with-param name="type">caution</xsl:with-param>
    </xsl:call-template>
Conditional processing
Simple conditional (no "else"): <xsl:if>
    <xsl:if test="$somecondition">
      <xsl:text>this text only gets used if $somecondition is true()</xsl:text>
    </xsl:if>
Select among alternatives with <xsl:choose>, <xsl:when> and <xsl:otherwise>:
    <xsl:choose>
      <xsl:when test="$count > 2">
        <xsl:text>, and </xsl:text>
      </xsl:when>
      <xsl:when test="$count > 1">
        <xsl:text> and </xsl:text>
      </xsl:when>
      <xsl:otherwise>
        <xsl:text> </xsl:text>
      </xsl:otherwise>
    </xsl:choose>
Variables
Variables can be used to save computed values.
• Variables are created with <xsl:variable>.
• Variables are "single assignment" (no side effects)
• Variables are lexically scoped
Once created, variables can be used to generate content:
    <a href="{$file}">...</a>
and to control conditional processing:
    <xsl:if test="$count = 3">...</xsl:if>
Creating the result tree
Literal result elements: any element in a template rule that is not in the XSL (or another extension) namespace is copied literally to the result tree
    <p>...</p>
XSL elements, in the XSL namespace:
    <xsl:text>  <xsl:value-of>  <xsl:element>  <xsl:attribute>  ...
Numbering and sorting
You can
• Count source tree elements (chapters, list-items, stock quotes, etc.)
• Convert between number formats (1, B, iii, ...)
• Sort elements for presentation
Overall XSL formatting capabilities
XSL FO formatting capabilities in XSL 1.0 are approximately the union of:
• HTML + CSS capabilities
• most high quality print output capabilities, including internationalization features
Not included are complex page layouts (e.g., magazine and newspaper layout), complex layout-driven formatting (e.g., copy fitting and complex floats), and loose leaf pagination (change page production)
Formatting objects and properties • XSL = XSLT + vocabulary of FOs and properties • XSL defines a powerful set of formatting objects • XSL uses (and extends) a set of Common Formatting Properties developed jointly with the CSS&FP (Cascading Style Sheet and Formatting Property) Working Group • When a result tree uses this standardized set of formatting objects and properties, then an XSL-compliant formatter can process that result tree to produce the specified output
Formatting object basics Inline versus block objects Common formatting properties, harmonized with CSS
Common formatting objects
• page-sequence: a major part (such as front or body) in which the basic page layout may differ from other parts
• flow: a chapter- or section-like division within a page-sequence
• block: a paragraph (or title or block quote, etc.)
• inline: e.g., a font change within a paragraph
• wrapper: a "transparent" object usable as either a block or an inline object that has no effect other than to provide a place to hang inheritable properties
• list FOs: list-block, list-item-label, list-item-body
• graphic: references an external graphic object
• table FOs: mostly analogous to the standard (CALS, OASIS, HTML) table models
Basic properties
• font properties
• margin and spacing properties
• border and padding properties
• keeps/breaks
• horizontal alignment/justification
• indentation
• more formatting object specific properties
Some application domains (1)
• HR-XML (Human Resources XML) is a standard suite of XML specifications to enable e-business and the automation of human resources-related data exchanges
• XHTML (eXtensible HTML) is a standard designed to help the transition from HTML to XML. It makes it possible to use XML processing tools, in particular to adapt the presentation to the target device (PDA, mobile phone...)
• SVG (Scalable Vector Graphics) describes 2-dimensional graphics in XML. Its standardization is supported by Adobe, Microsoft, and others
• SMIL (Synchronized Multimedia Integration Language) is an XML-based language for describing multimedia presentations: it specifies the layout and the timing that synchronize audio, video, images and text
Some application domains (2)
• MathML (Mathematical Markup Language) is a language for normalized scientific content. It represents complex mathematical expressions so that they can be displayed on the Internet
• DHTML (Dynamic HTML) is not a markup language of its own but a combination of HTML, style sheets and scripting, used to create pages that can change even after they have been loaded into a browser
• PPML (Personalized Print Markup Language) is an XML-based language for variable-data printing. It was developed by the Digital Printing Initiative (PODi)
• 3DML, HumanML, Artificial Intelligence ML...
In short, XML is. . . …a powerful tool for • data representation, • storage, • modelling, • and interoperation
Small XML example code
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <article>
      <titre> Un journaliste accuse, un policier dément </titre>
      <auteur> Alain Connu </auteur>
      <date> 14 juin 1972 </date>
      <lieu> banquise </lieu>
      <texte> Un journaliste de la place accuse les autorités... </texte>
    </article>
Short introduction to XML
• An XML document is well-formed if it satisfies certain constraints:
  – all tags with non-empty content must be closed
  – tags without content must end with />
  – attribute values must be enclosed in quotes
• An XML document is valid with respect to a DTD if it follows the rules expressed by the DTD
  – DTD: a set of rules stating which sequences and nestings of tags are allowed
        <!ELEMENT UL (LI)+>
        <!ELEMENT LI (#PCDATA | u | it | b)*>
Modelling information structure in XML
Short introduction to Markup Languages (XML, HTML)
From the course of Bertrand Ibrahim, Geneva University
• XML: makes it possible to structure information
• XML: makes it possible to automate the processing of structured documents and formatted data
• XML ~ a generalization of HTML where, instead of using a set of predefined tags with predefined meanings, authors can "invent" their own tags