
Knowledge-Based Systems: XML (examples). Prof. M. T. PAZIENZA, academic year 2002-2003

Comments
    Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'
An example of a comment:
    <!-- declarations for <head> & <body> -->
Note that the grammar does not allow a comment ending in "--->". The following example is not well-formed:
    <!-- B+, B, or B--->
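
The Comment production above can be checked mechanically. The following is a minimal sketch that transcribes the grammar into a Python regular expression; it assumes Char can be approximated as "any character" (the real production also excludes some control characters), and the names COMMENT_RE and is_well_formed_comment are illustrative only.

```python
import re

# Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'
# Assumption: Char is approximated as "any character".
COMMENT_RE = re.compile(r"<!--(?:[^-]|-[^-])*-->")


def is_well_formed_comment(text: str) -> bool:
    """Return True if the whole string matches the Comment production."""
    return COMMENT_RE.fullmatch(text) is not None


# The two examples from the slide.
good = is_well_formed_comment("<!-- declarations for <head> & <body> -->")
bad = is_well_formed_comment("<!-- B+, B, or B--->")
# A double hyphen inside the body is rejected by the same rule.
double_hyphen = is_well_formed_comment("<!-- a -- b -->")
```

Each hyphen in the body must be followed by a non-hyphen character, which is why both "--" inside a comment and a closing "--->" are rejected.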

CDATA Sections
CDATA sections may occur anywhere character data may occur; they are used to escape blocks of text containing characters which would otherwise be recognized as markup. CDATA sections begin with the string "<![CDATA[" and end with the string "]]>".

CDATA Sections
    CDSect  ::= CDStart CData CDEnd
    CDStart ::= '<![CDATA['
    CData   ::= (Char* - (Char* ']]>' Char*))
    CDEnd   ::= ']]>'
Within a CDATA section, only the CDEnd string is recognized as markup, so that left angle brackets and ampersands may occur in their literal form; they need not (and cannot) be escaped using "&lt;" and "&amp;". CDATA sections cannot nest.

CDATA Sections
An example of a CDATA section, in which "<greeting>" and "</greeting>" are recognized as character data, not markup:
    <![CDATA[<greeting>Hello, world!</greeting>]]>
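
To see that the content of a CDATA section really is character data, the example can be fed to a parser. This is a sketch using Python's standard xml.etree.ElementTree; the wrapper element <example> and the helper to_cdata are hypothetical names introduced here, not part of the slides. The helper also shows one common way to cope with the rule that "]]>" cannot appear inside a section: split it across two adjacent sections.

```python
import xml.etree.ElementTree as ET


def to_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting any "]]>" across two sections."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


# <example> is a hypothetical wrapper so the CDATA section has a parent.
slide_doc = "<example><![CDATA[<greeting>Hello, world!</greeting>]]></example>"
slide_root = ET.fromstring(slide_doc)
slide_text = slide_root.text   # the tags arrive as plain text
child_count = len(slide_root)  # no <greeting> child element was created

# Text containing markup characters and the CDEnd string itself.
tricky = "a < b && c ]]> d"
round_trip = ET.fromstring("<example>" + to_cdata(tricky) + "</example>").text
```

The parser reports no child elements, only text, and the tricky string survives the round trip unchanged.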

XML document (with DTD)
An example of an XML document with a document type declaration:
    <?xml version="1.0"?>
    <!DOCTYPE greeting SYSTEM "hello.dtd">
    <greeting>Hello, world!</greeting>
The system identifier "hello.dtd" gives the address (a URI reference) of a DTD for the document.

XML document (with DTD)
The declarations can also be given locally, as in this example:
    <?xml version="1.0" encoding="UTF-8" ?>
    <!DOCTYPE greeting [
      <!ELEMENT greeting (#PCDATA)>
    ]>
    <greeting>Hello, world!</greeting>

XML document (with DTD)
If both the external and internal subsets are used, the internal subset is considered to occur before the external subset. This has the effect that entity and attribute-list declarations in the internal subset take precedence over those in the external subset.
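
The precedence described above follows from a simpler rule: when the same entity is declared more than once, the first declaration is binding, and the internal subset is read first. The sketch below illustrates only that first-declaration rule, using two declarations inside one internal subset, because Python's standard xml.etree.ElementTree does not load external subsets. The entity name who and its values are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

# The same entity is declared twice; the first declaration should be binding.
# The entity name "who" and both replacement texts are hypothetical.
DOC = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE greeting [
  <!ELEMENT greeting (#PCDATA)>
  <!ENTITY who "internal subset">
  <!ENTITY who "later declaration">
]>
<greeting>Hello, &who;!</greeting>"""

# ElementTree does not validate, but it reads the internal subset
# and expands internal entities declared there.
greeting_text = ET.fromstring(DOC).text
```

A later declaration, like one arriving from an external subset, is ignored once the name is already bound.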

Language identification
In document processing, it is often useful to identify the natural or formal language in which the content is written. A special attribute named xml:lang may be inserted in documents to specify the language used in the contents and attribute values of any element in an XML document. In valid documents, this attribute, like any other, must be declared if it is used.

Language identification
A simple declaration for xml:lang might take the form
    xml:lang NMTOKEN #IMPLIED
The intent declared with xml:lang is considered to apply to all attributes and content of the element where it is specified, unless overridden with an instance of xml:lang on another element within that content. Specific default values may also be given, if appropriate. In a collection of French poems for English students, with glosses and notes in English, the xml:lang attribute might be declared this way:
    <!ATTLIST poem xml:lang NMTOKEN 'fr'>
    <!ATTLIST gloss xml:lang NMTOKEN 'en'>
    <!ATTLIST note xml:lang NMTOKEN 'en'>

Language identification
    <p xml:lang="en">The quick brown fox jumps over the lazy dog.</p>
    <p xml:lang="en-GB">What colour is it?</p>
    <p xml:lang="en-US">What color is it?</p>
    <sp who="Faust" desc='leise' xml:lang="de">
      <l>Habe nun, ach! Philosophie,</l>
      <l>Juristerei, und Medizin</l>
      <l>und leider auch Theologie</l>
      <l>durchaus studiert mit heißem Bemüh'n.</l>
    </sp>
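
The inheritance rule for xml:lang (a value applies to an element's content until another xml:lang overrides it) can be sketched as a tree walk. This example uses Python's standard xml.etree.ElementTree, which exposes the attribute under the reserved XML namespace. The wrapper element <doc>, its xml:lang value, and the function name effective_languages are assumptions added for illustration.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_languages(element, inherited=None):
    """Return (tag, language) pairs, inheriting xml:lang from ancestors."""
    lang = element.get(XML_LANG, inherited)
    pairs = [(element.tag, lang)]
    for child in element:
        pairs.extend(effective_languages(child, lang))
    return pairs


# <doc> is a hypothetical wrapper around fragments from the slide.
SAMPLE = """<doc xml:lang="en">
  <p>The quick brown fox jumps over the lazy dog.</p>
  <sp who="Faust" desc="leise" xml:lang="de">
    <l>Habe nun, ach! Philosophie,</l>
  </sp>
  <p xml:lang="en-GB">What colour is it?</p>
</doc>"""

langs = effective_languages(ET.fromstring(SAMPLE))
```

The <l> element carries no xml:lang of its own, so it takes "de" from the enclosing <sp>, while the first <p> takes "en" from <doc>.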