XML for E-commerce
Helena Ahonen-Myka
University of Helsinki

XML: background
- SGML: standard for markup languages (1986)
- HTML: an SGML application
- XML: a simplified version of SGML (developed for the Web)
- software- and platform-independent representations for structured data

Example, HTML
<html>
<head>
<title>An HTML document</title>
</head>
<body>
<h1>Heading 1</h1>
<p>Some text content</p>
<h2>Subheading</h2>
<p>More text.</p>
</body>
</html>

Lists, images, links
<body>
<h1>Finland</h1>
<ol>
<li><a href="trav.html">Traveling</a>
<li><a href="culture.html">Culture</a>
<li><a href="sports.html">Sports</a>
</ol>
<p><img src="map.jpg" alt="map">
</body>

Tables
<table border="1">
<tr><th>Year</th><th>Sales</th></tr>
<tr><td>2000</td><td>$18 M</td></tr>
<tr><td>2001</td><td>$25 M</td></tr>
<tr><td>2002</td><td>$36 M</td></tr>
</table>
Rendered as:
  Year  Sales
  2000  $18 M
  2001  $25 M
  2002  $36 M

Forms
<form action="http://some.com/add" method="post">
<p>First name: <input type="text" name="fname">
Last name: <input type="text" name="lname">
<input type="submit"><input type="reset">
</p>
</form>
Rendered as:
  First name: _____________ Last name: _____________ [Submit] [Reset]

HTML
- easy to describe simple documents (headings, text, lists, tables, images)
- easy to create links to other documents or different parts of the same document
- the elements have a default presentation style

Presentation
- the browsers give elements a default presentation style
- often the authors want something else
- it is wise to separate the presentation from the document contents: ease of modification and uniformity of appearance

CSS: Cascading Style Sheets
- a stylesheet defines for each element, e.g., the font, size, color, widths of margins
- the structure of a document cannot be modified
- several stylesheets can be attached to a document: modularity

CSS, examples
<style type="text/css">
body { color: black; background: white; font-family: verdana, sans-serif; }
h1, h2 { color: red; }
p.new { color: green; }
</style>

CSS: layout
<div class="box">
The content within this DIV element will be enclosed in a box with a thin line around it.
</div>
div.box { border: solid; border-width: thin; width: 100%; padding: 2em; }

CSS2
- free layout can be described for elements
- dynamic changes of contents and style, animations, etc.

Dynamic HTML
- HTML
- ECMAScript (JavaScript, JScript)
- CSS
- DOM

Three-tier architecture
- browser
- web server: processing logic
- database server

Examples
- 1. Browser asks for a page.
  2. Server sends the page.
  3. Browser shows the page.
- 1. As above, but the page contains a form, which the user fills out.
  2. Based on the data of the form, the server starts an application which queries a database and forms a new page.
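The second scenario can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `customers` table, its two rows, and the function names `build_page` and `demo_database` are made up here, and an in-memory SQLite database stands in for the database server. The form field name `lname` is taken from the Forms slide.

```python
import html
import sqlite3


def build_page(conn: sqlite3.Connection, form: dict) -> str:
    """Middle tier: turn submitted form data into a new HTML page."""
    # Parameterized query: form data is never pasted into the SQL text.
    rows = conn.execute(
        "SELECT fname, lname FROM customers WHERE lname = ? ORDER BY fname",
        (form.get("lname", ""),),
    ).fetchall()
    # Escape database values before placing them into markup.
    items = "".join(
        f"<li>{html.escape(fname)} {html.escape(lname)}</li>" for fname, lname in rows
    )
    return f"<html><body><ul>{items}</ul></body></html>"


def demo_database() -> sqlite3.Connection:
    """Data tier stand-in: an in-memory database with two made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (fname TEXT, lname TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [("Helena", "Ahonen"), ("Ashok", "Malhotra")],
    )
    return conn


# The browser would send {"lname": "Ahonen"} as the body of a POST request.
page = build_page(demo_database(), {"lname": "Ahonen"})
print(page)
```

The browser tier is not modeled; in a real deployment the dictionary would come from the web server's request parsing.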

Browser vs. server
- browser interprets CSS definitions
- HTML documents may include embedded JavaScript scripts, which are run in the browser
- problems: the implementations of CSS vary, JavaScript may be switched off
- most of the functionality on the server side?

XML
- Extensible Markup Language (1998)
- developed for interchanging structured documents on the Internet
- used more and more as a platform-independent data format between applications
- document vs. data

"Document":
<memo importance="high" date="19990323">
<from>Paul V. Biron</from>
<to>Ashok Malhotra</to>
<subject>Latest draft</subject>
<body>
We need to discuss the latest draft <emph>immediately</emph>.
Either email me at <email>mailto:paul.v.biron@kp.org</email>
or call <phone>555-9876</phone>
</body>
</memo>

"Data":
<invoice>
<orderDate>19990121</orderDate>
<shipDate>19990125</shipDate>
<billingAddress>
<name>Ashok Malhotra</name>
<street>123 IBM Ave.</street>
<city>Hawthorne</city>
<state>NY</state>
<zip>10532-0000</zip>
</billingAddress>
<voice>555-1234</voice>
<fax>555-4321</fax>
</invoice>

<body>
<p><b>Order date: </b> 19990121</p>
<p><b>Shipping date: </b> 19990125</p>
<p><b>Address: </b></p>
<table>
<tr><th>name<th>street<th>city<th>state<th>zip
<tr><td>Ashok Malhotra <td>123 IBM Ave. <td>Hawthorne <td>NY <td>10532-0000
</table>
<p>Phone: 555-1234</p>
<p>Fax: 555-4321</p>
</body>

Basic concepts: logical structure
- logical structure: elements
- names of elements can be chosen freely
- elements can have attributes
- logical structure is described by a document type definition (DTD)

Elements
- elements can be containers, which can contain other elements and/or text, e.g. <name><fname>Helena</fname> <lname>Ahonen</lname></name>
- an element can also be empty: <img src="picture.jpg" alt="Picture" />

Attributes
- attributes express information that is not really content
- attribute/value pairs are attached to the start tag of an element: <memo importance="high">…</memo>
- it may be difficult to decide whether some information should be modeled as an element or as an attribute

Attribute or element?
<memo date="060600">
<from>Ashok Malhotra</from>
<to>Peter May</to>
…
</memo>
<memo>
<from>Ashok Malhotra</from>
<to>Peter May</to>
<date>060600</date>
…
</memo>

Defining the structure: DTD
- document type definition (DTD)
- describes how the elements are formed from other elements and text
- defines which attributes an element may/must have

Examples of definitions
<!ELEMENT name (fname+, lname)>
<!ELEMENT address (name, street, ((city, state, zipcode) | (zipcode, city)))>
<!ELEMENT contact (address, phone*, email?)>
<!ELEMENT contact2 (address | phone | email)*>

Symbols
  +   : 1 or more
  *   : 0 or more
  ?   : 0 or 1
  |   : choice (one has to be chosen)
  ()  : grouping
  ,   : order
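Because the occurrence symbols mirror regular-expression operators, a content model can be read as a pattern over the sequence of child element names. The sketch below is a hand translation of three of the definitions from the previous slide, not something an XML parser does for you; the helper name `matches` and the trailing-space token encoding are choices made for this illustration.

```python
import re

# Each child element name becomes one token followed by a space, so the
# DTD operators map directly onto regular-expression operators.
NAME = re.compile(r"(fname )+(lname )")                    # (fname+, lname)
CONTACT = re.compile(r"(address )(phone )*(email )?")      # (address, phone*, email?)
CONTACT2 = re.compile(r"((address )|(phone )|(email ))*")  # (address | phone | email)*


def matches(model: re.Pattern, children: list[str]) -> bool:
    """Return True if the sequence of child element names fits the model."""
    sequence = "".join(name + " " for name in children)
    return model.fullmatch(sequence) is not None


print(matches(NAME, ["fname", "fname", "lname"]))      # True: fname may repeat
print(matches(NAME, ["lname"]))                        # False: at least one fname
print(matches(CONTACT, ["address", "email", "phone"])) # False: order is fixed
print(matches(CONTACT2, ["phone", "address", "phone"]))  # True: any mix, any order
```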

DTD for the Invoice example
<!DOCTYPE invoice [
<!ELEMENT invoice (orderDate, shipDate, billingAddress, voice*, fax?)>
<!ELEMENT orderDate (#PCDATA)>
<!ELEMENT shipDate (#PCDATA)>
<!ELEMENT billingAddress (name, street, city, state, zip)>
<!ELEMENT voice (#PCDATA)>
<!ELEMENT fax (#PCDATA)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT state (#PCDATA)>
<!ELEMENT zip (#PCDATA)>
]>

Note:
- elements cannot overlap
- container elements must have end tags
- empty elements:
- all names are case-sensitive
- attribute values must be delimited by quotation marks

Well-formed XML documents
- documents that adhere to the formal requirements (syntax) of the XML specification
- if a document is not well-formed, it is not an XML document (and the XML tools do not have to process it)
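A quick way to see these rules in action is to hand small strings to a parser. The sketch below uses Python's standard `xml.etree.ElementTree`, which checks well-formedness only (it does not validate against a DTD); the helper name `is_well_formed` is made up for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed('<memo importance="high"><from>Paul</from></memo>'))  # True
print(is_well_formed("<b><i>overlapping</b></i>"))  # False: elements overlap
print(is_well_formed("<memo><from>Paul</memo>"))    # False: missing end tag
print(is_well_formed("<Memo></memo>"))              # False: names are case-sensitive
print(is_well_formed("<memo importance=high/>"))    # False: unquoted attribute value
```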

Valid documents
- a document is a valid XML document if it is well-formed and adheres to the structure defined in the given DTD
- an XML processor can be validating or non-validating
- sometimes validity is important, sometimes not

Where do the DTDs come from?
- general DTDs: communities that have to be able to interchange information agree on a common DTD
- also standard-like: MathML, SMIL
- tailored DTDs can be designed for one's own use

XML basics: physical structure
- physical structure: entities
- "file structure": a document is assembled from parts, e.g. chapters of a book (each in one file)
- including parts that appear often
- non-XML content: e.g. images
- characters that are not found on the keyboard

Entities
- In the DTD: <!ENTITY HY "Helsingin yliopisto">
- Inside the document: <place>&HY;</place>
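To see the replacement happen, the entity can be declared in an internal DTD subset and the document handed to a parser. This is a small sketch using Python's standard `xml.etree.ElementTree`; wrapping the declaration in a `<!DOCTYPE place [...]>` block is an assumption made here so that the example is a complete document.

```python
import xml.etree.ElementTree as ET

# The entity is declared once in the internal DTD subset and referenced
# in the document body as &HY;.
DOCUMENT = """<!DOCTYPE place [
  <!ENTITY HY "Helsingin yliopisto">
]>
<place>&HY;</place>"""

place = ET.fromstring(DOCUMENT)
# The parser has already substituted the entity's replacement text.
print(place.text)
```

The application never sees `&HY;` itself, which is what makes entities useful for text that appears often.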

Defining the presentation
- names of elements are arbitrary: the browsers cannot know how an element should be presented
- presentation is defined using a separate stylesheet (CSS, XSL)
- one stylesheet - many documents
- one document - many stylesheets

Extensible Stylesheet Language (XSL)
- specification contains two parts: transformation language XSLT and formatting objects
- an XSLT transformation can express many kinds of transformations: elements can be inserted and deleted, elements can be reordered, etc.
- standardization of formatting objects not ready

Transformation target
- XSLT transformations can be used for transformations into several different representations
- since the standardization of general formatting objects is not ready, transforming XML into HTML is a good choice
- transformations into other XML formats, PDF, etc. also possible

<sales>
<products><product id="p1">Packing Boxes</product>
<product id="p2">Packing Tape</product>
</products>
<record><cust num="C1001">
<prodsale idref="p1">100</prodsale>
<prodsale idref="p2">200</prodsale>
</cust>
<cust num="C1002">
<prodsale idref="p2">50</prodsale>
</cust>
<cust num="C1003">
<prodsale idref="p1">75</prodsale>
<prodsale idref="p2">15</prodsale>
</cust>
</record>
</sales>

<body>
<h2>Record of Sales</h2>
<ul>
<li>C1001 - Packing Boxes - 100</li>
<li>C1001 - Packing Tape - 200</li>
<li>C1002 - Packing Tape - 50</li>
<li>C1003 - Packing Boxes - 75</li>
<li>C1003 - Packing Tape - 15</li>
</ul>
</body>

XSLT transformations
- an XML document is seen as a tree
- how do we get from the source tree to the target tree?
- transformation rules are matched to the parts of the tree, and transformations defined by the rules are applied
- the tree is often traversed starting from the root
- contents can be picked from any part

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html><head><title>Record of Sales</title></head>
<body><h2>Record of Sales</h2>
<xsl:apply-templates select="/sales/record"/>
</body></html></xsl:template>
<xsl:template match="record">
<ul><xsl:apply-templates/></ul></xsl:template>
<xsl:template match="prodsale">
<li><xsl:value-of select="../@num"/>
<xsl:text> - </xsl:text>
<xsl:value-of select="id(@idref)"/>
<xsl:text> - </xsl:text>
<xsl:value-of select="."/></li></xsl:template>
</xsl:stylesheet>
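For readers without an XSLT engine at hand, the effect of the three template rules can be imitated with Python's standard `xml.etree.ElementTree`. This is a rough sketch of the same source-tree-to-target-tree idea, not an XSLT implementation: the function name `sales_to_html` is made up, and a dictionary lookup stands in for `id(@idref)`, which in XSLT only works when a DTD declares the `id` attribute as type ID.

```python
import xml.etree.ElementTree as ET

SALES = """<sales>
  <products>
    <product id="p1">Packing Boxes</product>
    <product id="p2">Packing Tape</product>
  </products>
  <record>
    <cust num="C1001">
      <prodsale idref="p1">100</prodsale>
      <prodsale idref="p2">200</prodsale>
    </cust>
    <cust num="C1002">
      <prodsale idref="p2">50</prodsale>
    </cust>
    <cust num="C1003">
      <prodsale idref="p1">75</prodsale>
      <prodsale idref="p2">15</prodsale>
    </cust>
  </record>
</sales>"""


def sales_to_html(source: str) -> str:
    """Build the 'Record of Sales' page from the sales document."""
    root = ET.fromstring(source)
    # Stand-in for id(@idref): look products up by their id attribute.
    products = {p.get("id"): p.text for p in root.iter("product")}
    items = []
    for cust in root.findall("./record/cust"):
        for sale in cust.findall("prodsale"):
            # Same three pieces as the prodsale template:
            # ../@num, id(@idref), and the element's own text.
            name = products[sale.get("idref")]
            items.append(f"<li>{cust.get('num')} - {name} - {sale.text}</li>")
    return (
        "<html><head><title>Record of Sales</title></head>"
        "<body><h2>Record of Sales</h2><ul>" + "".join(items) + "</ul></body></html>"
    )


print(sales_to_html(SALES))
```

Unlike the stylesheet, this version hard-codes the traversal order; the rule-based style of XSLT lets the engine decide which template applies to each node.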

Other XML-related standards
- XHTML
- XLink
- XML Schema
- DOM
- RDF

XHTML
- Extensible HyperText Markup Language (v. 1.0, January 2000)
- redefinition of HTML using XML
- XHTML documents can be processed using XML tools

XHTML: modularization
- XHTML facilitates creating new document types:
  - a subset can be used (e.g. for presentation on different devices)
  - definitions can be expanded (special elements, e.g. for representation of medical information)

XLink
- XML Linking Language (July 2000)
- links can have several targets
- types, roles, etc. can be attached to links
- links can be stored separately from the document
- a link can point to an arbitrary location in the target document
- behavior of the link can be defined

DOM
- Document Object Model (Sep 2000)
- defines a platform- and language-neutral programming interface (API) for HTML and XML documents
- defines how programs and scripts can retrieve, insert, delete, and modify contents, structure and styles
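The four operations named above can be tried with `xml.dom.minidom`, a small DOM implementation in Python's standard library. The memo content is borrowed from the "Document" slide; which nodes are inserted, changed, and removed is an arbitrary choice for this sketch.

```python
from xml.dom.minidom import parseString

doc = parseString("<memo><from>Paul V. Biron</from><to>Ashok Malhotra</to></memo>")
memo = doc.documentElement

# Retrieve: read the text of the first <to> element.
to = memo.getElementsByTagName("to")[0]
recipient = to.firstChild.data

# Insert: create a <subject> element and append it to the memo.
subject = doc.createElement("subject")
subject.appendChild(doc.createTextNode("Latest draft"))
memo.appendChild(subject)

# Modify: set an attribute on the root element.
memo.setAttribute("importance", "high")

# Delete: remove the <from> element.
memo.removeChild(memo.getElementsByTagName("from")[0])

result = memo.toxml()
print(recipient)
print(result)
```

Because the interface is language-neutral, the same method names (`getElementsByTagName`, `createElement`, `appendChild`, `removeChild`) appear in the DOM bindings for scripts running in a browser.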

XML Schema
- Sep 2000
- the modeling power of DTD is restricted
- datatyping: e.g. date, integer
- database schema-like representation: constraints, e.g. how many times the element may occur
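To make the two kinds of constraint concrete, here is a hand-rolled check in Python for the invoice example. It is not XML Schema and does not use a schema processor; it only illustrates what "datatype" and "occurrence" constraints add beyond a DTD. The rule table, the `YYYYMMDD` date format (inferred from the invoice example), and the names `RULES`, `is_date`, and `check_invoice` are assumptions of this sketch.

```python
import xml.etree.ElementTree as ET
from datetime import datetime


def is_date(text: str) -> bool:
    """Accept dates written as YYYYMMDD, the form used in the invoice example."""
    try:
        datetime.strptime(text, "%Y%m%d")
        return True
    except ValueError:
        return False


# element name -> (minimum occurrences, maximum occurrences or None, type check)
RULES = {
    "orderDate": (1, 1, is_date),
    "shipDate": (1, 1, is_date),
    "voice": (0, None, None),
    "fax": (0, 1, None),
}


def check_invoice(xml_text: str) -> list[str]:
    """Return a list of constraint violations (empty if none were found)."""
    invoice = ET.fromstring(xml_text)
    problems = []
    for name, (low, high, type_check) in RULES.items():
        found = invoice.findall(name)
        # Occurrence constraint: how many times the element may occur.
        if len(found) < low or (high is not None and len(found) > high):
            problems.append(f"{name}: occurs {len(found)} times")
        # Datatype constraint: the text must parse as the declared type.
        for element in found:
            if type_check is not None and not type_check(element.text or ""):
                problems.append(f"{name}: bad value {element.text!r}")
    return problems


print(check_invoice(
    "<invoice><orderDate>19991341</orderDate><shipDate>19990125</shipDate></invoice>"
))
```

A DTD can express the occurrence rules (`voice*`, `fax?`) but would accept `19991341` as an order date, since `#PCDATA` is untyped.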

RDF
- Resource Description Framework (Mar 2000)
- RDF can be used for describing metadata of web resources
- metadata for search engines, for managing large collections, for depicting the parts of a large document, etc.

XML vs. HTML

Good in HTML
- well-known and broadly used: a large public can use it easily
- browsers know how to show it: it is not necessary to define the presentation separately
- heterogeneous material is simple to combine using hyperlinks

Bad in HTML
- contents and presentation intermingle: reuse in different contexts is difficult
- accessing parts of a document is hard
- representing complex structures is difficult
- automation is difficult

Good in XML
- contents in one place -> several presentations for several media
- automatic processing of documents is easier: more precise queries, transformations, retrieving specific data
- structure of documents can be validated

Bad in XML
- the meaning of elements has to be known
- presentation does not exist automatically: stylesheets have to be given
- creating documents may require using special editors or laborious conversion
- browsers do not yet support XML well

XML in system architectures
- basically like with HTML (three-tier)
- use of XML is influenced by the nature of the contents ("data" or "document")
- "data": XML as an interchange format between applications (storage e.g. in relational databases)
- "document": content management systems (often based on object databases)
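The "data" case can be illustrated with the invoice from earlier: one application emits the XML, another parses it and stores the fields in a relational table. The sketch below uses Python's standard `xml.etree.ElementTree` and an in-memory SQLite database; the `invoices` table, its column names, the choice of which fields to store, and the function name `store_invoice` are all assumptions of this example.

```python
import sqlite3
import xml.etree.ElementTree as ET

INVOICE = """<invoice>
  <orderDate>19990121</orderDate>
  <shipDate>19990125</shipDate>
  <billingAddress>
    <name>Ashok Malhotra</name>
    <street>123 IBM Ave.</street>
    <city>Hawthorne</city>
    <state>NY</state>
    <zip>10532-0000</zip>
  </billingAddress>
  <voice>555-1234</voice>
  <fax>555-4321</fax>
</invoice>"""


def store_invoice(conn: sqlite3.Connection, xml_text: str) -> None:
    """Parse the interchange document and insert one row per invoice."""
    invoice = ET.fromstring(xml_text)
    row = (
        invoice.findtext("orderDate"),
        invoice.findtext("shipDate"),
        invoice.findtext("billingAddress/name"),
        invoice.findtext("billingAddress/city"),
    )
    conn.execute("INSERT INTO invoices VALUES (?, ?, ?, ?)", row)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices (order_date TEXT, ship_date TEXT, name TEXT, city TEXT)"
)
store_invoice(conn, INVOICE)
stored = conn.execute("SELECT name, city, ship_date FROM invoices").fetchall()
print(stored)
```

The receiving application depends only on element names, not on the sender's platform or programming language, which is the point of using XML as the interchange format.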

Browser vs. server
- decision: where is the final presentation formed?
- if the browser understands XSL, formatting can be left to the browser; otherwise the server transforms the document into HTML with CSS styles
- probably always some transformation from the original XML format

Tools
- editors: XML, XSL, DTD, XML Schema
- parsers (included in many tools)
- XSL engines
- content management systems (e.g., managing document components, version management, assembly)
- e-commerce tools

Technology providers
- Microsoft, IBM / AlphaWorks
- publishing technology providers (Arbortext, SoftQuad, Chrystal Software, Poet)
- database technology providers (Oracle, Sybase)
- public domain software, prototypes, etc. (e.g. Apache Cocoon project)

XML portals
- www.xml.com
- www.xml.org
- www.w3c.org
- www.oasis-open.org
- www.xmlsoftware.com
- www.cs.helsinki.fi/~hahonen/uumek00/sisalto/xml/ (New Media course)