Interfacing XML and Erlang

Ulf Wiger, Senior Systems Architect
Network Architecture and Product Strategies
Data Backbone and Optical Networks Division
Ericsson Telecom AB
000922 ETXUWIG-99:093

Executive Summary

  • Erlang/OTP is moving into vertical applications
  • XML is fast becoming an important standard
  • Erlang and XML fit very well together

The Reason for XMerL

  • Interest in Erlang is growing
  • No longer just for embedded systems
  • New interfaces must evolve
      – Powerful GUI components
      – Data exchange (COM, ODBC, XML, …)
  • XML is a logical addition to OTP
      – (ASN.1, HTTP, IDL, CORBA, …)
  • Real reason:
      – I bought a book and became curious

[Chart: Number of Requests to www.erlang.org]

What is XML?

  • "A Stricter HTML"
  • "A Simpler SGML"
  • Relatively Easy to Parse
  • Content Oriented
  • XML springs mostly from SGML
      – All non-essential SGML features have been removed
      – Web address support taken from HTML, HyTime and TEI
      – Some new functionality added:
          • Modularity
          • Extensibility through powerful linking
          • International (Unicode) support
          • Data orientation

Where is XML used?

  • Large Web sites
      – HTML is generated via special (XSL) stylesheets
      – Internet Explorer has built-in support for XML
  • Document management
      – When machines must be able to read the documents
  • Machine-to-machine communication
      – XML RPC, SOAP
      – XML processors exist in many languages (even Erlang!)

A Simple XML Document

    <?xml version="1.0"?>
    <home.page title="My Home Page">
      <title>
        Welcome to My Home Page
      </title>
      <text>
        <para>
          Sorry, this home page is still under
          construction. Please come back soon!
        </para>
      </text>
    </home.page>

  • All elements must have a start tag and an end tag (exception: <empty.tag/>)
  • An element can have a list of attributes

Erlang analogy: {Tag, Attributes, Content}

A Simple Erlang-XML Document

XML:

    <?xml version="1.0"?>
    <home.page title="My Home Page">
      <title>
        Welcome to My Home Page
      </title>
      <text>
        <para>
          Sorry, this home page is still under
          construction. Please come back soon!
        </para>
      </text>
    </home.page>

Erlang:

    {'home.page', [{title, "My Home Page"}],
     [{title, "Welcome to My Home Page"},
      {text,
       [{para, "Sorry, this home page is still under "
               "construction. Please come back soon!"}
       ]}
     ]}.

Almost equivalent
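To make the "almost equivalent" claim concrete, here is a minimal sketch of how a simple-form term could be turned back into XML text. It is not XMerL code: the module name simple_form and its functions are hypothetical, and the sketch assumes that content is either a string or a list of nested {Tag, Content} / {Tag, Attributes, Content} tuples, as in the term above.

```erlang
-module(simple_form).
-export([to_xml/1]).

%% Hypothetical helper, not part of XMerL: serialize a simple-form
%% term to an iolist of XML text.
to_xml({Tag, Content}) when is_atom(Tag) ->
    %% A two-element tuple is an element without attributes.
    to_xml({Tag, [], Content});
to_xml({Tag, Attrs, Content}) when is_atom(Tag), is_list(Attrs) ->
    Name = atom_to_list(Tag),
    ["<", Name, [attr(A) || A <- Attrs], ">",
     content(Content),
     "</", Name, ">"].

%% Attribute values may be strings or integers (e.g. {border, 1}).
attr({Key, Value}) when is_integer(Value) ->
    attr({Key, integer_to_list(Value)});
attr({Key, Value}) when is_list(Value) ->
    [" ", atom_to_list(Key), "=\"", [escape(C) || C <- Value], "\""].

%% Content is a list whose items are characters, nested elements,
%% or nested strings.
content(Items) ->
    [item(I) || I <- Items].

item(C) when is_integer(C) -> escape(C);
item(E) when is_tuple(E)   -> to_xml(E);
item(S) when is_list(S)    -> content(S).

%% Escape the characters that are special in XML text and attributes.
escape($<) -> "&lt;";
escape($>) -> "&gt;";
escape($&) -> "&amp;";
escape($") -> "&quot;";
escape(C)  -> C.
```

Calling lists:flatten(simple_form:to_xml(Term)) on the home.page term yields the document without the XML declaration and without the original whitespace, which is one reason the two forms are only almost equivalent.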

The Complete Picture

  • XML is more complex than that
      – External DTDs
      – Global namespace
      – Language encoding
      – Structural information should be optimized for queries
  • To parse XML properly, we use records
  • To output to XML (or similar), we may use the simple form

Example record definition:

    %% XML Element
    -record(xmlElement, {
              name,
              parents = [],
              pos,
              attributes = [],
              content = [],
              language = [],
              expanded_name = [],
              nsinfo = [],        % {Prefix, Local} | []
              namespace = #xmlNamespace{}
             }).

XMerL Status

  • A fast XML processor produces an Erlang representation of the XML document
      – Let's call this representation a "complete form"
  • Erlang programs can use an XML-like representation
      – Let's call this a "simple form"
  • An export tool can take either form and output almost anything
  • Plans to support XML Stylesheets (XSL, more on that later)
  • Basic support for XPATH (needed for XSL, Xlink, Xpointer, …)

The XMerL Processor

  • Vsn 0.6 is a single-pass scanner/parser implementing XML 1.0
  • Has been tested on thousands of XML documents
      – Appears to handle lots of different documents
      – Appears to be fast and flexible
  • There are two ways to process an XML document:
      – Tree-based parsing; the whole document at once
      – Event-based parsing; one element at a time
  • The XMerL processor can do either
      – The behaviour is specified through higher-order functions ("funs")
      – Validation can also be carried out in funs
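As an illustration of the tree-based style, the following sketch parses a whole document and then walks the resulting records. It assumes the xmerl_scan module and the xmerl.hrl record definitions as shipped with present-day Erlang/OTP; the API of version 0.6 described on this slide may have differed. The module name scan_demo and the function child_names/1 are hypothetical.

```erlang
-module(scan_demo).
-export([child_names/1]).

%% Record definitions (#xmlElement{} etc.) from the OTP xmerl application.
-include_lib("xmerl/include/xmerl.hrl").

%% Tree-based parsing: scan the whole document at once, then inspect
%% the complete form. Returns the tag names of the root's child elements.
child_names(XmlString) ->
    {#xmlElement{content = Content}, _Rest} = xmerl_scan:string(XmlString),
    %% Keep only element children; text nodes are filtered out by the
    %% record pattern in the generator.
    [Name || #xmlElement{name = Name} <- Content].
```

For example, scan_demo:child_names("<a><b/><c>text</c></a>") returns the child tags in document order. An event-based variant would instead pass funs to the scanner as options, so that each entity is handled as soon as it has been parsed.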

The XMerL Processor (2)

  • Proper handling of
      – Global namespace
      – Entity expansion
      – External and internal DTDs
      – Conditional processing
      – Unicode
  • Some support for infinite streams

The XMerL Export Tool

  • The export tool takes a complete or simple form and outputs some (almost arbitrary) data structure
      – Translation takes place in callback modules:
        CBModule:Tag(Content, Attributes, Parents, CompleteRecord)
      – A callback module can inherit other callback modules
      – A callback function can do three things:
          • Return data on some output format
          • Point to another callback function (alias)
          • Return a modified (simple or complete) form for re-processing
  • Existing callback modules
      – HTML (not yet complete)
      – XML (generic, not complete)
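The callback scheme can be sketched with a toy exporter. This is an illustration only, not the XMerL implementation: the module mini_export, the return conventions {alias, Tag} and {reprocess, Form}, and the representation of Parents as a list of {Tag, Attributes} pairs are all assumptions made for the example, and inheritance between callback modules is left out.

```erlang
-module(mini_export).
-export([export/2]).
%% Example callbacks; here the exporter doubles as its own callback module.
-export([title/4, para/4, note/4, warning/4]).

%% Export a simple form by calling CBModule:Tag(Data, Attrs, Parents, E)
%% for every element, where Data is the already exported content.
export(Form, CBModule) ->
    export(Form, CBModule, []).

export({Tag, Content}, CB, Parents) ->
    export({Tag, [], Content}, CB, Parents);
export({Tag, Attrs, Content} = E, CB, Parents) ->
    Data = [export_item(I, CB, [{Tag, Attrs} | Parents]) || I <- Content],
    dispatch(CB, Tag, Data, Attrs, Parents, E).

%% Characters are passed through; nested elements are exported recursively.
export_item(Char, _CB, _Parents) when is_integer(Char) -> Char;
export_item(Element, CB, Parents) when is_tuple(Element) ->
    export(Element, CB, Parents).

%% The three things a callback can do (return conventions are hypothetical):
dispatch(CB, Tag, Data, Attrs, Parents, E) ->
    case CB:Tag(Data, Attrs, Parents, E) of
        {alias, OtherTag} ->           % point to another callback function
            dispatch(CB, OtherTag, Data, Attrs, Parents, E);
        {reprocess, NewForm} ->        % return a modified form
            export(NewForm, CB, Parents);
        Output ->                      % return data in the output format
            Output
    end.

title(Data, _Attrs, _Parents, _E) -> ["<h1>", Data, "</h1>"].
para(Data, _Attrs, _Parents, _E)  -> ["<p>", Data, "</p>"].
note(_Data, _Attrs, _Parents, _E) -> {alias, para}.
warning(_Data, _Attrs, _Parents, {_Tag, _Attrs2, Content}) ->
    {reprocess, {para, [], "Warning: " ++ Content}}.
```

With these callbacks, mini_export:export({note, "x"}, mini_export) goes through the alias to para/4, and a warning element is rewritten into a para element before being exported.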

Simple Export Tool Example

    foo() ->
        xmerl:export_simple(simple(), xmerl_html, [{title, "Doc Title"}]).

    foo2() ->
        xmerl:export_simple(simple(), xmerl_xml, [{title, "Doc Title"}]).

    simple() ->
        {document,
         [{title, "Doc Title"}, {author, "Ulf Wiger"}],
         [{section, [{heading, "heading 1"}],
           [{'P', "This is a paragraph of text."},
            {section, [{heading, "heading 2"}],
             [{'P', "This is another paragraph."},
              {table, [{border, 1}],
               [{heading,
                 [{col, "head 1"},
                  {col, "head 2"}]},
                {row,
                 [{col, "col 11"},
                  {col, "col 12"}]},
                {row,
                 [{col, "col 21"},
                  {col, "col 22"}]}
               ]}
             ]}
           ]}
         ]}.

Export to HTML

Sample code:

    foo() ->
        xmerl:export_simple(simple(), xmerl_html, [{title, "Doc Title"}]).

Callback code:

    %%% section/4 is to be used instead of headings.
    %%% The heading level follows the number of enclosing sections.
    section(Data, Attrs, [{section, _}, {section, _}, {section, _} | _], E) ->
        opt_heading(Attrs, "<h4>", "</h4>", Data);
    section(Data, Attrs, [{section, _}, {section, _} | _], E) ->
        opt_heading(Attrs, "<h3>", "</h3>", Data);
    section(Data, Attrs, [{section, _} | _], E) ->
        opt_heading(Attrs, "<h2>", "</h2>", Data);
    section(Data, Attrs, Parents, E) ->
        opt_heading(Attrs, "<h1>", "</h1>", Data).

    %% Emit a heading only if the element has a 'heading' attribute.
    opt_heading(Attrs, StartTag, EndTag, Data) ->
        case find_attribute(heading, Attrs) of
            {value, Text} ->
                [StartTag, Text, EndTag, "\n" | Data];
            false ->
                Data
        end.

Export to XML

Sample code:

    foo2() ->
        xmerl:export_simple(simple(), xmerl_xml, [{title, "Doc Title"}]).

Callback code:

    %% The '#root#' tag is called when the entire structure has been exported.
    %% It does not appear in the structure itself.
    '#root#'(Data, Attrs, [], E) ->
        ["<?xml version=\"1.0\"?>\n", Data].

    %% An element without content is written as an empty tag.
    '#element#'(Tag, [], Attrs, Parents, E) ->
        TagStr = mk_string(Tag),
        ["<", tag_and_attrs(TagStr, Attrs), "/>\n"];
    '#element#'(Tag, Data, Attrs, Parents, E) ->
        TagStr = mk_string(Tag),
        ["<", tag_and_attrs(TagStr, Attrs), ">\n",
         Data, opt_newline(Data),
         "</", TagStr, ">\n"].

Output:

    <?xml version="1.0"?>
    <document title="Doc Title" author="Ulf Wiger">
    <section heading="heading 1">
    <P>
    This is a paragraph of text.
    </P>
    <section heading="heading 2">
    <P>
    This is another paragraph.
    </P>
    <table border="1">
    <heading>
    <col>
    head 1
    </col>
    <col>
    head 2
    </col>
    </heading>
    <row>
    <col>
    col 11
    </col>
    <col>
    col 12
    </col>
    </row>
    <row>
    <col>
    col 21
    </col>
    <col>
    col 22
    </col>
    </row>
    </table>
    </section>
    </section>
    </document>

XML Stylesheets

  • Stylesheet support is clearly needed
  • Interpreting XML stylesheets is slow and cumbersome (lots of independent, heavy XPATH queries)
  • Possible approach:
      – Read the stylesheets using the XMerL processor
      – Translate them into an Erlang program
      – Optimization opportunity: convert xsl:match statements into match criteria for a single scan function
  • Lots more work is needed here...
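One way to picture the "single scan function" idea is the hand-written sketch below, which stands in for what a stylesheet translator might generate. Everything in it is an assumption made for illustration: the module xsl_sketch, the three template rules, and the use of the simple form with a list of parent tags are not taken from XMerL.

```erlang
-module(xsl_sketch).
-export([transform/1]).

%% Hypothetical output of a stylesheet-to-Erlang translator. Each
%% function clause plays the role of one template rule, so all match
%% criteria are tested by ordinary clause selection in a single pass
%% over the document instead of by separate path queries.
transform(Form) ->
    template(Form, []).

%% Normalize {Tag, Content} to {Tag, Attributes, Content}.
template({Tag, Content}, Parents) ->
    template({Tag, [], Content}, Parents);
%% Stands for a template matching "title".
template({title, _Attrs, Content}, Parents) ->
    ["<h1>", children(Content, [title | Parents]), "</h1>"];
%% Stands for a template matching "text/para": para directly under text.
template({para, _Attrs, Content}, [text | _] = Parents) ->
    ["<p>", children(Content, [para | Parents]), "</p>"];
%% Default rule: emit nothing for the element itself, process children.
template({Tag, _Attrs, Content}, Parents) ->
    children(Content, [Tag | Parents]).

%% Characters are copied to the output; elements are transformed.
children(Content, Parents) ->
    [case Item of
         Element when is_tuple(Element) -> template(Element, Parents);
         Char -> Char
     end || Item <- Content].
```

Applied to the home.page term from the earlier slide, transform/1 wraps the title in a heading and the paragraph in a p element while visiting each node once.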

More Examples...

  • Current xmerl version, 0.6, is on Open Source
  • Thanks to the beta testers:
      – Mickael Remond
      – Luc Taesch