











Interfacing XML and Erlang

Ulf Wiger, Senior Systems Architect
Network Architecture and Product Strategies
Data Backbone and Optical Networks Division
Ericsson Telecom AB

000922 ETXUWIG-99:093

Executive Summary

- Erlang/OTP is moving into vertical applications
- XML is fast becoming an important standard
- Erlang and XML fit very well together

The Reason for XMerL

- Interest in Erlang is growing
- No longer just for embedded systems
- New interfaces must evolve
  - Powerful GUI components
  - Data exchange (COM, ODBC, XML, …)
- XML is a logical addition to OTP
  - (ASN.1, HTTP, IDL, CORBA, …)
- Real reason:
  - I bought a book and became curious

[Chart: Number of Requests to www.erlang.org]

What is XML?

- "A Stricter HTML"
- "A Simpler SGML"
- Relatively Easy to Parse
- Content Oriented
- XML springs mostly from SGML
  - All non-essential SGML features have been removed
  - Web address support taken from HTML, HyTime and TEI
  - Some new functionality added
    - Modularity
    - Extensibility through powerful linking
    - International (Unicode) support
    - Data orientation

Where is XML used?

- Large Web sites
  - HTML is generated via special (XSL) stylesheets
  - Internet Explorer has built-in support for XML
- Document management
  - When machines must be able to read the documents
- Machine-to-machine communication
  - XML RPC, SOAP
  - XML processors exist in many languages (even Erlang!)

A Simple XML Document

    <?xml version="1.0"?>
    <home.page title="My Home Page">
      <title>
        Welcome to My Home Page
      </title>
      <text>
        <para>
          Sorry, this home page is still under construction.
          Please come back soon!
        </para>
      </text>
    </home.page>

- All elements must have a start tag and an end tag (exception: `<empty.tag/>`)
- An element can have a list of attributes

Erlang analogy: `{Tag, Attributes, Content}`

A Simple Erlang-XML Document

XML:

    <?xml version="1.0"?>
    <home.page title="My Home Page">
      <title>
        Welcome to My Home Page
      </title>
      <text>
        <para>
          Sorry, this home page is still under construction.
          Please come back soon!
        </para>
      </text>
    </home.page>

Erlang:

    {'home.page', [{title, "My Home Page"}],
     [{title, "Welcome to My Home Page"},
      {text,
       [{para, "Sorry, this home page is still under "
               "construction. Please come back soon!"}
       ]}
     ]}.

The two forms are almost equivalent.

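
The mapping above can be tried out with the xmerl application that ships with current Erlang/OTP. The following is a minimal sketch, not the code from the talk: the module name `simple_form_demo` is hypothetical, and it assumes today's `xmerl:export_simple/2` and the `xmerl_xml` callback module, which may differ from the 0.6 release described on these slides. Text content is wrapped in a list, which is the shape the current simple form expects.

```erlang
-module(simple_form_demo).
-export([home_page/0, to_xml/0]).

%% The slide's document in simple form: {Tag, Attributes, Content}
%% or {Tag, Content}. Text nodes are plain strings inside the content list.
home_page() ->
    {'home.page', [{title, "My Home Page"}],
     [{title, ["Welcome to My Home Page"]},
      {text,
       [{para, ["Sorry, this home page is still under construction. "
                "Please come back soon!"]}]}]}.

%% export_simple/2 takes a list of top-level content and a callback module;
%% the result is a deep list of characters, flattened here for readability.
to_xml() ->
    lists:flatten(xmerl:export_simple([home_page()], xmerl_xml)).
```

Calling `io:put_chars(simple_form_demo:to_xml())` in the Erlang shell prints the generated XML.
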
The Complete Picture

- XML is more complex than that
  - External DTDs
  - Global namespace
  - Language encoding
  - Structural information should be optimized for queries
- To parse XML properly, we use records
- To output to XML (or similar), we may use the simple form

Example record definition:

    %% XML Element
    -record(xmlElement, {
              name,
              parents = [],
              pos,
              attributes = [],
              content = [],
              language = [],
              expanded_name = [],
              nsinfo = [],        % {Prefix, Local} | []
              namespace = #xmlNamespace{}
             }).

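
To make the record-based form concrete, here is a small sketch that parses a string and reads a few fields of the root `#xmlElement{}`. It assumes the record definitions in `xmerl.hrl` from current Erlang/OTP (which have more fields than the 0.6 definition shown on the slide) and `xmerl_scan:string/1`; the module name `complete_form_demo` and the function `root_info/1` are made up for illustration.

```erlang
-module(complete_form_demo).
-export([root_info/1]).

%% Record definitions (#xmlElement{}, #xmlAttribute{}, ...) from OTP's xmerl.
-include_lib("xmerl/include/xmerl.hrl").

%% Parse an XML string and return
%% {RootTag, [{AttributeName, AttributeValue}], NumberOfChildNodes}.
root_info(XmlString) ->
    {#xmlElement{name = Name, attributes = Attrs, content = Content}, _Rest} =
        xmerl_scan:string(XmlString),
    {Name,
     [{A#xmlAttribute.name, A#xmlAttribute.value} || A <- Attrs],
     length(Content)}.
```

For example, `complete_form_demo:root_info("<a x=\"1\"><b/><c/></a>")` reports the tag `a`, one attribute and two child nodes.
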
XMerL Status

- A fast XML processor produces an Erlang representation of the XML document
  - Let's call this representation a "complete form"
- Erlang programs can use an XML-like representation
  - Let's call this a "simple form"
- An export tool can take either form and output almost anything
- Plans to support XML Stylesheets (XSL, more on that later)
- Basic support for XPath (needed for XSL, XLink, XPointer, …)

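
As an illustration of the XPath support mentioned above, the sketch below selects the text of every `para` element in a parsed document. It relies on `xmerl_xpath:string/2` as found in current Erlang/OTP, which may go beyond the "basic support" of version 0.6; the module name `xpath_demo` and the function `para_texts/1` are hypothetical.

```erlang
-module(xpath_demo).
-export([para_texts/1]).

-include_lib("xmerl/include/xmerl.hrl").

%% Return the text of every <para> element, in document order.
para_texts(XmlString) ->
    {Doc, _Rest} = xmerl_scan:string(XmlString),
    %% The query yields a list of #xmlText{} records.
    [T#xmlText.value || T <- xmerl_xpath:string("//para/text()", Doc)].
```
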
The XMerL Processor

- Version 0.6 is a single-pass scanner/parser implementing XML 1.0
- Has been tested on thousands of XML documents
  - Appears to handle lots of different documents
  - Appears to be fast and flexible
- There are two ways to process an XML document:
  - Tree-based parsing; the whole document at once
  - Event-based parsing; one element at a time
- The XMerL processor can do either
  - The behaviour is specified through higher-order functions ("funs")
  - Validation can also be carried out in funs

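
One way to see how funs customise the scanner is the accumulator fun. The sketch below uses the `acc_fun` option of `xmerl_scan:string/2` from current Erlang/OTP (an assumption; the slides do not name the options available in 0.6) to drop whitespace-only text nodes while the tree is being built. The module name `scan_fun_demo` and the function `children/1` are hypothetical.

```erlang
-module(scan_fun_demo).
-export([children/1]).

-include_lib("xmerl/include/xmerl.hrl").

%% Parse XmlString and list the root's children as {element, Name} or
%% {text, Value}, with whitespace-only text nodes filtered out by the scanner.
children(XmlString) ->
    IsBlank = fun(V) ->
                      lists:all(fun(C) -> lists:member(C, " \t\r\n") end, V)
              end,
    %% The accumulator fun is called for each parsed entity; returning the
    %% accumulator unchanged leaves that entity out of the tree.
    AccFun = fun(#xmlText{value = V} = T, Acc, S) ->
                     case IsBlank(V) of
                         true  -> {Acc, S};
                         false -> {[T | Acc], S}
                     end;
                (Other, Acc, S) ->
                     {[Other | Acc], S}
             end,
    {#xmlElement{content = Content}, _Rest} =
        xmerl_scan:string(XmlString, [{acc_fun, AccFun}]),
    [describe(Node) || Node <- Content].

describe(#xmlElement{name = Name}) -> {element, Name};
describe(#xmlText{value = Value})  -> {text, Value}.
```

The same hook could reject unwanted content instead of silently dropping it, which is one way to read the slide's remark that validation can be carried out in funs.
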
The XMerL Processor (2)

- Proper handling of
  - Global namespace
  - Entity expansion
  - External and internal DTDs
  - Conditional processing
  - Unicode
- Some support for infinite streams

The XMerL Export Tool

- The export tool takes a complete or simple form and outputs some (almost arbitrary) data structure
  - Translation takes place in callback modules: `CBModule:Tag(Content, Attributes, Parents, CompleteRecord)`
  - A callback module can inherit other callback modules
  - A callback function can do three things:
    - Return data in some output format
    - Point to another callback function (alias)
    - Return a modified (simple or complete) form for re-processing
- Existing callback modules
  - HTML (not yet complete)
  - XML (generic, not complete)

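
A callback module of the kind described above might look like the following sketch, which renders a small document as plain text. It follows the callback conventions of the xmerl application in current Erlang/OTP (`'#xml-inheritance#'/0`, `'#text#'/1`, `'#root#'/4`, `'#element#'/5`, plus one function per tag), which is an assumption with respect to version 0.6. The module name `text_export_demo`, the tags `doc`, `title` and `para`, and the output layout are all invented for illustration.

```erlang
-module(text_export_demo).

%% Callbacks looked up by the export tool.
-export(['#xml-inheritance#'/0, '#text#'/1, '#root#'/4, '#element#'/5]).
%% Per-tag callbacks: Tag(Content, Attributes, Parents, CompleteRecord).
-export([title/4, para/4]).
-export([run/0]).

%% This module inherits from no other callback module.
'#xml-inheritance#'() -> [].

%% Text nodes are passed through unchanged.
'#text#'(Text) -> Text.

%% Called once, after the entire structure has been exported.
'#root#'(Data, _Attrs, [], _E) -> Data.

%% Fallback for tags without a callback of their own (here: doc).
'#element#'(_Tag, Data, _Attrs, _Parents, _E) -> Data.

title(Data, _Attrs, _Parents, _E) -> ["== ", Data, " ==\n"].

para(Data, _Attrs, _Parents, _E) -> [Data, "\n"].

%% Export a small simple-form document through this module.
run() ->
    Doc = {doc, [{title, ["Greeting"]},
                 {para, ["Hello"]},
                 {para, ["World"]}]},
    lists:flatten(xmerl:export_simple([Doc], ?MODULE)).
```

Because `doc` has no function of its own, it falls through to `'#element#'/5`, while `title` and `para` are handled by their tag-named callbacks.
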
Simple Export Tool Example

    foo() ->
        xmerl:export_simple(simple(), xmerl_html, [{title, "Doc Title"}]).

    foo2() ->
        xmerl:export_simple(simple(), xmerl_xml, [{title, "Doc Title"}]).

    simple() ->
        {document, [{title, "Doc Title"}, {author, "Ulf Wiger"}],
         [{section, [{heading, "heading 1"}],
           [{'P', "This is a paragraph of text."},
            {section, [{heading, "heading 2"}],
             [{'P', "This is another paragraph."},
              {table, [{border, 1}],
               [{heading, [{col, "head 1"}, {col, "head 2"}]},
                {row, [{col, "col 11"}, {col, "col 12"}]},
                {row, [{col, "col 21"}, {col, "col 22"}]}
               ]}
             ]}
           ]}
         ]}.

Export to HTML

Sample code (the `foo/0`, `foo2/0` and `simple/0` definitions from the previous slide are repeated alongside it on the slide):

    %%% section/4 is to be used instead of headings.
    section(Data, Attrs, [{section, _}, {section, _}, {section, _} | _], E) ->
        opt_heading(Attrs, "<h4>", "</h4>", Data);
    section(Data, Attrs, [{section, _}, {section, _} | _], E) ->
        opt_heading(Attrs, "<h3>", "</h3>", Data);
    section(Data, Attrs, [{section, _} | _], E) ->
        opt_heading(Attrs, "<h2>", "</h2>", Data);
    section(Data, Attrs, Parents, E) ->
        opt_heading(Attrs, "<h1>", "</h1>", Data).

    opt_heading(Attrs, StartTag, EndTag, Data) ->
        case find_attribute(heading, Attrs) of
            {value, Text} ->
                [StartTag, Text, EndTag, "\n" | Data];
            false ->
                Data
        end.

Export to XML

Output of `foo2()` from the Simple Export Tool Example:

    <?xml version="1.0"?>
    <document title="Doc Title" author="Ulf Wiger">
    <section heading="heading 1">
    <P>
    This is a paragraph of text.
    </P>
    <section heading="heading 2">
    <P>
    This is another paragraph.
    </P>
    <table border="1">
    <heading>
    <col>
    head 1
    </col>
    <col>
    head 2
    </col>
    </heading>
    <row>
    <col>
    col 11
    </col>
    <col>
    col 12
    </col>
    </row>
    <row>
    <col>
    col 21
    </col>
    <col>
    col 22
    </col>
    </row>
    </table>
    </section>
    </section>
    </document>

Sample code:

    %% The '#root#' tag is called when the entire structure has been exported.
    %% It does not appear in the structure itself.
    '#root#'(Data, Attrs, [], E) ->
        ["<?xml version=\"1.0\"?>\n", Data].

    '#element#'(Tag, [], Attrs, Parents, E) ->
        TagStr = mk_string(Tag),
        ["<", tag_and_attrs(TagStr, Attrs), "/>\n"];
    '#element#'(Tag, Data, Attrs, Parents, E) ->
        TagStr = mk_string(Tag),
        ["<", tag_and_attrs(TagStr, Attrs), ">\n",
         Data, opt_newline(Data),
         "</", TagStr, ">\n"].

XML Stylesheets

- Stylesheet support is clearly needed
- Interpreting XML stylesheets is slow and cumbersome (lots of independent, heavy XPath queries)
- Possible approach:
  - Read the stylesheets using the XMerL processor
  - Translate them into an Erlang program
  - Optimization opportunity: convert xsl:match statements into match criteria for a single scan function
- Lots more work is needed here...

More Examples...

- The current xmerl version, 0.6, is available as Open Source
- Thanks to the beta testers:
  - Mickael Remond
  - Luc Taesch