1 Whitespace Handling
Roger L. Costello
XML Technologies

2 What is Whitespace?
• The following characters are considered to be whitespace characters:
  – space (#x20)
  – tab (#x9)
  – newline (#xA)
  – carriage return (#xD)

3 Whitespace-only Nodes
Identify all of the whitespace-only nodes in this XML document:
    <?xml version="1.0"?>
    <WhitespaceTest>
        <a> Text with surrounding whitespace </a>
        <b>Text with NO surrounding whitespace</b>
        <c>Text with embedded whitespace</c>
        <d> </d>
        <e> </e>
        <f xml:space="preserve"> </f>
    </WhitespaceTest>
see whitespace-example01

4 Easy to identify in tree form
    Document
        PI <?xml version="1.0"?>
        Element WhitespaceTest
            Text "cr"
            Element a
                Text "…"
            Text "cr"
            Element b
                Text "Text …"
            Text "cr"
            Element c
                Text "Text …"
            Text "cr"
            Element d
                Text " "
            Text "cr"
            ...
Note: cr = carriage return.
Note: in the XPath/XSLT data model the XML declaration is not actually a processing-instruction node.
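The tree can also be checked mechanically by letting an XSLT processor list the whitespace-only text nodes. The stylesheet below is a minimal sketch, not part of the original slides. It assumes an XSLT 1.0 processor that does not strip whitespace while parsing. It relies on normalize-space() returning the empty string for a text node made up only of whitespace characters.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text"/>

  <!-- Report every text node that contains nothing but whitespace. -->
  <xsl:template match="/">
    <xsl:for-each select="//text()[normalize-space(.) = '']">
      <xsl:text>whitespace-only text node, child of </xsl:text>
      <!-- The parent of a text node is always an element. -->
      <xsl:value-of select="name(..)"/>
      <xsl:text>, length </xsl:text>
      <xsl:value-of select="string-length(.)"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Run against the example document, this should print one line per whitespace-only text node. The indentation nodes are reported as children of WhitespaceTest, and the whitespace inside d, e and f as children of those elements.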

5 <xsl:apply-templates/>
• This XSL instruction tells the XSL Processor to "go to each child node and execute the template rule for the node"
  – The child nodes include the text nodes of course!
• Thus, if you write an identity transformation stylesheet, then the output document will have the same indentation as the input document.
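As an illustration, an identity transformation can be sketched as follows. This is one common way to write it, assuming an XSLT 1.0 processor. Text nodes, including the whitespace-only ones, are not matched by the template; they are copied by the built-in template rule for text.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml"/>

  <!-- Copy each element or attribute, then visit its attributes and children. -->
  <xsl:template match="* | @*">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <!-- No select attribute: every child node is visited, text nodes included. -->
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Because the indentation text nodes are visited and copied like any other text, the output keeps the line breaks and indentation of the input.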

6 Strip whitespace-only nodes
• Problem: create a stylesheet which does an identity transformation, but strips all indentation.
Input:
    <?xml version="1.0"?>
    <WhitespaceTest>
        <a> Text with surrounding whitespace </a>
        <b>Text with NO surrounding whitespace</b>
        <c>Text with embedded whitespace</c>
        <d> </d>
        <e> </e>
        <f xml:space="preserve"> </f>
    </WhitespaceTest>
Output:
    <?xml version="1.0"?><WhitespaceTest><a> Text with surrounding whitespace </a>...

7 <xsl:apply-templates select="*"/>
This XSL instruction tells the XSL Processor to "go to each child element node and execute the template for the node." Thus, it skips the text nodes that produce the indentation.
Will this give us the desired output? No!
Output:
    <?xml version="1.0"?><WhitespaceTest><a/><b/><c/><d/><e/><f/></WhitespaceTest>
In addition to stripping out the whitespace-only nodes (i.e., the indentation nodes), we have also stripped out the data.
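A sketch of a stylesheet with this behavior, assuming an XSLT 1.0 processor; the template shape is an assumption chosen for illustration, and only the select="*" step comes from the slide.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml"/>

  <!-- Copy each element, but descend into child ELEMENTS only. -->
  <xsl:template match="*">
    <xsl:copy>
      <!-- select="*" never selects a text node, so no text reaches the output. -->
      <xsl:apply-templates select="*"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

No text node is ever visited, so the indentation disappears, but so does the text content of every element.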

8 <xsl:strip-space elements="*"/>
• This XSL instruction tells the XSL Processor to strip out all whitespace-only text nodes in the XML document, prior to any processing.
• Thus, a stylesheet which does an identity transformation will yield an output document with no indentation.
• But there's one small problem...

9 We have element nodes that contain only whitespace!
The values of <d> and <e> are whitespace-only, so their values will get removed as well. We don't want these values removed.
    <?xml version="1.0"?>
    <WhitespaceTest>
        <a> Text with surrounding whitespace </a>
        <b>Text with NO surrounding whitespace</b>
        <c>Text with embedded whitespace</c>
        <d> </d>
        <e> </e>
        <f xml:space="preserve"> </f>
    </WhitespaceTest>

10 <xsl:preserve-space elements="d e"/>
• This XSL instruction takes precedence over <xsl:strip-space elements="*"/> because the explicit names d and e are more specific than the wildcard *. It tells the XSL Processor "the space in elements <d> and <e> must be preserved".

11 xml:space="preserve"
• This is a standard XML attribute that you can add to any XML element. It tells any application that processes the element (e.g., an XSL Processor) that "the space in this element is to be preserved".
• This attribute has precedence over xsl:strip-space.
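To see the attribute at work, the sketch below strips whitespace-only text nodes everywhere and deliberately omits xsl:preserve-space. It assumes a conforming XSLT 1.0 processor; the template is an identity transformation chosen for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml"/>

  <!-- Strip whitespace-only text nodes in every element; nothing is preserved by name. -->
  <xsl:strip-space elements="*"/>

  <xsl:template match="* | @*">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Applied to the WhitespaceTest example, this should drop the indentation and the whitespace inside <d> and <e>, but keep the whitespace inside <f>, because <f> carries xml:space="preserve".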

12 This stylesheet does an identity transformation, plus strips indentation
    <?xml version="1.0"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                    version="1.0">
        <xsl:output method="xml"/>
        <xsl:strip-space elements="*"/>
        <xsl:preserve-space elements="d e"/>
        <xsl:template match="* | @*">
            <xsl:copy>
                <xsl:apply-templates select="@*"/>
                <xsl:apply-templates/>
            </xsl:copy>
        </xsl:template>
    </xsl:stylesheet>

13 Notes about xsl:strip-space
• <xsl:strip-space elements="*"/> indicates that all whitespace-only text nodes should be stripped.
• <xsl:strip-space elements="a b c"/> indicates that the whitespace-only text nodes within <a>, <b>, and <c> are to be stripped.

14 Note: only immediate children are stripped
<xsl:strip-space elements="Foo"/> will result in stripping only the two nodes marked below. That is, only the immediate children are stripped.
    Element Foo
        Text "cr"          <- stripped
        Element bar
            Text "cr"
            Element gab
        Text "cr"          <- stripped
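A small sketch of this rule, assuming an XSLT 1.0 processor. The input document shown in the comment is hypothetical, using the element names Foo, bar and gab from this slide; the identity template is likewise an illustrative choice.

```xml
<?xml version="1.0"?>
<!-- Hypothetical input:
       <Foo>
         <bar>
           <gab/>
         </bar>
       </Foo>
-->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml"/>

  <!-- Only whitespace-only text nodes whose PARENT is Foo are stripped. -->
  <xsl:strip-space elements="Foo"/>

  <xsl:template match="* | @*">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

With this input, the line breaks directly inside <Foo> (before and after <bar>) should be removed, while the line breaks inside <bar> (around <gab/>) are kept, since <bar> is not listed in the elements attribute.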