XML Processing in Scala
William Narmontas, Dino Fancellu
www.scala.contractors
XML London 2014

Dino Fancellu: 35 years IT. Scala • Java • XML
William Narmontas: 10 years IT. Scala • XML • Web

What is Scala?

Scala processes XML fast

It is powerful

Modular Concise Functional Type-safe Performant Object-oriented Strongly-typed Composable Java-interoperable Statically-typed Unopinionated First-class XML

Who uses Scala?
Apple, Bank of America, Barclays, BBC, BSkyB, Cisco, Citigroup, Credit Suisse,
eBay, eHarmony, EDF, FourSquare, Gawker, HSBC, ITV, Klout,
LinkedIn, Morgan Stanley, Netflix, Novell, Rackspace, Sky, Sony, Springer,
The Guardian, Tom, Trafigura, Tumblr, Twitter, UBS, VMware, Xerox

Projects in Scala
- Less code to write = less to maintain
- Communication clearer
- Testing easier
- Software robust
- Time to market: fast
- Happier developers

Scala language: Intro

Values
Scala:
    val conferenceName = "XML London 2014"
XQuery:
    let $conferenceName := "XML London 2014"
Scala (mutable):
    var conferenceName = "XML London 2014"
    conferenceName = "XML London 2015"

Strings
    val language = "Scala"
    s"XML Processing in $language"
    // XML Processing in Scala
    s"""An introduction to:
       |The "$language" programming language""".stripMargin
    // An introduction to:
    // The "Scala" programming language
    s"$language has ${language.length} chars in its name"
    // Scala has 5 chars in its name

Functions
Scala:
    def fun(x: Int, y: Double) = s"$x: $y"
XQuery:
    declare function local:fun(
      $x as xs:integer,
      $y as xs:double
    ) as xs:string {
      concat($x, ": ", $y)
    };

Everything is an expression
    val trainSpeed =
      if (train.speed.mph >= 60) "Fast"
      else "Slow"
    def divide(numerator: Int, denominator: Int) =
      try { s"${numerator / denominator}" }
      catch {
        case _: java.lang.ArithmeticException =>
          s"Cannot divide $numerator by $denominator"
      }

Types: Explicit
    def withTitle(name: String, title: String): String =
      s"$title. $name"
    val x: Int = {
      val y = 1000
      100 + y
    }
    // x: Int = 1100

Functions: named parameters
Further clarity in method calls:
    def makeLink(url: String, text: String) =
      s"""<a href="$url">$text</a>"""
    makeLink(text = "XML London 2014", url = "http://www.xmllondon.com")
    // <a href="http://www.xmllondon.com">XML London 2014</a>

Functions: default parameters
Reduce repetition in method calls:
    def withTitle(name: String, title: String = "Mr") =
      s"$title. $name"
    withTitle("John Smith")
    // Mr. John Smith
    withTitle("Mary Smith", "Miss")
    // Miss. Mary Smith

Functional
    def incrementedByOne(x: Int) = x + 1
    (1 to 5).map(incrementedByOne)
    // Vector(2, 3, 4, 5, 6)

Lambdas
    (1 to 5).map(x => x + 1)
    // Vector(2, 3, 4, 5, 6)
    (1 to 5).map(_ + 1)
    // Vector(2, 3, 4, 5, 6)

For comprehensions
    for { x <- (1 to 5) } yield x + 1
    // Vector(2, 3, 4, 5, 6)

Implicit classes: Enrich types
    implicit class stringWrapper(str: String) {
      def wrapWithParens = s"($str)"
    }
    "Text".wrapWithParens
    // (Text)

Powerful features for scalability
- Case classes
- Traits
- Partial functions
- Pattern matching
- Implicits
- Flexible syntax
- Generics
- User-defined operators
- Call-by-name
- Macros
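
A few of the features listed above (traits, case classes, pattern matching, partial functions) can be sketched together. This is only an illustration: `Talk`, `Session`, `Keynote` and the helper names are hypothetical and do not come from the slides.

```scala
object FeatureSketch {
  // A trait describes shared behaviour; case classes give immutable data
  // with structural equality and pattern-matching support.
  sealed trait Talk { def title: String }
  case class Session(title: String, minutes: Int) extends Talk
  case class Keynote(title: String, speaker: String) extends Talk

  // Pattern matching deconstructs case classes by shape, with optional guards.
  def describe(talk: Talk): String = talk match {
    case Keynote(title, speaker)      => s"Keynote: $title by $speaker"
    case Session(title, m) if m <= 15 => s"Lightning talk: $title"
    case Session(title, m)            => s"Session: $title ($m min)"
  }

  // A partial function is defined only for some inputs;
  // collect applies it where it is defined and skips everything else.
  val keynoteSpeaker: PartialFunction[Talk, String] = {
    case Keynote(_, speaker) => speaker
  }

  def speakers(talks: Seq[Talk]): Seq[String] = talks.collect(keynoteSpeaker)
}
```

Because `Talk` is sealed, the compiler can warn when a `match` forgets one of the cases.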

Scala & XML

Values: Inline XML
    val url = "http://www.xmllondon.com"
    val title = "XML London 2014"
    val xmlTree = <div>
      <p>Welcome to <a href={url}>{title}</a>!</p>
    </div>
    // xmlTree: scala.xml.Elem =
    // <div>
    //   <p>Welcome to <a href="http://www.xmllondon.com">XML London 2014</a>!</p>
    // </div>

XML Lookups
    val listOfPeople = <people>
      <person>Fred</person>
      <person>Ron</person>
      <person>Nigel</person>
    </people>
    listOfPeople \ "person"
    // NodeSeq(<person>Fred</person>, <person>Ron</person>, <person>Nigel</person>)
    listOfPeople \ "_"
    // NodeSeq(<person>Fred</person>, <person>Ron</person>, <person>Nigel</person>)

XML Lookups
    val fact = <fact type="universal">
      <variable>A</variable> = <variable>A</variable>
    </fact>
    fact \ "variable"
    // NodeSeq(<variable>A</variable>, <variable>A</variable>)
    fact \ "@type"
    // : scala.xml.NodeSeq = universal
    fact \@ "type"
    // : String = universal
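
The lookups above use `\`, which inspects direct children only. A minimal sketch of the difference from `\\`, which searches all descendants, assuming scala-xml is on the classpath; the `library` document is hypothetical sample data.

```scala
import scala.xml.{Elem, NodeSeq}

object LookupSketch {
  // Hypothetical sample document: <book> elements are nested inside <shelf>.
  val library: Elem =
    <library>
      <shelf id="a"><book year="2008">Programming in Scala</book></shelf>
      <shelf id="b"><book year="2004">XQuery</book></shelf>
    </library>

  // \ looks only at direct children, so no <book> is found from the root.
  val direct: NodeSeq = library \ "book"

  // \\ searches all descendants.
  val deep: NodeSeq = library \\ "book"

  // Text content and attribute values of the matched nodes.
  val titles: Seq[String] = deep.map(_.text)
  val years: Seq[String]  = deep.map(_ \@ "year")
}
```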

XML Loading
    val pun = """<pun rating="extreme">
                |  <question>Why do CompSci students need glasses?</question>
                |  <answer>To C#<!-- C# is Microsoft's programming language -->.</answer>
                |</pun>""".stripMargin
    scala.xml.XML.loadString(pun)
    // <pun rating="extreme">
    //   <question>Why do CompSci students need glasses?</question>
    //   <answer>To C#.</answer>
    // </pun>

Collections: expressive
    val root = <numbers>
      {for {i <- 1 to 10} yield <number>{i}</number>}
    </numbers>
    val numbers = root \ "number"
    numbers(0)
    // <number>1</number>
    numbers.head
    // <number>1</number>
    numbers.last
    // <number>10</number>
    numbers take 3
    // NodeSeq(<number>1</number>, <number>2</number>, <number>3</number>)

Collections: expressive
    numbers filter (_.text.toInt > 6)
    // NodeSeq(<number>7</number>, <number>8</number>, <number>9</number>, <number>10</number>)
    numbers maxBy (_.text)
    // <number>9</number>
    numbers maxBy (_.text.toInt)
    // <number>10</number>
    numbers.reverse
    // NodeSeq(<number>10</number>, <number>9</number>, <number>8</number>, <number>7</number>,
    //         <number>6</number>, <number>5</number>, <number>4</number>, <number>3</number>,
    //         <number>2</number>, <number>1</number>)
    numbers.groupBy(_.text.toInt % 3)
    // Map(
    //   2 -> NodeSeq(<number>2</number>, <number>5</number>, <number>8</number>),
    //   1 -> NodeSeq(<number>1</number>, <number>4</number>, <number>7</number>, <number>10</number>),
    //   0 -> NodeSeq(<number>3</number>, <number>6</number>, <number>9</number>))

XML Methods: a rich API
    %  ++  ++:  +:  /:  /:\
    :+  :\  \  \@  \\  addString
    aggregate  andThen  apply  applyOrElse  asInstanceOf  attribute
    attributes  buildString  canEqual  child  collect  collectFirst
    combinations  companion  compose  contains  containsSlice  copy
    copyToArray  copyToBuffer  corresponds  count  descendant  descendant_or_self
    diff  distinct  doCollectNamespaces  doTransform  drop  dropRight
    dropWhile  endsWith  exists  filter  filterNot  find
    flatMap  flatten  fold  foldLeft  foldRight  forall
    foreach  genericBuilder  getNamespace  groupBy  grouped  hasDefiniteSize
    head  headOption  indexOf  indexOfSlice  indexWhere  indices
    init  inits  intersect  isAtom  isDefinedAt  isEmpty
    isInstanceOf  isTraversableAgain  iterator  label  last  lastIndexOf
    lastIndexOfSlice  lastIndexWhere  lastOption  length  lengthCompare  lift
    map  max  maxBy  min  minBy  minimizeEmpty
    mkString  nameToString  namespace  nonEmpty  nonEmptyChildren  orElse
    padTo  par  partition  patch  permutations  prefix
    prefixLength  product  reduce  reduceLeft  reduceLeftOption  reduceOption
    reduceRight  reduceRightOption  repr  reverse  reverseIterator  reverseMap
    runWith  sameElements  scan  scanLeft  scanRight  scope
    segmentLength  seq  size  slice  sliding  sortBy
    sortWith  sorted  span  splitAt  startsWith  strict_!=
    strict_==  stringPrefix  sum  tail  tails  take
    takeRight  takeWhile  text  theSeq  to  toArray
    toBuffer  toIndexedSeq  toIterable  toIterator  toList  toMap
    toSeq  toSet  toStream  toString  toTraversable  toVector
    transpose  union  unzip  unzip3  updated  view
    withFilter  xmlType  xml_!=  xml_==  xml_sameElements  zip
    zipAll  zipWithIndex

For-comprehensions: similar to XQuery
XQuery:
    <bib>{
      for $b in $xml/book
      let $year := $b/@year
      where $b/publisher = "Addison-Wesley" and $year > 1991
      return <book year="{ $year }">
        { $b/title }
      </book>
    }</bib>
Scala:
    <bib>{
      for {
        b <- xml \ "book"
        year = b \@ "year"
        if b \ "publisher" === "Addison-Wesley" && year > 1991
      } yield <book year={ year }>
        { b \ "title" }
      </book>
    }</bib>
Nice! ... yet is general purpose
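
The Scala side of this comparison uses `===` and compares `year` with a number, neither of which scala-xml provides by itself, so the slide presumably relies on helper implicits it does not show. A minimal sketch of the same query using only scala-xml, with explicit `.text` and `.toInt` conversions; the `xml` sample data is an assumption added for illustration.

```scala
import scala.xml.Elem

object BibSketch {
  // Hypothetical sample data, not taken from the slides.
  val xml: Elem =
    <bib>
      <book year="1994"><title>TCP/IP Illustrated</title><publisher>Addison-Wesley</publisher></book>
      <book year="1992"><title>Advanced Programming in the Unix Environment</title><publisher>Addison-Wesley</publisher></book>
      <book year="2000"><title>Data on the Web</title><publisher>Morgan Kaufmann Publishers</publisher></book>
    </bib>

  // Keep books published by Addison-Wesley after 1991, copying year and title.
  val result: Elem =
    <bib>{
      for {
        b <- xml \ "book"
        year = (b \@ "year").toInt
        if (b \ "publisher").text == "Addison-Wesley" && year > 1991
      } yield <book year={ year.toString }>{ b \ "title" }</book>
    }</bib>

  // Titles of the books that survived the filter.
  val titles: Seq[String] = (result \ "book" \ "title").map(_.text)
}
```

The explicit conversions make the type changes visible: attribute values and element text are strings until converted.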

Hybrid XML
- XQuery for Scala
- javax.xml.* for free
- Look up: XPath
- Transform: XSLT
- Stream: StAX

XQuery for Scala (XQS)
- Wraps XQuery API for Java (javax.xml.xquery)
- Scala access to XQuery in: MarkLogic, BaseX, Saxon, Sedna, eXist, …
- Converts DOM to Scala XML & vice versa
- http://github.com/fancellu/xqs

XQuery via XQS
    val widgets = <widgets>
      <widget>Menu</widget>
      <widget>Status bar</widget>
      <widget id="panel-1">Panel</widget>
      <widget id="panel-2">Panel</widget>
    </widgets>
    import com.felstar.xqs.XQS._
    val conn = new net.xqj.basex.local.BaseXXQDataSource().getConnection
    val nodes: NodeSeq = conn("for $w in /widgets/widget order by $w return $w", widgets)
    // NodeSeq(<widget>Menu</widget>, <widget id="panel-1">Panel</widget>,
    //         <widget id="panel-2">Panel</widget>, <widget>Status bar</widget>)

XPath
    import com.felstar.xqs.XQS._
    import javax.xml.xpath.{XPathConstants, XPathFactory}
    import org.w3c.dom.NodeList
    import scala.xml.NodeSeq
    val widgets = <widgets>
      <widget>Menu</widget>
      <widget>Status bar</widget>
      <widget id="panel-1">Panel</widget>
      <widget id="panel-2">Panel</widget>
    </widgets>
    val xpath = XPathFactory.newInstance().newXPath()
    val nodes = xpath.evaluate("/widgets/widget[not(@id)]", toDom(widgets),
      XPathConstants.NODESET).asInstanceOf[NodeList]
    (nodes: NodeSeq)
    // NodeSeq(<widget>Menu</widget>, <widget>Status bar</widget>)
Natively in Scala:
    (widgets \ "widget")(widget => (widget \ "@id").isEmpty)
    // NodeSeq(<widget>Menu</widget>, <widget>Status bar</widget>)
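
For comparison, the same XPath lookup can be sketched with the JDK alone, without XQS or scala-xml, by evaluating the expression directly against an `InputSource`. The `XPathSketch` object and its `select` helper are hypothetical names introduced for this illustration.

```scala
import java.io.StringReader
import javax.xml.xpath.{XPathConstants, XPathFactory}
import org.w3c.dom.NodeList
import org.xml.sax.InputSource

object XPathSketch {
  val widgetsXml: String =
    """<widgets>
      |  <widget>Menu</widget>
      |  <widget>Status bar</widget>
      |  <widget id="panel-1">Panel</widget>
      |  <widget id="panel-2">Panel</widget>
      |</widgets>""".stripMargin

  // Evaluate an XPath expression against an XML string and
  // return the text content of every matched node.
  def select(expression: String, xml: String): List[String] = {
    val xpath = XPathFactory.newInstance().newXPath()
    val nodes = xpath
      .evaluate(expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET)
      .asInstanceOf[NodeList]
    (0 until nodes.getLength).map(i => nodes.item(i).getTextContent).toList
  }
}
```

Calling `XPathSketch.select("/widgets/widget[not(@id)]", XPathSketch.widgetsXml)` selects the two widgets without an `id` attribute.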

XSLT
    import com.felstar.xqs.XQS._
    import javax.xml.transform.TransformerFactory
    import javax.xml.transform.stream.StreamResult
    val stylesheet =
      <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
        <xsl:template match="john">
          <xsl:copy>Hello, John.</xsl:copy>
        </xsl:template>
        <xsl:template match="node()|@*">
          <xsl:copy>
            <xsl:apply-templates select="node()|@*"/>
          </xsl:copy>
        </xsl:template>
      </xsl:stylesheet>
    val peopleXml =
      <people>
        <john>Hello, John.</john>
        <smith>Smith is here.</smith>
        <another>Hello.</another>
      </people>
    val xmlResultResource = new java.io.StringWriter()
    val xmlTransformer = TransformerFactory.newInstance().newTransformer(stylesheet)
    xmlTransformer.transform(peopleXml, new StreamResult(xmlResultResource))
    xmlResultResource.getBuffer
    // <?xml version="1.0" encoding="UTF-8"?><people>
    //   <john>Hello, John.</john>
    //   <smith>Smith is here.</smith>
    //   <another>Hello.</another>
    // </people>
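
The transformation on this slide passes Scala XML values straight to the JAXP API, which relies on the XQS implicit conversions. A minimal sketch of the same identity-plus-override stylesheet using only the JDK, assuming the built-in XSLT 1.0 processor and plain strings as input; `XsltSketch` and `transform` are hypothetical names.

```scala
import java.io.{StringReader, StringWriter}
import javax.xml.transform.{OutputKeys, TransformerFactory}
import javax.xml.transform.stream.{StreamResult, StreamSource}

object XsltSketch {
  // Identity transform, except that <john> elements get fixed text content.
  val stylesheet: String =
    """<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
      |  <xsl:template match="john">
      |    <xsl:copy>Hello, John.</xsl:copy>
      |  </xsl:template>
      |  <xsl:template match="node()|@*">
      |    <xsl:copy><xsl:apply-templates select="node()|@*"/></xsl:copy>
      |  </xsl:template>
      |</xsl:stylesheet>""".stripMargin

  // Apply an XSLT stylesheet (as a string) to an XML document (as a string).
  def transform(xslt: String, input: String): String = {
    val transformer = TransformerFactory.newInstance()
      .newTransformer(new StreamSource(new StringReader(xslt)))
    transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes")
    val out = new StringWriter()
    transformer.transform(new StreamSource(new StringReader(input)), new StreamResult(out))
    out.toString
  }
}
```

For an input such as `<people><john/><smith>Smith is here.</smith></people>`, the more specific `john` template takes precedence over the catch-all copy template, so only that element is rewritten.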

XML Stream Processing
    import scala.io.Source
    import javax.xml.stream.{XMLEventReader, XMLInputFactory}
    import javax.xml.stream.events.XMLEvent
    // 4 GB file, comes back in a second
    val src = Source.fromURL("http://dumps.wikimedia.org/enwiki/20140402/enwiki-20140402-abstract.xml")
    val er = XMLInputFactory.newInstance().createXMLEventReader(src.reader)
    implicit class XMLEventIterator(ev: XMLEventReader) extends scala.collection.Iterator[XMLEvent] {
      def hasNext = ev.hasNext
      def next = ev.nextEvent()
    }
    er.dropWhile(!_.isStartElement).take(10).zipWithIndex.foreach {
      case (ev, idx) => println(s"${idx + 1}:\t$ev")
    }
    src.close()
    // Output: ten numbered events, some of them whitespace-only text; the non-blank ones are:
    // <feed>
    // <doc>
    // <title>
    // Wikipedia: Anarchism
    // </title>
    // <url>
    // http://en.wikipedia.org/wiki/An
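
The same iterator-adapter idea can be tried without downloading a multi-gigabyte dump. This is a minimal sketch over an in-memory string, using only the JDK's StAX API; `StaxSketch` and `startElementNames` are hypothetical names, and an explicit wrapper class is used here instead of the slide's implicit class.

```scala
import java.io.StringReader
import javax.xml.stream.{XMLEventReader, XMLInputFactory}
import javax.xml.stream.events.XMLEvent

object StaxSketch {
  // Adapter exposing a StAX event reader as a Scala Iterator,
  // so the usual collection combinators apply to the event stream.
  final class XMLEventIterator(reader: XMLEventReader) extends Iterator[XMLEvent] {
    def hasNext: Boolean = reader.hasNext
    def next(): XMLEvent = reader.nextEvent()
  }

  // Names of the first `limit` start elements. Events are pulled lazily,
  // so parsing stops as soon as enough elements have been seen.
  def startElementNames(xml: String, limit: Int): List[String] = {
    val reader = XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml))
    try
      new XMLEventIterator(reader)
        .filter(_.isStartElement)
        .map(_.asStartElement.getName.getLocalPart)
        .take(limit)
        .toList
    finally reader.close()
  }
}
```

Because the iterator is consumed with `take(limit)`, the rest of the document is never read, which is what makes this approach practical for very large files.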

Use Cases
- Data extraction
- Serving XML via REST
- Dynamically generated XSLT
- Interfacing with XML databases
- Flexibility to choose the best tool for the job
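
As an illustration of the data-extraction use case, XML nodes can be mapped onto case classes. This is a sketch assuming scala-xml is on the classpath; the `Widget` type and helper names are hypothetical, loosely modelled on the widget examples earlier in the deck.

```scala
import scala.xml.{Elem, Node}

object ExtractionSketch {
  // Typed representation of one <widget> element.
  case class Widget(id: Option[String], name: String)

  // \@ yields an empty string when the attribute is absent,
  // so empty values are mapped to None.
  def fromNode(node: Node): Widget =
    Widget(Option(node \@ "id").filter(_.nonEmpty), node.text)

  // Extract every direct <widget> child of the given root element.
  def widgets(root: Elem): Seq[Widget] = (root \ "widget").map(fromNode)
}
```

Once the data is in case classes, the rest of the program no longer depends on the XML structure.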

Excellent Ecosystem
SBT, Akka, Spark, scalaz, Spray, Specs, shapeless, scala-xml, Scaladin, ScalaTest, scala-maven-plugin, JVM, macro-paradise

Conclusion - Practical for XML processing

Where do I start?
- atomicscala.com
- typesafe.com/activator
- scala-lang.org
- scala-ide.org
- IntelliJ

Matt Stephens Charles Foster

Open to consulting
www.scala.contractors
Follow us on Twitter: @DinoFancellu @ScalaWilliam @MaffStephens