What can ONE instruction do? Enable your XML applications to process gigabyte-level XML documents efficiently!

XML Evolution: Two-phase XML Processing Model Using XML Prefiltering Techniques
VLDB 2006, September 12-15, 2006
Chia-Hsin Huang, Tyng-Ruey Chuang, James J. Lu, and Hahn-Ming Lee

How much do you know about DOM and SAX in XML processing? They cannot process large XML documents efficiently!

DOM Processing Model
XPath expression: /html/body/ul/li/text
Source: http://www.cee.hw.ac.uk/~alison/netapp/dom/sld006.htm
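
A minimal sketch of tree-based (DOM-style) processing, for illustration only. It uses Python's `xml.etree.ElementTree`, which builds an in-memory tree but is not the W3C DOM API; the sample document and the variable names are assumptions, not taken from the slides. The point it shows is that the whole document is parsed into memory before any path can be evaluated.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document (not from the original slides).
DOC = """<html>
  <body>
    <ul>
      <li>first item</li>
      <li>second item</li>
    </ul>
  </body>
</html>"""

# Tree-based processing: the entire document is parsed into memory first.
root = ET.fromstring(DOC)  # root is the <html> element

# ElementTree paths are relative to the root element, so the absolute
# path /html/body/ul/li is written as ./body/ul/li here.
items = [li.text for li in root.findall("./body/ul/li")]
print(items)
```

Even though only the `li` elements are of interest, every other node is still parsed and kept in the tree, which is the cost the following slides discuss.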

SAX Processing Model
XPath expression: //entry[@id="a2"]
Source: http://www.informatik.hu-berlin.de/~obecker/Lehre/SS2002/XML/images/sax.t.gif
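
A minimal sketch of event-based (SAX) processing, assuming a hypothetical document of `entry` elements that carry an `id` attribute. It uses Python's standard `xml.sax` module; the class name `EntryHandler` and the sample data are illustrative assumptions. The handler sees every element in document order and has to keep its own state to recognise the one matching `//entry[@id="a2"]`.

```python
import xml.sax

# Hypothetical sample document (not from the original slides).
DOC = b"""<entries>
  <entry id="a1">alpha</entry>
  <entry id="a2">beta</entry>
  <entry id="a3">gamma</entry>
</entries>"""


class EntryHandler(xml.sax.ContentHandler):
    """Collects the text of every <entry> whose id equals target_id.

    Assumption: <entry> elements are not nested inside each other.
    """

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.inside = False   # True while inside a matching <entry>
        self.buffer = []      # text chunks of the current match
        self.matches = []     # collected results

    def startElement(self, name, attrs):
        if name == "entry" and attrs.get("id") == self.target_id:
            self.inside = True
            self.buffer = []

    def characters(self, content):
        if self.inside:
            self.buffer.append(content)

    def endElement(self, name):
        if name == "entry" and self.inside:
            self.matches.append("".join(self.buffer))
            self.inside = False


handler = EntryHandler("a2")
xml.sax.parseString(DOC, handler)  # events fire strictly in document order
print(handler.matches)
```

The parser still raises events for the two non-matching entries; the handler simply ignores them, and it cannot go back to an earlier element once it has passed.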

Problems in Standard DOM and SAX Processing Models
• Both DOM and SAX processing models waste a large amount of computational resources by processing uninteresting fragments.
• They may not be able to query large XML documents efficiently.
  – (Size of a DOM tree) : (Size of the XML doc.) = 5 : 1
  – SAX cannot parse a document in a random-access manner
    • No backtracking mechanism (forward-only parsing)
    • Lack of interactive mechanisms

XML Processing Enhancements
XML Applications
• Unchangeable?
• or a few modifications!
Requirements?
XML Standards
• Unchangeable!?

Issues in Existing XML Processing Enhancements
• Consume large amounts of disk/memory space and CPU time (Cost: $)
• Large-scale (Cost: $)
• Integrate with relational database (Cost: $$$)
• Complicated index/query algorithms (Cost: $$$$)
• Intrusive (considerable modifications) (Cost: $$$$$)
• Non-transparent (apps. need to be aware of the mechanics) (Cost: $$$$$)

The Simplest Solution: Two-phase XML Processing Models Using XML Prefiltering Techniques

XML Prefiltering Technique: The Solution
[Diagram: an XPath expression (issued by users' apps.) and the XML document go into the prefiltering techniques (a tiny search engine), which produce a candidate-set XML document that is then passed to the XML parsers (DOM/SAX).]
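
The two-phase idea can be sketched roughly as follows. This is a deliberately naive stand-in, not the authors' implementation: the slides do not show how the prefilter is built, so phase 1 here is a plain regular-expression scan, and the function names `prefilter` and `query` as well as the sample document are assumptions made for illustration. It only works when the target elements are neither nested in themselves nor self-closing, and when each fragment is well-formed on its own (no namespace prefixes declared elsewhere).

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample document (not from the original slides).
DOC = """<entries>
  <entry id="a1"><title>alpha</title></entry>
  <entry id="a2"><title>beta</title></entry>
  <entry id="a3"><title>gamma</title></entry>
</entries>"""


def prefilter(text, tag, needle):
    """Phase 1: cheap text scan that returns candidate fragments.

    Assumption: <tag> elements are not nested in themselves and are not
    self-closing, so a non-greedy regex can delimit each one.
    """
    name = re.escape(tag)
    pattern = re.compile(r"<%s\b.*?</%s>" % (name, name), re.DOTALL)
    return [m.group(0) for m in pattern.finditer(text) if needle in m.group(0)]


def query(text, tag, attr, value):
    """Phase 2: parse only the candidates and check the predicate exactly."""
    results = []
    for fragment in prefilter(text, tag, value):
        element = ET.fromstring(fragment)  # a full parser sees one fragment
        if element.get(attr) == value:     # exact check; phase 1 may over-select
            results.append(element)
    return results


# Roughly corresponds to the XPath expression //entry[@id="a2"].
hits = query(DOC, "entry", "id", "a2")
print([e.findtext("title") for e in hits])
```

The design point is that phase 1 may return false positives but must not miss a real match; correctness is restored in phase 2, where a standard parser only has to handle the small candidate set.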

Two-phase XML Processing Model – Enhanced DOM-based Applications

Two-phase XML Processing Model – Prefiltering XPath Processor

Two-phase XML Processing Model – Enhanced SAX-based Applications

Two-phase XML Processing Model – Stream-based XPath Processor

Characteristics of the XML Prefiltering Technique
• Correct
• Small-scale
• Lightweight
• Efficient
• Transparent
• Non-intrusive?
  – User applications or XML processors need only a few (one or two) added instructions

Demonstrations
http://www.iis.sinica.edu.tw/~jashing/prefiltering

System Architecture

Demo Items
• System modules (http://www.iis.sinica.edu.tw/~jashing/prefiltering/Download.htm)
  – Indexer
  – Query Simplifier
  – Fast Lightweight Steps-Axes Analyzer
  – Fragment Gatherer
  – Micro XML Streaming Parser (an interactive streaming parser)
• An Application (http://www.iis.sinica.edu.tw/~jashing/prefiltering/Applications.htm)
  – GML-based Web GIS (Chia-Hsin Huang, Tyng-Ruey Chuang, Dong-Po Deng, and Hahn-Ming Lee, "Efficient GML-native Processors for Web-based GIS: Techniques and Tools," to appear in the proc. of ACM-GIS'06)

Thank you very much. Q&A