Challenges in XML
It’s good… but is it good enough?
Siddhesh Bhobe
Persistent eBusiness Solutions
XML goals
1. XML shall be straightforwardly usable over the Internet.
2. XML shall support a wide variety of applications.
3. XML shall be compatible with SGML.
4. It shall be easy to write programs which process XML documents.
5. The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
XML goals (Cont.)
6. XML documents should be human-legible and reasonably clear.
7. The XML design should be prepared quickly.
8. The design of XML shall be formal and concise.
9. XML documents shall be easy to create.
10. Terseness in XML markup is of minimal importance.
XML has been successful!
- XML is emerging as a standard for document exchange
- Significant momentum in the marketplace and industry consortia
  - Commerce One
  - RosettaNet
  - BizTalk
But… is it good enough?
XML is verbose
Column1|Column2|...|ColumnN|
becomes
<Row>
  <Column>value1</Column>
  <Column>value2</Column>
  :
  <Column>valueN</Column>
</Row>
So?
- More storage space
- More network transmission time
- Data exchange on the net will be very expensive!
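To put a rough number on the overhead, here is a minimal sketch that builds the same row in both forms and compares byte counts. The sample values are made up for illustration; real ratios depend on tag names and value lengths.

```python
# Hypothetical sample data: one row of N values (names are illustrative only).
values = [f"value{i}" for i in range(1, 11)]

# Delimited form, as on the previous slide: value1|value2|...|valueN|
flat = "".join(v + "|" for v in values)

# XML form: each value wrapped in <Column>, the whole row wrapped in <Row>.
xml = "<Row>" + "".join(f"<Column>{v}</Column>" for v in values) + "</Row>"

flat_bytes = len(flat.encode("utf-8"))
xml_bytes = len(xml.encode("utf-8"))
print(f"delimited: {flat_bytes} bytes, XML: {xml_bytes} bytes, "
      f"ratio: {xml_bytes / flat_bytes:.1f}x")
```

The shorter the values, the larger the share of the document taken up by tags.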
Anything positive?
- XML lends itself very well to compression in the case of structured data (like web logs)
- Non-XML data migrated to XML can compress better than the original data (XMill, paper at SIGMOD 2000)
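A rough way to see why repetitive markup compresses well is sketched below. Note the assumptions: it uses general-purpose zlib, not XMill (which separates structure from data and is a different technique), and the log records are invented. It shows only that the repeated tags shrink dramatically, not that XML beats the original data.

```python
import zlib

# Hypothetical web-log-like records (field values are made up).
rows = [(f"10.0.0.{i % 250}", "GET", f"/page/{i % 40}", "200") for i in range(5000)]

# Same records as delimited text and as XML.
flat = "\n".join("|".join(r) for r in rows).encode("utf-8")
xml = ("<Log>" + "".join(
    "<Row>" + "".join(f"<Column>{v}</Column>" for v in r) + "</Row>" for r in rows
) + "</Log>").encode("utf-8")

# General-purpose compression at the highest zlib level.
flat_z = zlib.compress(flat, 9)
xml_z = zlib.compress(xml, 9)

print(f"raw:        delimited {len(flat)} bytes, XML {len(xml)} bytes")
print(f"compressed: delimited {len(flat_z)} bytes, XML {len(xml_z)} bytes")
```

Comparing the two printed lines shows how much of the raw size difference survives compression for this particular data set.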
Storing XML is difficult!
- Store as text, but…
  - Impossible to query… no indexing possible!
  - Additional cost of creating blocks… limit on size of text that can be stored in databases
  - Can be updated only by replacing the entire XML document!
Storing XML (Cont.)
- Store in database tables, but…
  - XML to relational data conversion is very expensive!
  - Current set of tools handle only regular XML document structures (e.g. XML-DBMS)
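The table-based approach can be sketched as follows for the easy case of a perfectly regular document. This is an illustration only: the element names, table schema, and use of SQLite are assumptions, and it is not how XML-DBMS itself works.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, perfectly regular document: every <Row> has the same children.
doc = """
<Table>
  <Row><Name>alpha</Name><Qty>3</Qty></Row>
  <Row><Name>beta</Name><Qty>7</Qty></Row>
</Table>
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT, qty INTEGER)")

# "Shred" each <Row> element into one relational row.
root = ET.fromstring(doc)
for row in root.findall("Row"):
    conn.execute(
        "INSERT INTO items (name, qty) VALUES (?, ?)",
        (row.findtext("Name"), int(row.findtext("Qty"))),
    )

# Once shredded, the data can be queried and indexed like any other table.
result = conn.execute("SELECT name FROM items WHERE qty > 5").fetchall()
print(result)
```

The mapping is hard-coded to one document shape. Irregular structures (optional, repeated, or nested elements) are where the conversion becomes expensive.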
XML in Oracle8i
- Store the XML document as a single, intact object with its tags in a CLOB or BLOB
- Store the XML document as data and distribute it untagged across object-relational tables
- Combine XML documents and data using views
Processing XML is costly!
- XML needs to be parsed… and that is not efficient!
- Tools available today are not easy to use. Need better ones.
- Text processing is always a performance hit
- Do NOT use XML for passing parameters!
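A small sketch of the parsing cost: the same values are read from a delimited string and from XML, and both paths are timed. The data and function names are hypothetical, and timings vary by machine and parser, so no figures are quoted here.

```python
import timeit
import xml.etree.ElementTree as ET

# Hypothetical parameter list, in delimited and XML form.
values = [f"value{i}" for i in range(1, 21)]
flat = "|".join(values)
xml = "<Row>" + "".join(f"<Column>{v}</Column>" for v in values) + "</Row>"

def read_flat(text):
    # Delimited text: a single split call.
    return text.split("|")

def read_xml(text):
    # XML: the whole document is parsed into a tree before values are read.
    return [col.text for col in ET.fromstring(text).findall("Column")]

# Both paths recover the same values.
assert read_flat(flat) == read_xml(xml)

t_flat = timeit.timeit(lambda: read_flat(flat), number=2000)
t_xml = timeit.timeit(lambda: read_xml(xml), number=2000)
print(f"split: {t_flat:.4f}s, XML parse: {t_xml:.4f}s for 2000 runs each")
```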
No data types in XML
- No data type support… all XML data is text
- Limited options for binary data
- XML Schema Part 2: Datatypes (W3C Working Draft, 22 September 2000) proposes facilities for defining datatypes in XML
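As a minimal illustration of "all XML data is text": a parser without schema information returns strings, and the application has to know which ones to convert. The element names below are assumed for the example.

```python
import xml.etree.ElementTree as ET

image = ET.fromstring("<image><width>2048</width><height>2500</height></image>")

# The parser returns the element content as a string, not a number.
raw_width = image.findtext("width")

# The application must know, out of band, that these fields are integers.
width = int(raw_width)
height = int(image.findtext("height"))

print(type(raw_width).__name__, width * height)
```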
Encoding Binary Data
<image>
  <width>2048</width>
  <height>2500</height>
  <pixels>...</pixels>
</image>
- The pixels element would contain the binary data encoded in some notation like Base64.
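A minimal sketch of this approach, assuming a tiny made-up 2 x 2 image with one byte per pixel. Base64 represents every 3 bytes as 4 characters, so the encoded payload is roughly a third larger than the raw data.

```python
import base64
import xml.etree.ElementTree as ET

# Hypothetical tiny "image": 2 x 2 pixels, one byte per pixel.
pixel_bytes = bytes([0, 127, 128, 255])

image = ET.Element("image")
ET.SubElement(image, "width").text = "2"
ET.SubElement(image, "height").text = "2"
# Binary data cannot go into XML directly, so it is carried as Base64 text.
ET.SubElement(image, "pixels").text = base64.b64encode(pixel_bytes).decode("ascii")

document = ET.tostring(image, encoding="unicode")
print(document)

# The receiver reverses the encoding to recover the original bytes.
decoded = base64.b64decode(ET.fromstring(document).findtext("pixels"))
```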
Multipart/related MIME type (RFC 2112)
- MIME is used for e-mail messages that are not just ASCII text, but include different "types" of information
- The multipart/related MIME type was developed to represent compound documents. Individual parts represent individual streams in the compound document.
Example…
Content-Type: multipart/related

--xxxxx
Content-Type: application/binary
Content-Transfer-Encoding: Little-Endian
Content-ID: Pixels
Content-Length: 524288

...encoded binary data here...
--xxxxx
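A compound message of this kind can be sketched with Python's standard `email` package. Several things here are assumptions rather than part of the slide: the `href="cid:Pixels"` attribute linking the XML to the binary part, the payload itself, and the `application/octet-stream` type. Python also Base64-encodes the binary part by default, unlike the raw transfer encoding shown above.

```python
from email import message_from_bytes
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Hypothetical binary payload: 1024 bytes, i.e. a 32 x 32 image at 1 byte/pixel.
pixel_bytes = bytes(range(256)) * 4

msg = MIMEMultipart("related")

# Part 1: XML that points at the binary part by Content-ID (assumed convention).
xml_part = MIMEText(
    '<image><width>32</width><height>32</height><pixels href="cid:Pixels"/></image>',
    "xml",
)
msg.attach(xml_part)

# Part 2: the binary stream itself, labelled with a Content-ID.
binary_part = MIMEApplication(pixel_bytes, "octet-stream")
binary_part.add_header("Content-ID", "<Pixels>")
msg.attach(binary_part)

# Serialize, then parse again as a receiver would.
wire = msg.as_bytes()
received = message_from_bytes(wire)
parts = received.get_payload()
recovered = parts[1].get_payload(decode=True)
```

Keeping the pixels in a separate part means the XML stays small and the XML parser never has to scan the binary data.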
Conclusion
- XML is great as a data exchange format, but…
- Need compression
- Need better storage techniques
- Need fast and easy-to-use parsers
- Need data type support