Importing XML Importing and Splitting XML Data Date

























- Slides: 25

Importing XML Importing and Splitting XML Data Date: 9/30/2008 M D Metadata Solutions Dan Mc. Creary President Dan Mc. Creary & Associates dan@danmccreary. co Copyright 2008 Dan Mc. Crearym & Associates 1

Outline • Take a spreadsheet of data in Microsoft Excel • Use o. Xygen’s import Excel and import the data into and XML file • Save the XML file to a new collection and create a test XQuery using doc() • Create an XQuery that will split the single XML file into multiple XML files, one file per row • Create a new XQuery using the collection() function M D Copyright 2008 Dan Mc. Creary & Associates 2

Original Spreadsheet Headers M D Copyright 2008 Dan Mc. Creary & Associates 3

Sample ISO Data Element Terms Sample of ISO 11179 Representation Terms • • • M D Date Code ID Indicator Name Text Document. Approval. Date Object Class Property Term See http: //en. wikipedia. org/wiki/Representation_term for a complete list Copyright 2008 Dan Mc. Creary & Associates 4

Column Headers with Representation Term Suffix M D Copyright 2008 Dan Mc. Creary & Associates 5

Select the Sheet M D Copyright 2008 Dan Mc. Creary & Associates 6

M D Copyright 2008 Dan Mc. Creary & Associates 7

M D Copyright 2008 Dan Mc. Creary & Associates 8

M D Copyright 2008 Dan Mc. Creary & Associates 9

M D Copyright 2008 Dan Mc. Creary & Associates 10

Remove All Underscores M D Copyright 2008 Dan Mc. Creary & Associates 11

Hints on Using Replace • Remember to include the “>” at the end to replace all begin and end tags • Replace “Old. Name>” with “New. Name>” • You can selectively use case • You can use regular expressions • You can use XPath expressions M D Copyright 2008 Dan Mc. Creary & Associates 12

Final Result <countries> <country> <Country. Name>AFGHANISTAN</Country. Name> <Primary. Author>Chris</Primary. Author> <Start. Date>38795</Start. Date> <First. Draft. Date>38803</First. Draft. Date> <First. Review. Date>38815</First. Review. Date> <Author. Redraft. Date>38829</Author. Redraft. Date> <Second. Reviewer. Name>Amy</Second. Reviewer. Name> <Second. Review. Date>38858</Second. Review. Date> <Second. Review. Final. Date>38872</Second. Review. Final. Date> <Compleation. Date>38885</Compleation. Date> </country> … M D Copyright 2008 Dan Mc. Creary & Associates 13

Splitting Large Files • Many large XML files can be considered collections of uniform data structures – A spreadsheet is a collection of “row” objects • It is sometimes advantageous to put each object in its own XML file – Record Locking: Allows a file to be locked by a single user for editing – Metadata: The database automatically can keep track of when these records were created and when they were modified (document-metadata) – Versioning: Each object can have its own version history M D Copyright 2008 Dan Mc. Creary & Associates 14

Missing Updates Problem Editing Item 47 Bob Alice Bob Opens Edit Form Editing Item 47 Alice Opens Edit Form Bob Saves Changes Alice Overwrites Bob’s Changes • What happens when two people load a record into the web client at the same time? • If Alice reads the record before the Bob does his save, Bobs updates will be lost when Alice does her update time M D Copyright 2008 Dan Mc. Creary & Associates 15

Solution: Run File Through a “Splitter” • Use a small XQuery called a “splitter” to break a single file into many small files • Visit each “row” and create a new file with the row’s content • Use a counter to create the file name: – 1. xml, 2. xml, 3. xml – This is known as an “artificial key” M D Copyright 2008 Dan Mc. Creary & Associates 16

Using the store function • All XML databases provide a function that allows you to store a new file into a collection • In e. Xist this is called xmldb: store() – xmldb: store($collection, $filename, $data) • Where: – $collection is the place you want to store the new file – $filename is the name of the file – $data is the data you want to put into the file M D Copyright 2008 Dan Mc. Creary & Associates 17

Basic splitter. xq Structure for $item at $count in doc($input-file)/root/row let $filename : = concat($count, '. xml') let $result : = store($collection, $filename, $item) return $result te that file names are a concatenation of the count and the suffix “. xm M D Copyright 2008 Dan Mc. Creary & Associates 18

Testing Steps • Built a test splitter (splitter-test. xq) with the store line commented out • Run the test to make sure the data is exactly the way you want it • Uncomment the line and save to splitter. xq • Run the script (only once) • View Results M D Copyright 2008 Dan Mc. Creary & Associates 19

xquery version "1. 0"; Doc Workflow Example let $input-file : = '/db/apps/doc-reviewer/import. xml' let $data-collection : = '/db/apps/doc-reviewer/data' splitter-test. xq (: the login used must have write access to the collection : ) let $output-collection : = xmldb: login($data-collection, ‘login', ‘password') return <Split. Results>{ for $country at $count in doc($input-file)/countries/country let $filename : = concat($count, '. xml') (: let $store-return : = xmldb: store($data-collection, $filename, $country) : ) return <store> <id>{$count}</id> <command>store({$data-collection}, {$filename}, {$country})</command> <results></results> </store> }</Split. Results> Copyright 2008 Dan Mc. Creary & Associates 20

Test Run Results M D Copyright 2008 Dan Mc. Creary & Associates 21

xquery version "1. 0"; Doc Workflow Example let $input-file : = '/db/apps/doc-reviewer/import. xml' let $data-collection : = '/db/apps/doc-reviewer/data' splitter-test. xq (: the login used must have write access to the collection : ) let $output-collection : = xmldb: login($data-collection, ‘login', ‘password') return <Split. Results>{ for $country at $count in doc($input-file)/countries/country let $filename : = concat($count, '. xml') let $store-return : = xmldb: store($data-collection, $filename, $country) return <store> <id>{$count}</id> <command>store({$data-collection}, {$filename}, {$country})</command> <results>{$store-return}</results> </store> }</Split. Results> M D Copyright 2008 Dan Mc. Creary & Associates 22

Actual Run M D Copyright 2008 Dan Mc. Creary & Associates 23

Inspecting Results M D Copyright 2008 Dan Mc. Creary & Associates 24

Thank You! Please contact me for more information: • • • Native XML Databases Metadata Management Metadata Registries Service Oriented Architectures Business Intelligence and Data Warehouse Semantic Web Dan Mc. Creary, President Dan Mc. Creary & Associates Metadata Strategy Development dan@danmccreary. com (952) 931 -9198 M D Copyright 2008 Dan Mc. Creary & Associates 25