CSCI 5333 DBMS

Chapter 26: XML and Internet Databases

Outline
• Structured, Semistructured, & Unstructured Data
• XML Hierarchical Data Model
• XML Document, DTD, & XML Schema
• XML Documents & Databases
• XML Querying

Structured vs Semistructured Data
• Structured data: e.g., information stored in databases; all records have the same format, as defined in the relational schema.
• Semistructured data: may have a certain structure, but not all of the information collected will have an identical structure.

FIGURE 26.1 Representing semistructured data as a graph.
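As a minimal sketch of the idea behind the graph view (all names and values below are made up for illustration and are not taken from the figure), semistructured data can be pictured as nested objects whose attribute names travel with the data, so two objects of the same kind need not share a format:

```python
# Semistructured data: attribute names (edge labels) are stored with the values,
# and objects of the same kind may carry different attributes.
# All names and values here are illustrative assumptions.
company_projects = {
    "project": [
        {
            "name": "Product X",
            "number": 1,
            "location": "Bellaire",
            "worker": [
                {"ssn": "123456789", "hours": 32.5},
                {"first_name": "Joyce", "last_name": "English", "hours": 20.0},
            ],
        },
        {"name": "Product Y", "number": 2},  # no location, no workers
    ]
}


def edge_labels(node):
    """Collect every edge label (attribute name) reachable from a node."""
    labels = set()
    if isinstance(node, dict):
        for key, value in node.items():
            labels.add(key)
            labels |= edge_labels(value)
    elif isinstance(node, list):
        for item in node:
            labels |= edge_labels(item)
    return labels


workers = company_projects["project"][0]["worker"]
# The two workers are described by different attribute sets.
worker_schemas = [sorted(w) for w in workers]
```

In a relational table, both workers would have to fit one fixed set of columns; here each object simply carries the labels it has.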

FIGURE 26.2 Part of an HTML document representing unstructured data (cf. the company database schema).

XML Hierarchical (Tree) Data Model
• Problem with HTML documents:
  – Difficult for programs to interpret automatically, because they do not include schema information about the type of data in the documents
  – Inappropriate as intermediate Web documents to be exchanged among various computer sites
• Solution: XML documents
  – Two main structuring concepts: elements and attributes
• Cf. HTML: in XML, tag names are defined to describe the meaning of the data elements, rather than to describe how the text is to be displayed (as in HTML).
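A small sketch of the two structuring concepts, using Python's standard xml.etree.ElementTree module. The fragment below is hypothetical: the tag names, the "number" attribute, and the values are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: <project> is a complex element, "number" is an
# attribute, and <Name> / <Location> are simple child elements.
doc = """<project number="1">
  <Name>ProductX</Name>
  <Location>Bellaire</Location>
</project>"""

root = ET.fromstring(doc)
tag = root.tag                    # element name
number = root.attrib["number"]    # attribute value (always a string)
children = {child.tag: child.text for child in root}
```

Because the tags name the meaning of the data (Name, Location) rather than its display, a program can pick out values without guessing from layout.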

FIGURE 26.3 A complex XML element called <projects>.
• standalone="yes": the document is schemaless
• Correction: <project>
• Complex elements: <projects>, <project>, <Worker>
• Simple elements: <Name>, <Number>, <SSN>, …

XML Documents, DTD, and XML Schema
• A well-formed XML document is one that satisfies a few conditions:
  – Starts with an XML declaration (version, …)
  – Follows the tree model: a single root element; matching start and end tags for an element must be within the tags of the parent element
  – Is syntactically correct
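The well-formedness conditions can be tried out with a non-validating parser. This is a sketch using Python's standard xml.etree.ElementTree; the sample documents are made up, and note that this parser does not insist on the XML declaration, so only the single-root and tag-nesting conditions are exercised here.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if `text` parses as a single-rooted, properly nested XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Illustrative inputs (assumptions, not taken from the figures).
good = (
    '<?xml version="1.0" standalone="yes"?>'
    "<projects><project><Name>ProductX</Name></project></projects>"
)
bad_nesting = "<projects><project><Name>ProductX</project></Name></projects>"
two_roots = "<project/><project/>"

results = [is_well_formed(good), is_well_formed(bad_nesting), is_well_formed(two_roots)]
```

Passing this check says nothing about validity: a well-formed document may still use element names that no DTD or schema allows.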

XML Documents, DTD, and XML Schema
• A valid XML document is well formed, and in addition the element names used in the start and end tag pairs must follow the structure specified in a separate XML DTD (Document Type Definition) file or XML schema file.
• Figure 26.4: a sample XML DTD called projects
  – * zero or more
  – + one or more
  – ? zero or one
  – Otherwise: exactly once
  – (#PCDATA): parsed character data (the data type)

FIGURE 26.4 An XML DTD file called projects.
To use the DTD file:
(1) Store the DTD file in the same file system as the XML document
(2) <?xml version="1.0" standalone="no"?>
(3) <!DOCTYPE projects SYSTEM "proj.dtd">
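The occurrence indicators (*, +, ?) behave much like regular-expression operators over the sequence of child elements. The sketch below is not a DTD validator (Python's standard xml.etree.ElementTree does not validate against DTDs); it hand-translates one hypothetical element rule into a regular expression to show how ordering and occurrence are enforced. The rule and the sample documents are assumptions for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical DTD rule: <!ELEMENT project (Name, Number, Location?, Worker*)>
# Encoded by hand as a regular expression over space-terminated child tag names.
PROJECT_MODEL = re.compile(r"Name Number (Location )?(Worker )*")


def children_match(element, model):
    """Check the ordered sequence of child tags against a content-model regex."""
    tags = "".join(child.tag + " " for child in element)
    return model.fullmatch(tags) is not None


ok = ET.fromstring(
    "<project><Name>X</Name><Number>1</Number><Worker/><Worker/></project>"
)
missing_number = ET.fromstring(
    "<project><Name>X</Name><Location>Bellaire</Location></project>"
)
wrong_order = ET.fromstring("<project><Number>1</Number><Name>X</Name></project>")

checks = [children_match(e, PROJECT_MODEL) for e in (ok, missing_number, wrong_order)]
```

The third case fails only because of ordering, which illustrates why a DTD cannot describe unordered elements.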

DTD Limitations
1) Data types in DTD are not very general
2) DTD has its own special syntax and thus requires specialized processors
3) All DTD elements are always forced to follow the specified ordering of the document, so unordered elements are not permitted
• Solution: XML Schema

FIGURE 26.5 An XML schema file called company.
• Schema namespace
• The root element is company; it is also an unnamed complex element
• "Department", "Employee", etc. must be named types.
• The selector "employeeDependent" is an attribute of "Employee", of type "Dependent".
• The field "dependentName" in "Dependent" must be unique.

FIGURE 26.5 (continued) An XML schema file called company.
• <xsd:unique …> specifies a key constraint for a non-primary-key element.
• <xsd:key> specifies a primary key.
• <xsd:keyref> specifies a foreign key; <xsd:selector> refers to the referencing element type; <xsd:field> refers to the referencing attribute.
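What a key and a keyref demand of an instance document can be sketched by checking the constraints by hand. This is an illustration only, not XML Schema validation; the element names and data below are assumptions, and the selector/field pair is modeled as two simple path steps.

```python
import xml.etree.ElementTree as ET

# Made-up instance data; element names are assumptions for illustration.
doc = ET.fromstring("""
<company>
  <department><departmentNumber>5</departmentNumber></department>
  <department><departmentNumber>4</departmentNumber></department>
  <employee>
    <employeeSSN>123456789</employeeSSN>
    <employeeDepartmentNumber>5</employeeDepartmentNumber>
  </employee>
  <employee>
    <employeeSSN>999887777</employeeSSN>
    <employeeDepartmentNumber>7</employeeDepartmentNumber>
  </employee>
</company>
""")


def field_values(root, selector, field):
    """The selector picks the elements; the field picks the value inside each one."""
    return [elem.findtext(field) for elem in root.findall(selector)]


dept_keys = field_values(doc, "department", "departmentNumber")
emp_refs = field_values(doc, "employee", "employeeDepartmentNumber")

# Key: every selected element has the field, and no value repeats.
key_ok = None not in dept_keys and len(dept_keys) == len(set(dept_keys))
# Keyref: every referencing value must appear among the key values.
dangling = [ref for ref in emp_refs if ref not in set(dept_keys)]
```

Here the key holds, but the second employee refers to department 7, which does not exist, so a keyref constraint would reject the document.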

FIGURE 26.5 (continued) An XML schema file called company.
Exercise: Define the element "projectWorker" in the type "Project" as an embedded sub-element.
Answer:
<xsd:element name="projectWorker" minOccurs="1" maxOccurs="unbounded">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="SSN" type="xsd:string" />
      <xsd:element name="hours" type="xsd:float" />
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>

FIGURE 26.5 (continued) An XML schema file called company.

XML Documents and Databases
• Approaches to Storing XML Documents
• Extracting XML Documents from Relational Databases
• Breaking Cycles to Convert Graphs into Trees
• Other Steps for Extracting XML Documents from Databases

FIGURE 26.6 An ER schema diagram for a simplified UNIVERSITY database.

FIGURE 26.7 Subset of the UNIVERSITY database schema needed for XML document extraction.

FIGURE 26.8 Hierarchical (tree) view with COURSE as the root.

FIGURE 26.9 XML schema document with COURSE as the root.
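A rough sketch of extracting a COURSE-rooted document from relational data: the rows below stand in for query results, and the table layouts, column names, and tag names are all assumptions, not the schema in the figures. Sections are nested under their course and students under their section by following foreign-key values.

```python
import xml.etree.ElementTree as ET

# Made-up rows standing in for relational tables; column layouts are assumptions.
courses = [("CSCI5333", "DBMS")]                               # (course#, name)
sections = [(1, "CSCI5333", 2024), (2, "CSCI5333", 2025)]      # (section#, course#, year)
grades = [(1, "111", "A"), (1, "222", "B"), (2, "333", "A")]   # (section#, ssn, grade)


def course_tree(courses, sections, grades):
    """Nest SECTION under COURSE and STUDENT under SECTION via foreign keys."""
    root = ET.Element("courses")
    for course_no, course_name in courses:
        course = ET.SubElement(root, "course")
        ET.SubElement(course, "number").text = course_no
        ET.SubElement(course, "name").text = course_name
        for sec_no, sec_course, year in sections:
            if sec_course != course_no:
                continue
            section = ET.SubElement(course, "section")
            ET.SubElement(section, "number").text = str(sec_no)
            ET.SubElement(section, "year").text = str(year)
            for g_sec, ssn, grade in grades:
                if g_sec == sec_no:
                    student = ET.SubElement(section, "student")
                    ET.SubElement(student, "ssn").text = ssn
                    ET.SubElement(student, "grade").text = grade
    return root


tree = course_tree(courses, sections, grades)
xml_text = ET.tostring(tree, encoding="unicode")
```

Choosing a different root changes only the nesting order of the same rows, at the cost of repeating whichever data ends up lower in the tree.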

FIGURE 26.10 Hierarchical (tree) view with STUDENT as the root.

FIGURE 26.11 XML schema document with STUDENT as the root.

FIGURE 26.12 Hierarchical (tree) view with SECTION as the root.

FIGURE 26.13 Converting a graph with cycles into a hierarchical (tree) structure.
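One possible way to break cycles is to unfold the graph from a chosen root and replicate any node that is reached a second time. The sketch below is an assumption about how this could be done, not the procedure in the figure; the schema graph is hypothetical, with a cycle among SECTION, STUDENT, DEPARTMENT, and INSTRUCTOR.

```python
from collections import deque

# Hypothetical schema graph: entity types as nodes, relationships as
# undirected edges (each edge is listed from both endpoints).
graph = {
    "COURSE": ["SECTION"],
    "SECTION": ["COURSE", "STUDENT", "INSTRUCTOR"],
    "STUDENT": ["SECTION", "DEPARTMENT"],
    "INSTRUCTOR": ["SECTION", "DEPARTMENT"],
    "DEPARTMENT": ["STUDENT", "INSTRUCTOR"],
}


def to_tree(graph, root):
    """Breadth-first unfolding: each edge is used once, and a node reached a
    second time is added as a replicated leaf instead of being expanded again."""
    tree = {"name": root, "children": []}
    expanded = {root}
    used_edges = set()
    queue = deque([tree])
    while queue:
        parent = queue.popleft()
        for nxt in graph[parent["name"]]:
            edge = frozenset((parent["name"], nxt))
            if edge in used_edges:
                continue
            used_edges.add(edge)
            child = {"name": nxt, "children": []}
            parent["children"].append(child)
            if nxt not in expanded:
                expanded.add(nxt)
                queue.append(child)
    return tree


def names(node):
    """Yield node names in preorder."""
    yield node["name"]
    for child in node["children"]:
        yield from names(child)


tree = to_tree(graph, "COURSE")
flat = list(names(tree))
```

In the result, DEPARTMENT appears twice, once under STUDENT and once under INSTRUCTOR, which is the price of turning the cycle into a tree.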

XML Query
• XPath: Specifying Path Expressions in XML
• XQuery: Specifying Queries in XML

FIGURE 26.14 Some examples of XPath expressions on XML documents that follow the XML schema file COMPANY in Figure 26.5.
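Path expressions of this kind can be tried with the limited XPath subset in Python's standard xml.etree.ElementTree. This is a sketch on made-up data; the element names are assumptions, and paths passed to findall are relative to the root element rather than written from "/".

```python
import xml.etree.ElementTree as ET

# Made-up document; element names are assumptions for illustration.
company = ET.fromstring("""
<company>
  <department>
    <departmentName>Research</departmentName>
  </department>
  <employee>
    <employeeName>Smith</employeeName>
    <employeeSalary>30000</employeeSalary>
  </employee>
  <employee>
    <employeeName>Wong</employeeName>
    <employeeSalary>80000</employeeSalary>
  </employee>
</company>
""")

# Child steps from the root, in the spirit of /company/department/departmentName
dept_names = [e.text for e in company.findall("department/departmentName")]

# Descendant search, in the spirit of //employeeName
all_names = [e.text for e in company.findall(".//employeeName")]

# ElementTree predicates do not support numeric comparisons such as
# employeeSalary > 70000, so the condition is applied in Python after the path step.
high_paid = [
    e.findtext("employeeName")
    for e in company.findall("employee")
    if int(e.findtext("employeeSalary")) > 70000
]
```

A full XPath or XQuery processor would evaluate the salary condition inside the expression itself; the Python filter here only mimics that behavior.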

FIGURE 26.15 Some examples of XQuery queries on XML documents that follow the XML schema file COMPANY in Figure 26.5.

Summary
• XML documents
• XML & databases