Foundational Data Modeling and Schema Transformations for XML
- Slides: 42

Foundational Data Modeling and Schema Transformations for XML Data Engineering
Stephen W. Liddle, Information Systems Department
Reema Al-Kamha & David W. Embley, Computer Science Department
Brigham Young University, Provo, Utah

XML Data Engineering
- Model XML conceptually
- Map conceptual models to XML
- Reverse-engineer XML to conceptual models
- Ensure properties
  - Information-preserving transformations
  - Constraint-preserving transformations
  - Redundancy-free guarantees
24 April 2008 UNISCON 2008, Klagenfurt, Austria

C-XML

Modeling XML Conceptually
- Scaling the mountain of abstraction
- Delicate balance
  - Enough modeling constructs
  - But not too many
  - High-level capture of essentials
  - Avoidance of low-level implementation details
- Formal but easily understood
- XML needs better abstractions

XML Schema/Model Mismatch
- XML features not explicitly supported in traditional conceptual models:
  - Ordered lists of concepts
  - Choice of concept from among several
  - Mixed content
  - Use of content from another model
  - Nested information hierarchies
- C-XML

Missing Modeling Constructs (1)
- Sequence structure
  - Parent concept
  - Ordered child concepts
  - Constrained recurrence of children
  - Constrained recurrence of sequence itself

<xs:sequence minOccurs="1" maxOccurs="2">
  <xs:element name="FirstName" type="xs:string"/>
  <xs:element name="MiddleName" type="xs:string" minOccurs="0" maxOccurs="2"/>
  <xs:element name="LastName" type="xs:string"/>
</xs:sequence>
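
To make the recurrence constraints concrete, here is a sketch of an instance fragment that the sequence above would accept. The wrapping `Name` element and all values are hypothetical; the slide shows only the `xs:sequence`, which would have to sit inside some complex type.

```xml
<!-- Hypothetical parent element; assumes its complex type contains the xs:sequence shown above -->
<Name>
  <!-- First pass through the sequence: MiddleName appears twice (its maximum) -->
  <FirstName>Mary</FirstName>
  <MiddleName>Ann</MiddleName>
  <MiddleName>Louise</MiddleName>
  <LastName>Smith</LastName>
  <!-- Second pass (sequence maxOccurs="2"): MiddleName omitted, allowed by minOccurs="0" -->
  <FirstName>Mary</FirstName>
  <LastName>Jones</LastName>
</Name>
```

Reordering the children, or repeating the whole group a third time, would make the fragment invalid. That ordering and recurrence information is what a conceptual model has to be able to capture.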

Missing Modeling Constructs (1)

Missing Modeling Constructs (2)
- Choice structure
  - Parent concept
  - Choose one child concept from several alternatives
  - Constrained recurrence of chosen child
  - Constrained recurrence of choice itself

<xs:choice maxOccurs="2">
  <xs:element name="PhoneNumber" type="xs:string" minOccurs="1" maxOccurs="2"/>
  <xs:element name="Email" type="xs:string"/>
  <xs:element name="Fax" type="xs:string"/>
</xs:choice>
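
A small instance sketch may help show how the two levels of recurrence interact. The `Contact` wrapper and the values are hypothetical and only assume that the choice above is the content of some complex type.

```xml
<!-- Hypothetical parent element; assumes its complex type contains the xs:choice shown above -->
<Contact>
  <!-- First choice: PhoneNumber, used once (it may occur 1 to 2 times) -->
  <PhoneNumber>801-555-0100</PhoneNumber>
  <!-- Second choice (choice maxOccurs="2"): Email -->
  <Email>student@example.org</Email>
</Contact>
```

A `Contact` holding an `Email`, a `Fax`, and a further `Email` would be rejected, because that takes three choices where at most two are allowed.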

Missing Modeling Constructs (3)
- Mixed attribute
  - Allows character and element data to be intertwined

<xs:complexType mixed="true">

- Any and anyAttribute structures
  - Insert structures from other namespaces
  - Constrained recurrence

<xs:any namespace="##other" minOccurs="0"/>
<xs:anyAttribute namespace="##any"/>
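
The three constructs can be seen together in one instance fragment. This is only a sketch: the `Note` and `PhoneNumber` names, the `ext` prefix, and its namespace URI are hypothetical and do not come from the slides.

```xml
<!-- Hypothetical instance. Assumes Note is declared with mixed="true", an optional
     PhoneNumber child, and the xs:any and xs:anyAttribute wildcards shown above. -->
<Note xmlns:ext="urn:example:ext" ext:priority="high">
  Call <PhoneNumber>801-555-0100</PhoneNumber> after noon.
  <ext:Reminder/>
</Note>
```

The text interleaved with `PhoneNumber` is the mixed content, `ext:Reminder` is admitted by the `##other` element wildcard, and `ext:priority` by the `##any` attribute wildcard. Because `processContents` defaults to `strict`, a validator would also need declarations for `ext:Reminder` and `ext:priority` unless the wildcards set a more lenient value; the slide does not show that setting.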

Missing Modeling Constructs (4)
- Nesting of hierarchical structures
  - Key organizational characteristic of XML
  - Arbitrarily complex nesting possible

C-XML Example

C-XML TO XML SCHEMA

C-XML → XML Schema

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
  <xs:element name="Root">
    <xs:complexType>
      <xs:all>
        <xs:element ref="Students"/>
        <xs:element ref="Courses"/>
        <xs:element ref="GradStudents"/>
        <xs:element ref="UndergradStudents"/>
      </xs:all>
    </xs:complexType>
    <xs:keyref name="UndergradStudentOID-Keyref" refer="StudentOID-Key">
      <xs:selector xpath="./UndergradStudents/UndergradStudent"/>
      <xs:field xpath="@UndergradStudentOID"/>
    </xs:keyref>
    <xs:keyref name="GradStudentOID-Keyref" refer="StudentOID-Key">
      <xs:selector xpath="./GradStudents/GradStudent"/>
      <xs:field xpath="@GradStudentOID"/>
    </xs:keyref>
  </xs:element>
  <xs:element name="Students">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Student" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:choice minOccurs="1" maxOccurs="1">
                <xs:element name="StudentName" type="xs:string"/>
                <xs:sequence>
                  <xs:element name="FirstName" type="xs:string"/>
                  <xs:element name="MiddleNames">
                    <xs:complexType>
                      <xs:sequence>
                        <xs:element name="MiddleName" minOccurs="0" maxOccurs="2">
                          <xs:complexType>
                            <xs:attribute name="MiddleName" type="xs:string" use="required"/>
                          </xs:complexType>
                        </xs:element>
                      </xs:sequence>
                    </xs:complexType>
                    <xs:key name="MiddleName-Key">
                      <xs:selector xpath="./MiddleName"/>
                      <xs:field xpath="@MiddleName"/>
                    </xs:key>
                  </xs:element>
                  <xs:element name="LastName" type="xs:string"/>
                </xs:sequence>
              </xs:choice>
              <xs:element name="Semester-Course-Grades">
                <xs:complexType>
                  <xs:sequence>

Algorithm Overview
- Generate a forest of scheme trees
- Translate an individual object set
- Translate scheme-tree collections of object sets
- Create a root node
- Add uniqueness constraints
- Translate generalization/specialization hierarchies

Generate Scheme Trees
(Student, StudentID, StudentName, FirstName, LastName, (MiddleName)*, (Course, Semester, Grade)*)*

Generate Scheme Trees
(Course, Department)*

Generate Scheme Trees
(UndergradStudent)*
(GradStudent, Advisor)*

Generate Scheme Trees
(Student, StudentID, StudentName, FirstName, LastName, (MiddleName)*, (Course, Semester, Grade)*)*
(Course, Department)*
(GradStudent, Advisor)*
(UndergradStudent)*

Generate Scheme Trees
Scheme-tree nodes:
- Student, StudentID, StudentName, FirstName, LastName
  - MiddleName
  - Course, Semester, Grade
- Course, Department
- GradStudent, Advisor
- UndergradStudent
Scheme trees:
(Student, StudentID, StudentName, FirstName, LastName, (MiddleName)*, (Course, Semester, Grade)*)*
(Course, Department)*
(GradStudent, Advisor)*
(UndergradStudent)*

Individual Object Sets

<xs:attribute name="Department" type="xs:string"/>
<xs:attribute name="Course" type="xs:string"/>
<xs:attribute ref="Course"/>
<xs:element name="FirstName" type="xs:string"/>
<xs:element name="Student">
  <xs:complexType>
    ...
    <xs:attribute name="StudentOID" type="xs:string" use="required"/>
  </xs:complexType>
</xs:element>

Scheme-Tree Translation
(Diagram labels: Students; MiddleNames, MiddleName; Course-Semester-Grades, Course-Semester-Grade; Courses, Course; GradStudents, GradStudent; UndergradStudents, UndergradStudent)

Scheme-Tree Translation

<xs:element name="Students">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="Student" maxOccurs="unbounded">
        <xs:complexType>...</xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>

<xs:element name="Semester-Course-Grades">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="Semester-Course-Grade" minOccurs="0" maxOccurs="unbounded">
        <xs:complexType>...</xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
  ...
</xs:element>

Scheme-Tree Translation

<xs:element name="Semester-Course-Grade" minOccurs="0" maxOccurs="unbounded">
  <xs:complexType>
    <xs:attribute name="Semester" use="required"/>
    <xs:attribute ref="Course" use="required"/>
    <!-- C-XML: forall x (Course(x) => exists [0:*] <x1, x2, x3> (Course(x) Student(x1) Semester(x2) Grade(x3))) -->
    <xs:attribute name="Grade" type="xs:string" use="required"/>
  </xs:complexType>
</xs:element>
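
The nesting produced by the scheme-tree translation is easiest to see in an instance. The fragment below is a sketch with made-up data. It assumes that `StudentOID` and `StudentID` are attributes of `Student` and that the name is given as `FirstName`, `MiddleNames`, `LastName` rather than as a single `StudentName`. The elided parts of the schema may require more than is shown.

```xml
<!-- Hypothetical data; element and attribute names follow the schema fragments above -->
<Students>
  <Student StudentOID="s1" StudentID="1001">
    <FirstName>Mary</FirstName>
    <MiddleNames>
      <MiddleName MiddleName="Ann"/>
    </MiddleNames>
    <LastName>Smith</LastName>
    <Semester-Course-Grades>
      <Semester-Course-Grade Semester="Fall 2007" Course="CS 452" Grade="A"/>
      <Semester-Course-Grade Semester="Winter 2008" Course="CS 453" Grade="B+"/>
    </Semester-Course-Grades>
  </Student>
</Students>
```

Each starred group in the scheme tree, such as `(Course, Semester, Grade)*`, becomes a container element holding zero or more repeated children.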

Root Element
Students, Courses, GradStudents, UndergradStudents

<xs:schema>
  <xs:element name="Root">
    <xs:complexType>
      <xs:all>
        <xs:element ref="Students"/>
        <xs:element ref="Courses"/>
        <xs:element ref="GradStudents"/>
        <xs:element ref="UndergradStudents"/>
      </xs:all>
    </xs:complexType>
    ...
  </xs:element>
  ...
</xs:schema>

Uniqueness Constraints

<xs:element name="Students">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="Student" maxOccurs="unbounded">
        <xs:complexType>...</xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
  <xs:key name="StudentOID-Key">
    <xs:selector xpath="./Student"/>
    <xs:field xpath="@StudentOID"/>
  </xs:key>
  <xs:key name="StudentID-Key">
    <xs:selector xpath="./Student"/>
    <xs:field xpath="@StudentID"/>
  </xs:key>
</xs:element>
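
A short instance sketch shows what the two keys rule out. The data is hypothetical, and the child content of `Student` is left out to keep the focus on the key fields, so the fragment is not meant to validate as a whole.

```xml
<!-- Hypothetical data; child content of Student omitted for brevity -->
<Students>
  <Student StudentOID="s1" StudentID="1001"/>
  <Student StudentOID="s2" StudentID="1002"/>
  <!-- Violates StudentID-Key: the value 1001 already appears above -->
  <Student StudentOID="s3" StudentID="1001"/>
</Students>
```

Because the constraints are declared with `xs:key`, every selected `Student` must also carry both attributes; a `Student` without a `StudentID` would violate the key as well.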

Generalization/Specialization

<xs:keyref name="UndergradStudentOID-Keyref" refer="StudentOID-Key">
  <xs:selector xpath="./UndergradStudents/UndergradStudent"/>
  <xs:field xpath="@UndergradStudentOID"/>
</xs:keyref>
<xs:keyref name="GradStudentOID-Keyref" refer="StudentOID-Key">
  <xs:selector xpath="./GradStudents/GradStudent"/>
  <xs:field xpath="@GradStudentOID"/>
</xs:keyref>
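
The key references tie each specialization back to an existing `Student`. The following sketch uses hypothetical data and shows only the attributes involved; `Courses` and all other content are omitted, so it illustrates the constraint rather than a complete valid document.

```xml
<!-- Hypothetical data; only the attributes involved in the key references are shown -->
<Root>
  <Students>
    <Student StudentOID="s1" StudentID="1001"/>
    <Student StudentOID="s2" StudentID="1002"/>
  </Students>
  <GradStudents>
    <!-- Satisfies GradStudentOID-Keyref: s1 is a StudentOID under Students -->
    <GradStudent GradStudentOID="s1"/>
  </GradStudents>
  <UndergradStudents>
    <!-- Violates UndergradStudentOID-Keyref: no Student has StudentOID s9 -->
    <UndergradStudent UndergradStudentOID="s9"/>
  </UndergradStudents>
</Root>
```

In this encoding a specialization is a subset of the generalization: every `GradStudent` or `UndergradStudent` entry must point at a `Student` that exists.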

XML SCHEMA TO C-XML

XML Schema → C-XML

Algorithm Overview
- Generate object sets for each element & attribute
- Specify built-in and simple types in data frames
- Obtain relationship sets from parent-child connections
- Obtain participation constraints from minOccurs, maxOccurs, and use constraints
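
As a rough illustration of the last step, the comments below annotate a simplified schema with the occurrence range each declaration implies. The min:max notation follows the `[0:*]` form used in the C-XML comment on an earlier slide. The `Student` declaration is simplified, the `Nickname` attribute is hypothetical, and how C-XML actually records these ranges is an assumption, not something the slides show.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Student">
    <xs:complexType>
      <xs:sequence>
        <!-- minOccurs and maxOccurs both default to 1, giving 1:1 -->
        <xs:element name="FirstName" type="xs:string"/>
        <!-- minOccurs="0" maxOccurs="2" gives 0:2 -->
        <xs:element name="MiddleName" type="xs:string" minOccurs="0" maxOccurs="2"/>
        <!-- minOccurs="0" maxOccurs="unbounded" gives 0:* -->
        <xs:element name="Semester-Course-Grade" type="xs:string"
                    minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
      <!-- use="required" gives 1:1 -->
      <xs:attribute name="StudentOID" type="xs:string" use="required"/>
      <!-- use defaults to "optional", giving 0:1 (Nickname is a hypothetical attribute) -->
      <xs:attribute name="Nickname" type="xs:string"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Each range describes how many children or attribute values of that kind a single `Student` may have, which is the information a participation constraint on the corresponding relationship set would need.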

Attribute Transformation

Element Transformation

Choice Transformation

Sequence Transformation

Key Constraints Transformation

Substitution Group & Extension Transformation

Observation on Transformations
- These transformations to and from C-XML are not inverses of one another
- However, C-XML → Schema → C-XML

Demo

PROPERTY GUARANTEES

Transformation Properties: C-XML to XML Schema
- Theorem 1: … preserves information. Proof: injective
- Theorem 2: Allowing for pragma constraints, … preserves constraints. Proof: by construction
- Theorem 3: … yields an XML-Schema instance whose complying XML documents are redundancy free. Proof: [TKDE, Aug 06]

Transformation Properties: Schema to C-XML
- Theorem 4: … preserves information. Proof: injective
- Theorem 5: … preserves constraints. Proof: by construction

Conclusions
- C-XML models XML conceptually
- Transformations
  - C-XML to XML
  - Reverse-engineer XML to C-XML
- Properties
  - Information preserving
  - Constraint preserving
  - Redundancy-free guarantee
www.deg.byu.edu