Foundational Data Modeling and Schema Transformations for XML Data Engineering

Foundational Data Modeling and Schema Transformations for XML Data Engineering
Stephen W. Liddle, Information Systems Department
Reema Al-Kamha & David W. Embley, Computer Science Department
Brigham Young University, Provo, Utah

XML Data Engineering
- Model XML conceptually
- Map conceptual models to XML
- Reverse-engineer XML to conceptual models
- Ensure properties
  - Information-preserving transformations
  - Constraint-preserving transformations
  - Redundancy-free guarantees
24 April 2008 UNISCON 2008, Klagenfurt, Austria

C-XML

Modeling XML Conceptually
- Scaling the mountain of abstraction
- Delicate balance
  - Enough modeling constructs
  - But not too many
    - High-level capture of essentials
    - Avoidance of low-level implementation details
  - Formal but easily understood
- XML needs better abstractions

XML Schema/Model Mismatch
- XML features not explicitly supported in traditional conceptual models:
  - Ordered lists of concepts
  - Choice of concept from among several
  - Mixed content
  - Use of content from another model
  - Nested information hierarchies
- C-XML

Missing Modeling Constructs (1)
- Sequence structure
  - Parent concept
  - Ordered child concepts
  - Constrained recurrence of children
  - Constrained recurrence of sequence itself

    <xs:sequence minOccurs="1" maxOccurs="2">
      <xs:element name="FirstName" type="xs:string"/>
      <xs:element name="MiddleName" type="xs:string" minOccurs="0" maxOccurs="2"/>
      <xs:element name="LastName" type="xs:string"/>
    </xs:sequence>
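An XML Schema content model behaves like a regular expression over the names of an element's children, so the sequence above can be sketched as a Python regex. This is a minimal illustration, not a validator: it assumes the children are given as a list of element names, ignores types and attributes, and the helpers `occurs`, `token`, and `conforms` are hypothetical names.

```python
import re
from typing import Optional

# Hypothetical helpers: a sketch of XSD occurrence bounds as regex quantifiers.
def occurs(pattern: str, min_occurs: int = 1, max_occurs: Optional[int] = 1) -> str:
    """Apply XSD-style minOccurs/maxOccurs to a sub-pattern (None means unbounded)."""
    upper = "" if max_occurs is None else str(max_occurs)
    return f"(?:{pattern}){{{min_occurs},{upper}}}"

def token(name: str) -> str:
    """Match one child element name; the trailing space keeps names from running together."""
    return re.escape(name) + " "

# <xs:sequence minOccurs="1" maxOccurs="2">: FirstName, MiddleName (0..2), LastName
SEQUENCE = re.compile(
    occurs(token("FirstName") + occurs(token("MiddleName"), 0, 2) + token("LastName"), 1, 2)
)

def conforms(children: list[str]) -> bool:
    """True if the ordered child element names satisfy the sequence content model."""
    return SEQUENCE.fullmatch("".join(name + " " for name in children)) is not None

print(conforms(["FirstName", "MiddleName", "LastName"]))             # True
print(conforms(["FirstName", "LastName", "FirstName", "LastName"]))  # True
print(conforms(["FirstName"] + ["MiddleName"] * 3 + ["LastName"]))   # False
```

The nested quantifiers mirror the two levels of constrained recurrence listed above: one on a child (MiddleName) and one on the sequence itself.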

Missing Modeling Constructs (1) (figure)

Missing Modeling Constructs (2)
- Choice structure
  - Parent concept
  - Choose one child concept from several alternatives
  - Constrained recurrence of chosen child
  - Constrained recurrence of choice itself

    <xs:choice maxOccurs="2">
      <xs:element name="PhoneNumber" type="xs:string" minOccurs="1" maxOccurs="2"/>
      <xs:element name="Email" type="xs:string"/>
      <xs:element name="Fax" type="xs:string"/>
    </xs:choice>
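The choice structure fits the same regular-expression reading: the alternatives become a regex alternation, and the recurrence constraints on the chosen child and on the choice itself become nested quantifiers. A hypothetical sketch under the same assumptions (children given as a list of names; types and attributes not checked; `CHOICE` and `conforms` are illustrative names):

```python
import re

# Hypothetical sketch: <xs:choice maxOccurs="2"> (minOccurs defaults to 1).
ALTERNATIVES = [
    "(?:PhoneNumber ){1,2}",  # PhoneNumber with minOccurs="1" maxOccurs="2"
    "Email ",                 # Email, default occurrence 1..1
    "Fax ",                   # Fax, default occurrence 1..1
]
CHOICE = re.compile("(?:" + "|".join(ALTERNATIVES) + "){1,2}")

def conforms(children: list[str]) -> bool:
    """True if the ordered child element names satisfy the choice content model."""
    return CHOICE.fullmatch("".join(name + " " for name in children)) is not None

print(conforms(["Email"]))                             # True
print(conforms(["PhoneNumber", "PhoneNumber", "Fax"]))  # True
print(conforms(["PhoneNumber"] * 4))                    # True
print(conforms(["Email", "Fax", "Email"]))              # False
```

Four PhoneNumber children are accepted because the choice may occur twice and each occurrence may pick PhoneNumber twice; a fifth is rejected.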

Missing Modeling Constructs (3)
- Mixed attribute
  - Allows character and element data to be intertwined

    <xs:complexType mixed="true">

- Any and anyAttribute structures
  - Insert structures from other namespaces
  - Constrained recurrence

    <xs:any namespace="##other" minOccurs="0"/>
    <xs:anyAttribute namespace="##any"/>
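In an element whose complex type is mixed, character data can appear between child elements. Python's standard `xml.etree.ElementTree` exposes that interleaving through each element's `text` (characters before the first child) and each child's `tail` (characters after it). The document and element names below are made-up examples, and `content_items` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance of an element declared with <xs:complexType mixed="true">.
DOC = "<Note>Call <Name>Ann</Name> at <Phone>555-0100</Phone> today.</Note>"

def content_items(elem: ET.Element) -> list[tuple[str, str]]:
    """Return the element's direct content in document order as ('text', chars)
    or ('element', tag) pairs, showing how character data and children interleave."""
    items = []
    if elem.text:
        items.append(("text", elem.text))
    for child in elem:
        items.append(("element", child.tag))
        if child.tail:
            items.append(("text", child.tail))
    return items

for kind, value in content_items(ET.fromstring(DOC)):
    print(kind, repr(value))
```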

Missing Modeling Constructs (4)
- Nesting of hierarchical structures
  - Key organizational characteristic of XML
  - Arbitrarily complex nesting possible

C-XML Example

C-XML TO XML SCHEMA

C-XML and XML Schema (excerpt)

    <?xml version="1.0" encoding="UTF-8"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
      <xs:element name="Root">
        <xs:complexType>
          <xs:all>
            <xs:element ref="Students"/>
            <xs:element ref="Courses"/>
            <xs:element ref="GradStudents"/>
            <xs:element ref="UndergradStudents"/>
          </xs:all>
        </xs:complexType>
        <xs:keyref name="UndergradStudentOID-Keyref" refer="StudentOID-Key">
          <xs:selector xpath="./UndergradStudents/UndergradStudent"/>
          <xs:field xpath="@UndergradStudentOID"/>
        </xs:keyref>
        <xs:keyref name="GradStudentOID-Keyref" refer="StudentOID-Key">
          <xs:selector xpath="./GradStudents/GradStudent"/>
          <xs:field xpath="@GradStudentOID"/>
        </xs:keyref>
      </xs:element>
      <xs:element name="Students">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="Student" maxOccurs="unbounded">
              <xs:complexType>
                <xs:sequence>
                  <xs:choice minOccurs="1" maxOccurs="1">
                    <xs:element name="StudentName" type="xs:string"/>
                    <xs:sequence>
                      <xs:element name="FirstName" type="xs:string"/>
                      <xs:element name="MiddleNames">
                        <xs:complexType>
                          <xs:sequence>
                            <xs:element name="MiddleName" minOccurs="0" maxOccurs="2">
                              <xs:complexType>
                                <xs:attribute name="MiddleName" type="xs:string" use="required"/>
                              </xs:complexType>
                            </xs:element>
                          </xs:sequence>
                        </xs:complexType>
                        <xs:key name="MiddleName-Key">
                          <xs:selector xpath="./MiddleName"/>
                          <xs:field xpath="@MiddleName"/>
                        </xs:key>
                      </xs:element>
                      <xs:element name="LastName" type="xs:string"/>
                    </xs:sequence>
                  </xs:choice>
                  <xs:element name="Semester-Course-Grades">
                    <xs:complexType>
                      <xs:sequence>
                      ...

Algorithm Overview
- Generate a forest of scheme trees
- Translate an individual object set
- Translate scheme-tree collections of object sets
- Create a root node
- Add uniqueness constraints
- Translate generalization/specialization hierarchies

Generate Scheme Trees
(Student, StudentID, StudentName, FirstName, LastName, (MiddleName)*, (Course, Semester, Grade)*)*

Generate Scheme Trees
(Course, Department)*

Generate Scheme Trees
(UndergradStudent)*
(GradStudent, Advisor)*

Generate Scheme Trees
(Student, StudentID, StudentName, FirstName, LastName, (MiddleName)*, (Course, Semester, Grade)*)*
(Course, Department)*
(GradStudent, Advisor)*
(UndergradStudent)*

Generate Scheme Trees
Scheme-tree forest (figure):
    Student, StudentID, StudentName, FirstName, LastName
        MiddleName
        Course, Semester, Grade
    Course, Department
    GradStudent, Advisor
    UndergradStudent
(Student, StudentID, StudentName, FirstName, LastName, (MiddleName)*, (Course, Semester, Grade)*)*
(Course, Department)*
(GradStudent, Advisor)*
(UndergradStudent)*
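The nested-parenthesis notation can be read as a tree: each node lists the object sets it groups, each child node is a nested group, and the trailing `*` marks a group that may repeat. The sketch below uses a hypothetical `SchemeTreeNode` class (not taken from the paper) to hold the forest above and render it back into the slide notation.

```python
from dataclasses import dataclass, field

@dataclass
class SchemeTreeNode:
    """Hypothetical scheme-tree node: a group of object sets plus nested child groups."""
    object_sets: list[str]
    children: list["SchemeTreeNode"] = field(default_factory=list)

    def render(self) -> str:
        """Render as '(A, B, (C)*)*'; every node is a repeating group."""
        parts = self.object_sets + [child.render() for child in self.children]
        return "(" + ", ".join(parts) + ")*"

FOREST = [
    SchemeTreeNode(
        ["Student", "StudentID", "StudentName", "FirstName", "LastName"],
        [SchemeTreeNode(["MiddleName"]), SchemeTreeNode(["Course", "Semester", "Grade"])],
    ),
    SchemeTreeNode(["Course", "Department"]),
    SchemeTreeNode(["GradStudent", "Advisor"]),
    SchemeTreeNode(["UndergradStudent"]),
]

for tree in FOREST:
    print(tree.render())
```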

Individual Object Sets

    <xs:attribute name="Department" type="xs:string"/>
    <xs:attribute name="Course" type="xs:string"/>
    <xs:attribute ref="Course"/>
    <xs:element name="FirstName" type="xs:string"/>
    <xs:element name="Student">
      <xs:complexType>
        ...
        <xs:attribute name="StudentOID" type="xs:string" use="required"/>
      </xs:complexType>
    </xs:element>
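The slide shows three outcomes for an individual object set: lexical object sets such as Department and Course become global `xs:attribute` declarations (later used via `ref`), FirstName becomes an `xs:element`, and the nonlexical Student becomes an element carrying a required `StudentOID` attribute. The sketch below reproduces those shapes with the standard library. The `as_element` flag that picks element over attribute is an assumption for illustration, since the slide does not state the rule, and every lexical type is taken to be `xs:string` as on the slide.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)

def translate_object_set(name: str, lexical: bool, as_element: bool = False) -> ET.Element:
    """Hypothetical translation of one object set into a schema declaration.

    Lexical object sets become xs:attribute, or xs:element when as_element is set
    (an assumed rule). Nonlexical object sets become an xs:element whose complex
    type carries a required '<name>OID' attribute."""
    if lexical:
        kind = "element" if as_element else "attribute"
        return ET.Element(f"{{{XS}}}{kind}", {"name": name, "type": "xs:string"})
    element = ET.Element(f"{{{XS}}}element", {"name": name})
    complex_type = ET.SubElement(element, f"{{{XS}}}complexType")
    ET.SubElement(complex_type, f"{{{XS}}}attribute",
                  {"name": f"{name}OID", "type": "xs:string", "use": "required"})
    return element

for decl in (translate_object_set("Department", lexical=True),
             translate_object_set("FirstName", lexical=True, as_element=True),
             translate_object_set("Student", lexical=False)):
    print(ET.tostring(decl, encoding="unicode"))
```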

Scheme-Tree Translation
(Figure labels: container elements Students, MiddleNames, Course-Semester-Grades, Courses, GradStudents, UndergradStudents; member elements Student, MiddleName, Course-Semester-Grade, Course, GradStudent, UndergradStudent.)

Scheme-Tree Translation

    <xs:element name="Students">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="Student" maxOccurs="unbounded">
            <xs:complexType>...</xs:complexType>
          </xs:element>
        </xs:sequence>
      </xs:complexType>
    </xs:element>
    <xs:element name="Semester-Course-Grades">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="Semester-Course-Grade" minOccurs="0" maxOccurs="unbounded">
            <xs:complexType>...</xs:complexType>
          </xs:element>
        </xs:sequence>
      </xs:complexType>
      ...
    </xs:element>
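Both fragments above follow one pattern: a plural container element (Students, Semester-Course-Grades) whose sequence holds a repeating member element (Student, Semester-Course-Grade). A minimal standard-library sketch that emits this pattern; the function name `container_for` and its `min_occurs` parameter are illustrative assumptions, not part of the described algorithm.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)

def container_for(container: str, member: str, min_occurs: str = "1") -> ET.Element:
    """Hypothetical helper: a container element whose sequence holds one
    repeating member element, following the Students/Student pattern."""
    outer = ET.Element(f"{{{XS}}}element", {"name": container})
    complex_type = ET.SubElement(outer, f"{{{XS}}}complexType")
    sequence = ET.SubElement(complex_type, f"{{{XS}}}sequence")
    attrs = {"name": member}
    if min_occurs != "1":  # minOccurs="1" is the XSD default, so it is left out
        attrs["minOccurs"] = min_occurs
    attrs["maxOccurs"] = "unbounded"
    member_elem = ET.SubElement(sequence, f"{{{XS}}}element", attrs)
    ET.SubElement(member_elem, f"{{{XS}}}complexType")  # member content would go here
    return outer

print(ET.tostring(container_for("Semester-Course-Grades", "Semester-Course-Grade", "0"),
                  encoding="unicode"))
```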

Scheme-Tree Translation

    <xs:element name="Semester-Course-Grade" minOccurs="0" maxOccurs="unbounded">
      <xs:complexType>
        <xs:attribute name="Semester" use="required"/>
        <xs:attribute ref="Course" use="required"/>
        <!-- C-XML: forall x (Course(x) => exists [0:*] <x1, x2, x3> (Course(x) Student(x1) Semester(x2) Grade(x3))) -->
        <xs:attribute name="Grade" type="xs:string" use="required"/>
      </xs:complexType>
    </xs:element>

Root Element

    <xs:schema>
      <xs:element name="Root">
        <xs:complexType>
          <xs:all>
            <xs:element ref="Students"/>
            <xs:element ref="Courses"/>
            <xs:element ref="GradStudents"/>
            <xs:element ref="UndergradStudents"/>
          </xs:all>
        </xs:complexType>
        ...
      </xs:element>
      ...
    </xs:schema>

Uniqueness Constraints

    <xs:element name="Students">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="Student" maxOccurs="unbounded">
            <xs:complexType>...</xs:complexType>
          </xs:element>
        </xs:sequence>
      </xs:complexType>
      <xs:key name="StudentOID-Key">
        <xs:selector xpath="./Student"/>
        <xs:field xpath="@StudentOID"/>
      </xs:key>
      <xs:key name="StudentID-Key">
        <xs:selector xpath="./Student"/>
        <xs:field xpath="@StudentID"/>
      </xs:key>
    </xs:element>
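Each `xs:key` above requires that, within a Students element, every node picked by the selector (`./Student`) has the field (`@StudentID` or `@StudentOID`) and that no two of them share a value. The sketch below checks that rule on an instance document with the standard library. It is a simplification, assuming a single attribute field compared as a plain string; the sample data and the helper name `key_holds` are made up, and a schema-aware validator would enforce the key itself.

```python
import xml.etree.ElementTree as ET

def key_holds(scope: ET.Element, selector: str, field: str) -> bool:
    """Check xs:key semantics for one scope element and one attribute field
    (field is the attribute name, e.g. 'StudentID' for @StudentID):
    every selected node must have the attribute, and all values must be distinct."""
    values = []
    for node in scope.findall(selector):
        value = node.get(field)
        if value is None:  # a key field must be present (unlike xs:unique)
            return False
        values.append(value)
    return len(values) == len(set(values))

# Made-up sample instance.
STUDENTS = ET.fromstring(
    '<Students>'
    '<Student StudentOID="s1" StudentID="101"/>'
    '<Student StudentOID="s2" StudentID="102"/>'
    '</Students>'
)
print(key_holds(STUDENTS, "./Student", "StudentOID"))  # True
print(key_holds(STUDENTS, "./Student", "StudentID"))   # True
```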

Generalization/Specialization

    <xs:keyref name="UndergradStudentOID-Keyref" refer="StudentOID-Key">
      <xs:selector xpath="./UndergradStudents/UndergradStudent"/>
      <xs:field xpath="@UndergradStudentOID"/>
    </xs:keyref>
    <xs:keyref name="GradStudentOID-Keyref" refer="StudentOID-Key">
      <xs:selector xpath="./GradStudents/GradStudent"/>
      <xs:field xpath="@GradStudentOID"/>
    </xs:keyref>
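The two `xs:keyref` declarations carry the generalization/specialization: every GradStudent or UndergradStudent OID must be the OID of some Student, so each specialization is a subset of Student. A minimal standard-library check of that referential constraint on a made-up instance, assuming the element and attribute names used on the slides; `keyref_holds` is a hypothetical helper, and values are compared as plain strings.

```python
import xml.etree.ElementTree as ET

def keyref_holds(root: ET.Element, key_selector: str, key_field: str,
                 ref_selector: str, ref_field: str) -> bool:
    """Every referencing value must equal some key value. Nodes without the
    referencing attribute are skipped, as xs:keyref only constrains nodes that have it."""
    keys = {node.get(key_field) for node in root.findall(key_selector)}
    return all(node.get(ref_field) in keys
               for node in root.findall(ref_selector)
               if node.get(ref_field) is not None)

# Made-up sample instance.
ROOT = ET.fromstring(
    "<Root>"
    "<Students><Student StudentOID='s1'/><Student StudentOID='s2'/></Students>"
    "<Courses/>"
    "<GradStudents><GradStudent GradStudentOID='s2'/></GradStudents>"
    "<UndergradStudents><UndergradStudent UndergradStudentOID='s1'/></UndergradStudents>"
    "</Root>"
)
print(keyref_holds(ROOT, "./Students/Student", "StudentOID",
                   "./GradStudents/GradStudent", "GradStudentOID"))  # True
```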

XML SCHEMA TO C-XML

XML Schema → C-XML

Algorithm Overview
- Generate object sets for each element and attribute
- Specify built-in and simple types in data frames
- Obtain relationship sets from parent-child connections
- Obtain participation constraints from minOccurs, maxOccurs, and use constraints
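These steps can be sketched as one walk over the schema tree: each named `xs:element` or `xs:attribute` yields an object set (its built-in type recorded as a stand-in for a data frame), each parent-child connection yields a relationship set, and `minOccurs`/`maxOccurs` or `use` yield a participation constraint, read here as how many times each parent object participates. This is a simplified, hypothetical sketch: `to_conceptual` and `participation` are illustrative names, `ref` declarations are not resolved, and occurrence constraints on `xs:sequence` and `xs:choice` groups (which C-XML models with its own constructs) are ignored.

```python
import xml.etree.ElementTree as ET
from typing import Optional

XS = "{http://www.w3.org/2001/XMLSchema}"

def participation(decl: ET.Element) -> str:
    """Map minOccurs/maxOccurs (elements) or use (attributes) to 'min:max'."""
    if decl.tag == XS + "attribute":
        return "1:1" if decl.get("use") == "required" else "0:1"
    low = decl.get("minOccurs", "1")   # XSD default is 1
    high = decl.get("maxOccurs", "1")  # XSD default is 1
    return f"{low}:{'*' if high == 'unbounded' else high}"

def to_conceptual(schema: ET.Element):
    """Hypothetical reverse-engineering pass over an XML Schema tree."""
    object_sets = []        # (name, built-in/simple type or None)
    relationship_sets = []  # (parent, child, participation of parent)

    def walk(node: ET.Element, parent: Optional[str]) -> None:
        for child in node:
            name = child.get("name")
            if child.tag in (XS + "element", XS + "attribute") and name:
                object_sets.append((name, child.get("type")))
                if parent is not None:
                    relationship_sets.append((parent, name, participation(child)))
                walk(child, name)
            else:
                # complexType, sequence, choice, key, ref-only declarations, ...:
                # keep looking for named declarations under the same parent.
                walk(child, parent)

    walk(schema, None)
    return object_sets, relationship_sets

SCHEMA = ET.fromstring(
    '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
    '<xs:element name="Student"><xs:complexType><xs:sequence>'
    '<xs:element name="FirstName" type="xs:string"/>'
    '<xs:element name="MiddleName" type="xs:string" minOccurs="0" maxOccurs="2"/>'
    '<xs:element name="LastName" type="xs:string"/>'
    '</xs:sequence>'
    '<xs:attribute name="StudentID" type="xs:string" use="required"/>'
    '</xs:complexType></xs:element>'
    '</xs:schema>'
)
object_sets, relationship_sets = to_conceptual(SCHEMA)
print(object_sets)
print(relationship_sets)
```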

Attribute Transformation

Element Transformation

Choice Transformation

Sequence Transformation

Key Constraints Transformation

Substitution Group & Extension Transformation

Observation on Transformations
- These transformations to and from C-XML are not inverses of one another
- However: C-XML → XML Schema → C-XML

Demo

PROPERTY GUARANTEES

Transformation Properties: C-XML to XML Schema
- Theorem 1: … preserves information. Proof: injective.
- Theorem 2: Allowing for pragma constraints, … preserves constraints. Proof: by construction.
- Theorem 3: … yields an XML Schema instance whose complying XML documents are redundancy-free. Proof: [TKDE, Aug 06].

Transformation Properties: XML Schema to C-XML
- Theorem 4: … preserves information. Proof: injective.
- Theorem 5: … preserves constraints. Proof: by construction.

Conclusions
- C-XML models XML conceptually
- Transformations
  - C-XML to XML
  - Reverse-engineer XML to C-XML
- Properties
  - Information preserving
  - Constraint preserving
  - Redundancy-free guarantee
www.deg.byu.edu