Structural and Semantic Heterogeneity in Database Schema Integration
Sixth Conference of the Department of Computing, Wednesday 4 May 2005. David George
Presentation Content
• Why is integration necessary?
• Evolution of integration approaches
• Barriers to integration: structural and semantic heterogeneity
• New opportunity: ontology and the Semantic Web
Why Database Integration?
Drivers for Data Integration
• Global organisations with distributed data.
• Organisations having legacy and new databases.
• Organisational change, e.g. business re-engineering and acquisitions.
• Autonomous departments with disconnected systems requiring interoperability, e.g. Financial Services.
• Business Intelligence requiring:
  - decision-support systems
  - customer analysis and marketing strategies
  - data mining
Schema Integration
[Figure: local DB schemas (Schema 1, Schema 2, … Schema n) are integrated into a global schema. Query input is posed against the global schema, passes through the Global:Local schema mapping, and query output is returned.]
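The Global:Local schema mapping can be illustrated with a minimal Python sketch. It assumes a purely hypothetical global schema (`customer_name`, `customer_city`) and two hypothetical local schemas; none of these table or column names come from the presentation.

```python
# Hypothetical mapping from global attribute names to each local schema.
# All table and column names here are illustrative assumptions.
GLOBAL_TO_LOCAL = {
    "schema1": {
        "table": "clients",
        "columns": {"customer_name": "name", "customer_city": "town"},
    },
    "schema2": {
        "table": "customers",
        "columns": {"customer_name": "cust_nm", "customer_city": "city"},
    },
}


def translate_query(global_columns, mapping=GLOBAL_TO_LOCAL):
    """Rewrite a projection over the global schema into one SELECT per local schema."""
    local_queries = {}
    for source, m in mapping.items():
        # Rename each local column back to its global name so results can be merged.
        select_list = ", ".join(f"{m['columns'][c]} AS {c}" for c in global_columns)
        local_queries[source] = f"SELECT {select_list} FROM {m['table']}"
    return local_queries


queries = translate_query(["customer_name", "customer_city"])
```

Each local query returns rows already labelled with global attribute names, so the results can be combined into a single query output.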
Evolution in Integration Approaches
Knowledge Evolution in Integrations
[Figure: evolution of integration approaches on a timeline marked 1985 and 1995. Labels recoverable from the diagram, listed from earliest to latest:]
• Levels: System, Data, Information, Knowledge
• Scope: local task schemas; domain-specific; global domain agreements
• Data types: structured DBs, files; structured and semi-structured text repositories; digital media, visual/spatial/temporal data [kiosk/geographic/flights/forecasting]
• Focus: systems and communications; syntax of data type, format and schema constructs; semantics
• Approaches: schema integration, common data models, federated DBS; virtual integration, single ontologies, federated IS (inc. mediators); multiple ontologies, inter-ontological information brokering
Federated DBMS approach
[Figure: layered schema architecture, labelled "Common Data Model", listed top to bottom:]
• External Schemas 1.1, 1.2, 2.1
• Federated Schemas 1, 2
• Export Schemas 1.1, 2.1, 2.2
• Component Schemas 1, 2
• Local Schemas 1, 2
• Component DBS 1, Component DBS 2, etc.
Application: integration of business databases
FDBMS schema architecture
[Figure: layered schema architecture, labelled "Common Data Model", listed top to bottom:]
• External Schemas 1.1, 1.2, 2.1, 2.2
• Federated Schemas 1, 2
• Export Schemas 1.1, 2.1, 2.2, 3.1
• Component Schemas 2, 3
• Local Schemas 1, 2, 3
• Component DBS 1, 2, 3
Mediator/Wrapper (Virtual integration)
[Figure: a Mediator performs Query Translation for sources reached over the Network/Internet.]
Application: integrated access to heterogeneous data
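A minimal Python sketch of the mediator/wrapper idea follows. The two sources, their field names, and the price units are hypothetical assumptions chosen for illustration; the point is only that each wrapper translates its source into a common record shape and the mediator queries through the wrappers rather than copying data.

```python
class TupleSourceWrapper:
    """Wraps a hypothetical source holding (title, price_in_pence) tuples."""

    def __init__(self, rows):
        self.rows = rows

    def fetch(self):
        # Translate to the common model: dicts with 'title' and 'price' in pounds.
        return [{"title": t, "price": p / 100} for t, p in self.rows]


class DictSourceWrapper:
    """Wraps a hypothetical source holding dicts with its own field names."""

    def __init__(self, records):
        self.records = records

    def fetch(self):
        return [{"title": r["name"], "price": r["cost"]} for r in self.records]


class Mediator:
    """Answers a query over the common model by delegating to every wrapper."""

    def __init__(self, wrappers):
        self.wrappers = wrappers

    def query(self, max_price):
        results = []
        for wrapper in self.wrappers:
            results.extend(r for r in wrapper.fetch() if r["price"] <= max_price)
        return sorted(results, key=lambda r: r["price"])


mediator = Mediator([
    TupleSourceWrapper([("A", 1250), ("B", 3000)]),
    DictSourceWrapper([{"name": "C", "cost": 9.5}]),
])
cheap = mediator.query(max_price=20.0)
```

The data stays in the sources and is translated at query time, which is what distinguishes this virtual approach from building a materialised global database.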
Information Brokering
Search query: “Find detached houses for sale under £300k with 2 bathrooms, 3 bedrooms, a local school rated in the upper quartile of govt. league tables, in a district with below-average crime rate and a socio-economically diverse population?”
[Figure: Multiple Worlds Information Mediation across Property Sales, Crime Statistics, School Rankings and Demographics.]
Barriers to Integration - Structural & Semantic Heterogeneity
Recipe for Heterogeneity and Conflict
Conceptualisations of the real world are influenced by the designer's view of the concept and context to be modelled.
Schema Type Conflicts
[Figure: E-R schemas modelling the same publishing domain differently. Labels recoverable from the diagram: Publisher, Name, Address, Title, Pub-Book, Book-Topic, Topics, Code, Publication, Pub-Keyword, Research Area.]
Taxonomy of Schema Conflicts
Entity Definition Conflicts
• Naming conflicts (synonyms and homonyms)
• DB identifier conflicts, e.g. ID# vs. Name
• Schema isomorphism at attribute level (e.g. mapping of Telephone vs. HomeTel + WorkTel)
• Missing attributes
Domain Definition Conflicts
• Naming conflicts
• Data representation (Integer vs. String)
• Data dimensions (volume, weight, price, number)
• Dimension measures (based on above)
• Data scaling (£K, £M)
• Data precision (1-100 vs. A-E)
Data Value Conflicts
• Attribute integrity constraints (cardinality, uniqueness, nulls)
• Known inconsistency (has errors, presence/absence)
• Temporal inconsistency (last update)
• Acceptable inconsistency (within a range)
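Three of the conflicts listed above (naming synonyms, data representation, data scaling) can be sketched in Python as a normalisation step. The field names `cust_no`, `client_id`, `turnover_k` and `turnover_m` are hypothetical, as is the choice of pounds as the common unit.

```python
# Naming conflict: hypothetical synonyms mapped to one agreed attribute name.
SYNONYMS = {
    "cust_no": "customer_id",
    "client_id": "customer_id",
    "turnover_k": "revenue",
    "turnover_m": "revenue",
}

# Scaling conflict: one source reports in £K, the other in £M.
SCALE_TO_POUNDS = {"turnover_k": 1_000, "turnover_m": 1_000_000}


def normalise(record):
    """Map one local record onto the agreed names, types and units."""
    out = {}
    for field, value in record.items():
        target = SYNONYMS.get(field, field)
        if field in SCALE_TO_POUNDS:
            value = float(value) * SCALE_TO_POUNDS[field]
        elif target == "customer_id":
            # Representation conflict: identifier stored as String vs. Integer.
            value = int(value)
        out[target] = value
    return out


a = normalise({"cust_no": "42", "turnover_k": 1500})
b = normalise({"client_id": 42, "turnover_m": 1.5})
```

After normalisation both records describe the same customer with the same revenue, so they can be compared or merged directly.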
Incoherence in Cardinality
[Figure: the Inv:Order relationship between Invoice and Order drawn with differing cardinalities. Cardinality labels in the diagram: 1, 1, n and 1, m, m.]
Abstraction and Schematic Conflicts
Abstraction Level Conflicts
• Generalisation/Specialisation
• Aggregation/Decomposition
Schematic Discrepancies
• Data Value to Attribute
• Attribute to Entity
• Data Value to Entity
Generalisation/Specialisation Conflicts
Schema 1: Student (ID#, Name, Type, Course)
Schema 2 (S_Type): U-graduate (ID#, Name, Course); Graduate (ID#, Name, Course)
i.e. U-graduate in schema 2 is represented at a more general level in schema 1.
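The mapping between the two schemas can be sketched in Python. This assumes, purely for illustration, that the Type attribute in schema 1 holds exactly the values "U-graduate" and "Graduate"; the slide does not state the actual Type values.

```python
def specialise(students):
    """Schema 1 -> Schema 2: split Student rows by their Type value."""
    schema2 = {"U-graduate": [], "Graduate": []}
    for s in students:
        row = {"id": s["id"], "name": s["name"], "course": s["course"]}
        # Assumption: Type holds exactly one of the two relation names.
        schema2[s["type"]].append(row)
    return schema2


def generalise(schema2):
    """Schema 2 -> Schema 1: the relation name becomes the Type value."""
    students = []
    for relation, rows in schema2.items():
        for r in rows:
            students.append({**r, "type": relation})
    return students


students = [
    {"id": 1, "name": "Ann", "type": "U-graduate", "course": "Computing"},
    {"id": 2, "name": "Raj", "type": "Graduate", "course": "Databases"},
]
split = specialise(students)
merged = generalise(split)
```

Here the information carried by the Type attribute in one schema is carried by the relation name in the other, so the conversion is lossless in both directions.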
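Specialisation Classification Conflicts
[Figure: three kinds of inconsistency in how a class is specialised. Labels recoverable from the diagram:]
• Characterisation inconsistency
• Degrees inconsistency
• Criteria inconsistency
• Classes and criteria shown: Employee, Person, Customer, Adult, Senior, Service Person, Child, Teen, Parent, G-Parent; Gender, Sex, Role; age bands <30, 30-60, >60 and <25, 25-55, >55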
Aggregation Conflicts
An aggregation used in schema 1 is represented by a set of entities in schema 2.
Schema 1: Convoy (ID#, Av_Weight, Location)
Schema 2: Ship (ID#, Weight, Location, Captain)
NB: the mapping exists in only one direction.
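The one-directional nature of this mapping can be sketched in Python. The sketch assumes that the ships belonging to a convoy are already known and share one location; the slide does not say how membership is recorded, so that grouping is a hypothetical input.

```python
from statistics import mean


def ships_to_convoy(convoy_id, ships):
    """Derive a Schema 1 Convoy record from a set of Schema 2 Ship records."""
    locations = {s["location"] for s in ships}
    if len(locations) != 1:
        # Assumption for this sketch: a convoy has a single location.
        raise ValueError("ships in one convoy are assumed to share a location")
    return {
        "id": convoy_id,
        "av_weight": mean(s["weight"] for s in ships),
        "location": locations.pop(),
    }


ships = [
    {"id": "S1", "weight": 1000, "location": "Dock A", "captain": "Lee"},
    {"id": "S2", "weight": 2000, "location": "Dock A", "captain": "Kim"},
    {"id": "S3", "weight": 3000, "location": "Dock A", "captain": "Roy"},
]
convoy = ships_to_convoy("C1", ships)
```

No inverse function can be written: the individual weights and the captains are not recoverable from Av_Weight alone, which is why the mapping runs from Ship to Convoy only.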
Aggregation Conflicts (contd)
• Component class of collection: Employee(department) vs. Employee(division(department))
• Aggregation specialisation: CarType(carMake, carDesign) vs. FamilyType(carMake, saloonSize)
• Aggregation composition: Person(address, tel) vs. Person(street, city, county, tel)
Schematic Discrepancies
Data:Attribute:Entity conflicts
• DB1: Stock (Date, StockCode, ClosePrice). The stock item is a data Value.
• DB2: Stock (Date, StockItem1, StockItem2, … StockItemn), each column holding the ClosePrice. The stock item is an Attribute.
• DB3: StockItem1 (Date, ClosePrice) … StockItemn (Date, ClosePrice). The stock item is an Entity.
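The three representations can be related by simple restructuring, sketched here in Python. The stock codes "AAA" and "BBB", the dates and the prices are made-up sample data, not taken from the presentation.

```python
def value_to_attribute(db1_rows):
    """DB1 -> DB2: stock codes (data values) become attribute names."""
    by_date = {}
    for date, code, close in db1_rows:
        by_date.setdefault(date, {"Date": date})[code] = close
    return [by_date[d] for d in sorted(by_date)]


def value_to_entity(db1_rows):
    """DB1 -> DB3: each stock code becomes its own relation of (Date, ClosePrice)."""
    tables = {}
    for date, code, close in db1_rows:
        tables.setdefault(code, []).append((date, close))
    return tables


# Hypothetical DB1 rows: (Date, StockCode, ClosePrice).
db1 = [
    ("2005-05-03", "AAA", 75.0),
    ("2005-05-03", "BBB", 20.5),
    ("2005-05-04", "AAA", 76.0),
]
db2 = value_to_attribute(db1)
db3 = value_to_entity(db1)
```

The same facts are present in all three forms, but a query such as "list all stock items" reads data in DB1, column names in DB2 and relation names in DB3, which is what makes these discrepancies hard to handle with ordinary query languages.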
So where next?
New Solutions - Ontologies and the Semantic Web
Ontologies in Computing
• Formal vocabulary of a “universe of discourse”.
• Ontologies define:
  - concepts and their attributes
  - relationships between concepts
  - constraints on those relationships
“An Ontology is a formal, explicit specification of a shared conceptualization” (Gruber, 1993 & Borst, 1997)
Bibliographic Data Ontology (extract)
[Figure: class hierarchy rooted at Biblio-Thing, with branches Agent and Document. Classes shown: Person, Author, Organization, Publisher, University, Book, Edited-Book, Proceedings, Thesis, Doctoral-Thesis, Master-Thesis, Periodical-Publication, Journal, Newspaper, Magazine, Miscellaneous-Publication, Cartographic-Map, Technical-Manual, Computer-Program.]
http://www.ksl.stanford.edu/knowledge-sharing/ontologies/html/
Types of “ontologies”
DE BRUIJN, J. (2003) Using Ontologies - Enabling Knowledge Sharing and Reuse on the Semantic Web [online]. Innsbruck, Austria, DERI – Digital Enterprise Research Institute. Available from: http://www.deri.ie/publications/techpapers/documents/DERI-TR-2003-10-29.pdf. [Accessed 15 February 2005].
• Value restrictions: values of properties are restricted (e.g. by a datatype).
• General logic constraints: values may be constrained by using values from other properties.
• First-order logic constraints: very expressive constraints between relationships such as: disjoint classes, inverse relationships, part-whole relationships.
Semantic Web
DE BRUIJN, J. (2003) Using Ontologies - Enabling Knowledge Sharing and Reuse on the Semantic Web [online]. Innsbruck, Austria, DERI – Digital Enterprise Research Institute. Available from: http://www.deri.ie/publications/techpapers/documents/DERI-TR-2003-10-29.pdf. [Accessed 15 February 2005].
Semantic Web Tower
• OWL (OWL Ontology Language): Clients in S1 sameAs Customers in S2
• RDFS: person X is a LivingPerson
• RDF: person X is named “Bill”.
“The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation” (Tim Berners-Lee et al., 2001)
RDF Example
An RDF statement is an Object, Attribute, Value triple, i.e. subject, predicate, object, also written predicate(subject, object).
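The triple model can be sketched in plain Python without any RDF library. The `ex:` identifiers are hypothetical, and the `ex:Person` superclass is an added assumption used only to show one RDFS-style inference; the first two statements mirror the "person X" examples from the Semantic Web Tower slide.

```python
# Each statement is a (subject, predicate, object) tuple.
triples = {
    ("ex:personX", "ex:name", "Bill"),                  # RDF: person X is named "Bill"
    ("ex:personX", "rdf:type", "ex:LivingPerson"),      # RDFS: person X is a LivingPerson
    ("ex:LivingPerson", "rdfs:subClassOf", "ex:Person"),  # assumed superclass
}


def match(triples, s=None, p=None, o=None):
    """Return triples matching the given pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def types_of(triples, subject):
    """Asserted types plus those implied by following rdfs:subClassOf upwards."""
    found = {o for _, _, o in match(triples, s=subject, p="rdf:type")}
    frontier = list(found)
    while frontier:
        cls = frontier.pop()
        for _, _, sup in match(triples, s=cls, p="rdfs:subClassOf"):
            if sup not in found:
                found.add(sup)
                frontier.append(sup)
    return found


names = match(triples, p="ex:name")
person_x_types = types_of(triples, "ex:personX")
```

Because every source is reduced to the same triple shape, statements from different schemas can be pooled and queried with one pattern-matching function.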
End of Presentation
Semantic Data Model
Evolution in Interoperability
[Figure: interoperability evolving across System, Data, Information and Knowledge levels, against architectures on a timeline marked 1985 and 1995. Labels recoverable from the diagram:]
• Scope: Local Schema; Global Domain
• Data: structured DBs, files; E-R sys, O-O sys, multi-modal sys; structured, semi-structured (HTML etc.) text repositories; digital media, visual/spatio-temporal modelling, scientific/engineering
• Key focus: systems & comms.; syntax (data types/format) and structure (schema constructs); semantics and more domain-specific
• Understanding: use of metadata and schematic heterogeneities; comprehensive metadata and ontology approaches
• Approaches: common data models, schema translation & integration; schematic & metadata relationships, wrappers, single ontologies; multiple ontologies, inter-ontological, metadata standards
• Architectures: MDBMS / federated DBS; fed. inf. systems / mediators; mediator / information brokering