Overview of Database Federation and IBM Garlic Project
Presented by Xiaofen He

References
- L. M. Haas, E. T. Lin, M. A. Roth, "Data Integration through Database Federation"
- IBM Almaden Research Center, "Towards Heterogeneous Multimedia Information Systems: The Garlic Approach"

Outline
- Approaches to data integration
- Database federation in IBM DB2
- IBM Garlic Project

Various Approaches to Data Integration (1)
- Application-specific solutions
  - Always work
  - Expensive, fragile, and hard to extend
- Application-integration frameworks
  - Protect applications from changes in the data sources
  - Do not address data integration issues
- Workflow frameworks
  - Limited support for comparing and manipulating data

Various Approaches to Data Integration (2)
- Digital libraries
  - Act as a meta search engine
  - Do not combine data from different sources
- Data warehousing
  - Powerful, high-level query language
  - May not be possible or cost-effective; loss of source functionality
- Database federation
  - A virtual data warehouse
  - Performance tradeoff (addressed by query rewrite and cost-based optimization)

Database Federation
- Basics of database federation
- DB2 styles of database federation
- Determining the style of database federation to use

Basics of Database Federation
- What is "database federation" (DF)?
  - Also known as "mediation"
  - An architecture in which middleware, consisting of a relational database management system, provides uniform access to a number of heterogeneous data sources

Common Mediation Architecture
- Data source
- Wrapper
- Mediator
Figure 1. Common Mediator Architecture

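The three roles on this slide can be sketched in a few lines of Python. This is only an illustration of the general mediator pattern, not code from DB2 or Garlic: the class names (`CsvLikeWrapper`, `DictWrapper`, `Mediator`), the `scan` method, and the sample records are all assumptions made up for the example.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Protocol

Row = Dict[str, object]


class Wrapper(Protocol):
    """Hypothetical interface: each wrapper exposes its source as rows."""

    def scan(self) -> Iterable[Row]: ...


class CsvLikeWrapper:
    """Wraps a source that stores records as comma-separated strings."""

    def __init__(self, lines: List[str], columns: List[str]) -> None:
        self.lines = lines
        self.columns = columns

    def scan(self) -> Iterator[Row]:
        for line in self.lines:
            yield dict(zip(self.columns, line.split(",")))


class DictWrapper:
    """Wraps a source that holds dictionaries, renaming its local
    field names to the names the mediator uses."""

    def __init__(self, records: List[Row], rename: Dict[str, str]) -> None:
        self.records = records
        self.rename = rename

    def scan(self) -> Iterator[Row]:
        for rec in self.records:
            yield {self.rename.get(k, k): v for k, v in rec.items()}


class Mediator:
    """Presents all registered wrappers as one uniform collection of rows."""

    def __init__(self) -> None:
        self.wrappers: List[Wrapper] = []

    def register(self, wrapper: Wrapper) -> None:
        self.wrappers.append(wrapper)

    def query(self, predicate: Callable[[Row], bool]) -> List[Row]:
        # The caller never sees which source a row came from.
        return [row for w in self.wrappers for row in w.scan() if predicate(row)]


mediator = Mediator()
mediator.register(CsvLikeWrapper(["1,Ada", "2,Grace"], ["id", "name"]))
mediator.register(DictWrapper([{"key": "3", "full_name": "Edgar"}],
                              {"key": "id", "full_name": "name"}))
names = [row["name"] for row in mediator.query(lambda r: r["id"] != "2")]
```

The application issues one query against the mediator; the differences in storage format and field naming stay inside the wrappers.
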
Goals of IBM DF
- Transparency
- Support for heterogeneity
- A high degree of function
- Extensibility
- Openness
- Autonomy of individual data sources
- Query optimization

DB2 Architecture for DF
Figure 2. DB2 architecture for database federation

DB2 Styles of Federation
- Scalar UDFs: federating function
- Table UDFs: federating data
- Wrappers: federating function and data
Figure 3. Different styles of federation

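The difference between federating function and federating data can be illustrated with Python's built-in `sqlite3` module as a stand-in engine. This is an analogy under stated assumptions, not DB2 syntax: the `to_usd` function, the exchange rates, and the `orders` and `customers` tables are invented for the example, and the wrapper style is not shown.

```python
import sqlite3

# Stand-in for an external function source (hypothetical exchange rates).
RATES_TO_USD = {"EUR": 1.10, "USD": 1.00}


def to_usd(amount: float, currency: str) -> float:
    """Scalar function: one value in, one value out."""
    return round(amount * RATES_TO_USD[currency], 2)


def external_customers():
    """Stand-in for an external data source that yields rows."""
    yield (1, "Ada")
    yield (2, "Grace")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL, currency TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, 100.0, "EUR"), (2, 50.0, "USD")])

# Style 1 (federating function): register the external function so that
# SQL statements can call it like a built-in scalar function.
conn.create_function("to_usd", 2, to_usd)

# Style 2 (federating data): expose external rows as a table.
# Python's sqlite3 module has no table functions, so the rows are copied
# into a temporary table here; a real table UDF would produce them on demand.
conn.execute("CREATE TEMP TABLE customers (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", external_customers())

rows = conn.execute(
    "SELECT c.name, to_usd(o.amount, o.currency) "
    "FROM orders o JOIN customers c ON c.id = o.customer_id "
    "ORDER BY c.name"
).fetchall()
conn.close()
```

In both styles the query itself stays ordinary SQL; what changes is whether the external source contributes a computation or a set of rows.
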
Wrapper Architecture
- Multi-server integration
- Multi-dataset integration and multi-operation integration
- Optimization
- Transactional integration

Determining the Style of DF to Use
Figure 4. Determine the style of federation to use

IBM Garlic Project
- Introduction
- Overview
  - Architecture
  - Repositories and databases
  - The Garlic data model
- Queries in Garlic
- Interface and application
- Conclusion

Introduction
- Need
- Goal
- Object-oriented model

Garlic Overview
Figure 5. Garlic System Architecture (components shown: C++ Application, Query/Browser, Query Services & Runtime System, Metadata Repository, Wrappers, Complex Object Repository, Data Repositories)

Garlic Overview
- Repositories
  - Repository type
  - Repository instance
  - Repository manager
- Databases
  - Global schema
  - Wrapper schemas (local schemas)

Garlic Data Model (1)
- ODMG-93 object model
  - Objects and values
  - Inheritance
- Object identity
  - Weak identity: unique, not necessarily immutable
- Legacy references
  - Implementation-constrained reference

Garlic Data Model (2)
- Extensions
  - Degree of support for alternative implementations of interfaces
  - Type system flexibility (conformity)
  - Object-appropriate view definition facility
- Object-centered views
  - Enhance objects by adding or hiding some of their attributes/methods

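The idea of an object-centered view, a view that adds or hides attributes of an existing object, can be sketched in Python. This is a rough analogy and not Garlic's view definition language: `ObjectCenteredView`, `Employee`, and the `years_of_service` attribute are hypothetical names chosen for the illustration.

```python
class ObjectCenteredView:
    """Hypothetical view over one underlying object.

    Hidden attributes raise AttributeError; added attributes are computed
    from the underlying object on each access.
    """

    def __init__(self, base, hidden=(), added=None):
        self._base = base
        self._hidden = set(hidden)
        self._added = dict(added or {})

    def __getattr__(self, name):
        # Called only when normal lookup fails, so the view's own private
        # fields never reach this method once __init__ has run.
        if name.startswith("_"):
            raise AttributeError(name)
        if name in self._added:
            return self._added[name](self._base)
        if name in self._hidden:
            raise AttributeError(f"{name!r} is hidden by this view")
        return getattr(self._base, name)


class Employee:
    def __init__(self, name, salary, first_year):
        self.name = name
        self.salary = salary
        self.first_year = first_year


emp = Employee("Ada", 120000, 2015)
public = ObjectCenteredView(
    emp,
    hidden={"salary"},                                            # hide an attribute
    added={"years_of_service": lambda e: 2024 - e.first_year},    # add one
)
```

The view holds a reference to the underlying object rather than a copy, so a change to `emp.name` is visible through `public.name`.
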
Queries in Garlic
- Query language
  - Object-oriented extension of SQL
  - Integrates approximate-match query semantics with traditional exact-match query semantics
- Query processing
  - Decomposition
- Interesting question
  - How to characterize the query power of a repository, in terms of the language subset that its wrapper is capable of processing directly

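Decomposition and the "query power" question can be made concrete with a small sketch: the wrapper advertises which operators its repository can evaluate, the middleware pushes those predicates down, and it evaluates the rest itself. This is a simplified illustration, not Garlic's actual mechanism; the `Wrapper` class, its `supported_ops` field, and the sample rows are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

Row = Dict[str, object]
Predicate = Tuple[str, str, object]  # (column, operator, constant)

OPS: Dict[str, Callable[[object, object], bool]] = {
    "=": lambda a, b: a == b,
    "<": lambda a, b: a < b,
    ">": lambda a, b: a > b,
}


def matches(row: Row, pred: Predicate) -> bool:
    column, op, value = pred
    return OPS[op](row[column], value)


@dataclass
class Wrapper:
    """Hypothetical wrapper that advertises which operators it evaluates."""
    rows: List[Row]
    supported_ops: Set[str]

    def execute(self, pushed: List[Predicate]) -> List[Row]:
        # Stands in for the repository evaluating its share of the query.
        return [r for r in self.rows if all(matches(r, p) for p in pushed)]


def decompose(preds: List[Predicate], wrapper: Wrapper):
    """Split predicates into those the wrapper handles and the remainder."""
    pushed = [p for p in preds if p[1] in wrapper.supported_ops]
    residual = [p for p in preds if p[1] not in wrapper.supported_ops]
    return pushed, residual


def run(preds: List[Predicate], wrapper: Wrapper) -> List[Row]:
    pushed, residual = decompose(preds, wrapper)
    fetched = wrapper.execute(pushed)
    # The middleware compensates for what the repository cannot do.
    return [r for r in fetched if all(matches(r, p) for p in residual)]


# A repository that can only test equality.
images = Wrapper(
    rows=[{"kind": "photo", "year": 1989},
          {"kind": "photo", "year": 1995},
          {"kind": "map", "year": 1999}],
    supported_ops={"="},
)
query = [("kind", "=", "photo"), ("year", ">", 1990)]
pushed, residual = decompose(query, images)
result = run(query, images)
```

Here the equality test is sent to the repository and the range test is applied afterwards by the middleware. A richer description of wrapper capabilities would let more of the query be pushed down.
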
Interfaces and Applications
- C++ API
  - Compiled applications
  - Dynamic applications
- Query/Browser
  - A dynamic application
  - Supports moving back and forth between querying and browsing activities

Summary
- Database federation
  - A powerful tool for integrating data
  - Future work: improve ease of use and enhance performance
- Garlic Project
  - New research in many dimensions