CSE 636 Data Integration Overview Data Warehouse Architecture
- Slides: 22
CSE 636 Data Integration Overview
Data Warehouse Architecture Users Applications Relational Database (Warehouse) OLAP / Decision Support Data Cubes / Data Mining ETL Tools (Extract-Transform-Load) Data Cleaning Data Source 2
Virtual Integration Architecture • Leave the data in the sources • When a query comes in: – Determine the relevant sources to the query – Break down the query into sub-queries for the sources – Get the answers from the sources, filter them if needed and combine them appropriately • Data is fresh • Otherwise known as On Demand Integration 3
Virtual Integration Architecture Design-Time End Users Applications Global Schema Mappings Local Data Schema Sources can be: • Relational DBs • Excel Files • Web Sites • Web Services Local Data Schema Source 4
Schema Mappings • Differences in: – Names in schema – Attribute grouping Inventory Database A Inventory Database B Books Title ISBN Price Discount. Price Edition ISBN First. Name Last. Name Book. Categories ISBN Category Books. And. Music Title Author Publisher Item. ID Item. Type Suggested. Price Categories Keywords Authors CDCategories ASIN Category CDs Album ASIN Price Discount. Price Studio – Coverage of databases – Granularity and format of attributes Artists ASIN Artist. Name Group. Name 5
Issues for Schema Mappings Design-Time End Users Applications Global Schema Mappings Local Data Schema Source • What formalisms to express them? • How to create them? • Can we discover them somehow? • How do we use them? Local Data Schema Source 6
Virtual Integration Architecture Run-Time Query Result Mediator Reformulation Optimization Global Schema Execution Wrapper Local Data Schema Source 7
Issues for Query Processing Reformulation Query Mediator Reformulation Global Schema Local Data Schema Source • User queries refer to the global schema • Data is stored in the sources in a local schema • Rewriting algorithms Local Data Schema Source 8
Issues for Query Processing Reformulation Global Schema Books Title ISBN Price Discount. Price Edition SELECT ISBN, Price FROM Books WHERE Title = ‘on the road’ Local Schema A Books. And. Music Title Author Publisher Item. ID Item. Type Suggested. Price Categories Keywords SELECT Item. ID, Suggested. Price FROM Books. And. Music WHERE Title = ‘on the road’ AND Item. Type = ‘Books’ 9
Issues for Query Processing Query Translation Query • Different query languages Mediator Reformulation Optimization Global Schema Execution Wrapper Local Data Schema Source 10
Issues for Query Processing Query Translation Global Schema Books Title ISBN Price Discount. Price Edition SELECT ISBN, Price FROM Books WHERE Title = ‘on the road’ Local Source A http: //www. amazon. com/homepage. html? Item. Type=Books&Title=on+the+road 11
Issues for Query Processing Data Translation Query • Different data models Mediator Reformulation Optimization Global Schema Execution Wrapper Local Data Schema Source 12
Issues for Query Processing Data Translation Global Schema Books Title ISBN Price Discount. Price Edition Title ISBN Price On the Road 123 10. 86 Local Result A … … <table> <tr> <td> <a href=/details? isbn=123> <b>On the Road</b> </a> -- by Jack Kerouac; Paperback <a href=/details? isbn=123> Buy new </a> : <b class=price>$10. 86</b> </td> </tr> </table> 13
Issues for Query Processing Query Execution Query Mediator Reformulation Optimization Global Schema Execution Wrapper Local Data Schema Source • Access as many data sources as needed • Duplicate/redundant and irrelevant data • Limited query capabilities Wrapper Local Data Schema Source 14
Issues for Query Processing Limited Query Capabilities Global Schema Books Title E ISBN Price 123 10. 86 Discount. Price Edition SELECT ISBN, Price, Discount. Price FROM Books WHERE Title = ‘on the road’ Discount. Price 8. 86 Local Schema B Local Schema A Discount. Books. And. Music Title B Item. ID Author Suggested. Price Item. ID 123 Item. Type 10. 86 Suggested. Price A SELECT Item. ID, Suggested. Price FROM Books. And. Music WHERE Title = ‘on ? the road’ Title D Great. Price Edition ISBN 8. 86 Great. Price C SELECT Great. Price FROM Discount. Books WHERE ISBN = 123 ? 15
Issues for Query Processing Query Answering Query Result Mediator Reformulation Optimization Global Schema Execution Wrapper Local Data Schema Source • Combine the results and further process them if needed • Mainly union and merge • Inconsistencies Wrapper Local Data Schema Source 16
Issues for Query Processing Query Answering (Union) ISBN Price 123 10. 86 456 8. 86 Item. ID Suggested. Price ISBN Great. Price 123 10. 86 456 8. 86 17
Issues for Query Processing Query Answering (Merge) Primary Key ISBN Title Edition Price 123 On the Road 2 nd 8. 86 Item. ID Title ISBN Edition Price 123 On the Road 123 2 nd 8. 86 Primary Key 18
Issues for Query Processing Query Answering (Inconsistencies) Primary Key ISBN Title Edition Price 123 On the Road ? ? ? 8. 86 Item. ID Title Edition ISBN Edition Price 123 On the Road 1 st 123 2 nd 8. 86 Primary Key 19
Peer-Based Integration Query Peer 4 Query Peer 5 Peer 2 Peer 1 Peer 3 21
Peer-Based Integration • • No need for a central mediated schema Peers serve as mediators for other peers A peer can be both a server and a client Semantic relationships are specified locally (between small sets of peers) • Queries are posed using the peer’s schema • Answers come from anywhere in the system • This is not P 2 P file sharing. – Data has rich semantics 22
References • Information integration – Maurizio Lenzerini – Eighteenth International Joint Conference on Artificial Intelligence, IJCAI 2003 – Invited Tutorial • Data Integration: a Status Report – Alon Halevy – German Database Conference (BTW), 2003 – Invited Talk 23
- Independent data mart architecture
- Dewey 636
- Manpower reserve
- Data warehouse continuous integration
- Data warehouse integration services
- Dwh architecture
- Technical architecture data warehouse
- Two tier data warehouse architecture
- Generic two level data warehouse architecture
- Apache hive data warehouse
- Data warehouse best practice architecture
- Virtual data integration architecture
- Data quality and data cleaning an overview
- Data quality and data cleaning an overview
- Data quality and data cleaning an overview
- Onap architecture
- Stylistic overview of philippine art
- Gsm system architecture
- Overview of grid architecture
- Overview of oracle architecture
- Data mining in data warehouse
- Contoh data mart
- Data staging component in data warehouse