Doctoral Consortium
“Solving Data Inconsistencies and Data Integration with a Data Quality Manager”
Presented by Maria del Pilar Angeles and Lachlan M. MacKinnon
School of Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh, EH14 4AS
{pilar, lachlan}@macs.hw.ac.uk

Agenda
• Introduction
• Proposal
• Data Quality Manager Components
  – Reference Model
  – Measurement Model
  – Assessment Model
  – Quality Metadata
• Information Integration Process
  – Classification of Data Sources
  – Selection of Best Data Sources
  – Query Planning
  – Data Fusion
  – Ranking of Query Results
• Questions

Solving Data Inconsistencies and Data Integration with a Data Quality Manager. Angeles, Maria del Pilar; MacKinnon, Lachlan M.

Introduction
Structural conflicts (Sheth 92):
• Domain definition: naming, data representation, data scaling, data precision, default value, attribute integrity constraints
• Entity definition: database id, naming, union compatibility, schema isomorphism, missing data item
• Data value: known inconsistency, temporal inconsistency, acceptable inconsistency
• Abstract: generalization, aggregation
• Schematic discrepancy: data value attribute, attribute entity, data value entity
Approached by: ontology, metadata, transformation rules, mapping.

Introduction
Example: three data sources describing the same employees under different local schemas.

DS 1
Emp_no | Name            | salary
123987 | Alastair Freich | 14000
456339 | Fernando Lujan  | NULL

DS 2
SSN    | fullname      | sal
123987 | A. Freich     | 20000
789222 | Fiona Shaning | 15000

DS 3
employe | SFE              | salary
123987  | Al. Freich       | NULL
393765  | Lauren MacMillan | 14500

Integrated result
Employee_number | Full_name_employee | Salary
123987          | Alastair F.        | 14000
123987          | A. Freich          | 20000
123987          | Al. Freich         | NULL
456339          | Fernando Lujan     | NULL
393765          | Lauren MacMillan   | 14500
789222          | Fiona Shaning      | 15000

Employee 123987 appears in all three sources with different names and salaries, so the integrated result holds three inconsistent rows for a single employee.
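A minimal Python sketch of data-value inconsistency detection on the integrated rows above, assuming NULL is represented as None and that rows sharing an Employee_number describe the same employee. The helper name find_value_conflicts is hypothetical; it only flags attributes with more than one distinct non-null value and does not resolve them.

```python
from collections import defaultdict

# Integrated result from the slide, in the schema
# (Employee_number, Full_name_employee, Salary); None stands for NULL.
INTEGRATED = [
    (123987, "Alastair F.", 14000),
    (123987, "A. Freich", 20000),
    (123987, "Al. Freich", None),
    (456339, "Fernando Lujan", None),
    (393765, "Lauren MacMillan", 14500),
    (789222, "Fiona Shaning", 15000),
]
ATTRIBUTES = ("Full_name_employee", "Salary")


def find_value_conflicts(rows):
    """Hypothetical helper: return {key: set of attribute names} for every key
    whose rows carry more than one distinct non-null value for an attribute."""
    seen = defaultdict(lambda: defaultdict(set))
    for key, *values in rows:
        for name, value in zip(ATTRIBUTES, values):
            if value is not None:  # NULL is treated as missing, not as a conflict
                seen[key][name].add(value)
    conflicts = {}
    for key, per_attribute in seen.items():
        clashing = {name for name, vals in per_attribute.items() if len(vals) > 1}
        if clashing:
            conflicts[key] = clashing
    return conflicts


if __name__ == "__main__":
    for key, attributes in find_value_conflicts(INTEGRATED).items():
        print(key, sorted(attributes))
```

Treating NULL as missing rather than conflicting is a design choice; with the rows above, only employee 123987 is reported, for both Full_name_employee and Salary.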

Proposal
We propose the development of a Data Quality Manager (DQM) to establish communication between the information integration process, the user and the application, in order to deal with semantic heterogeneity.

Proposal: architecture
• Data Source 1, Data Source 2, …, Data Source N, each described by a Local Schema (1 … N) and used by its Local Users (Local User 1, Local User 2, …, Local User N)
• One Wrapper per data source, producing Export Schema 1, Export Schema 2, …, Export Schema N
• Mediator, working together with the Data Quality Manager
• Global Schema, used by Applications and by Global User 1, Global User 2, Global User 3, …, Global User M
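The wrapper/mediator layering can be sketched in Python as follows. This is an illustrative sketch, not the authors' design: the Wrapper and Mediator classes and the MAPPINGS table are hypothetical, the column mappings are read off the introduction example, and the Data Quality Manager itself is not modelled. Each exported row keeps a source field so that later stages can tell where a value came from.

```python
from dataclasses import dataclass, field

# Hypothetical local-to-global column mappings, read off the introduction example.
MAPPINGS = {
    "DS 1": {"Emp_no": "Employee_number", "Name": "Full_name_employee", "salary": "Salary"},
    "DS 2": {"SSN": "Employee_number", "fullname": "Full_name_employee", "sal": "Salary"},
    "DS 3": {"employe": "Employee_number", "SFE": "Full_name_employee", "salary": "Salary"},
}


@dataclass
class Wrapper:
    """Exposes one data source's rows through the global (export) schema."""
    source: str
    rows: list = field(default_factory=list)  # dicts keyed by local column names

    def export(self):
        mapping = MAPPINGS[self.source]
        for row in self.rows:
            exported = {mapping[column]: value for column, value in row.items()}
            exported["source"] = self.source  # provenance, kept for fusion decisions
            yield exported


class Mediator:
    """Collects the exported rows of every wrapper into the global schema."""

    def __init__(self, wrappers):
        self.wrappers = wrappers

    def global_rows(self):
        return [row for wrapper in self.wrappers for row in wrapper.export()]


if __name__ == "__main__":
    mediator = Mediator([
        Wrapper("DS 1", [{"Emp_no": 123987, "Name": "Alastair Freich", "salary": 14000}]),
        Wrapper("DS 3", [{"employe": 393765, "SFE": "Lauren MacMillan", "salary": 14500}]),
    ])
    for row in mediator.global_rows():
        print(row)
```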

DQM Components
• Definition of Quality Criteria: Reference Model
• Definition of Metrics: Measurement Model
• Definition of Assessment methods: Assessment Model
• Definition of Quality Metadata (QMD): Quality Metadata

QMD Population
Based on the DQM components, classify the data sources:
• Completeness: # incomplete / # total
• Accuracy: # errors / # total
• Currency: Age + (delivery time - input time)
• Assessment methods: surveys, queries, benchmarks
The resulting measurements populate the QMD.
DQM: Data Quality Manager. QMD: Quality Metadata.
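The three formulas can be written down as a small QMD record. A sketch under assumptions: the class and field names are hypothetical, time values are plain numbers in a single assumed unit (days), and completeness and accuracy are reported as 1 minus the slide's ratios so that 1.0 means best, which is a convention chosen here rather than one stated on the slide.

```python
from dataclasses import dataclass


@dataclass
class SourceQuality:
    """Hypothetical QMD entry for one data source; field names are illustrative."""
    source: str
    total: int            # number of records assessed
    incomplete: int       # records with at least one missing value
    errors: int           # records found to be wrong (e.g. by survey or benchmark)
    age: float            # age of the data when it entered the source (days, assumed)
    input_time: float     # when the data entered the source (days, assumed)
    delivery_time: float  # when the data is delivered to the user (days, assumed)

    def completeness(self):
        # Slide ratio #incomplete/#total, turned into a score where 1.0 is best (assumption).
        return 1 - self.incomplete / self.total

    def accuracy(self):
        # Slide ratio #errors/#total, turned into a score where 1.0 is best (assumption).
        return 1 - self.errors / self.total

    def currency(self):
        # Slide formula: Age + (delivery time - input time); larger means older data.
        return self.age + (self.delivery_time - self.input_time)


if __name__ == "__main__":
    ds1 = SourceQuality("DS 1", total=4, incomplete=1, errors=0,
                        age=2.0, input_time=10.0, delivery_time=15.0)
    print(ds1.completeness(), ds1.accuracy(), ds1.currency())
```

For a source with 4 records, 1 incomplete and 0 errors, whose data was 2 days old when entered at day 10 and is delivered at day 15, this gives completeness 0.75, accuracy 1.0 and currency 7.0.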

Information Integration Process
Stages, each supported by the Data Quality Manager:
• Selection of Best Data Sources
• Query Planning
• Fusion of Data Inconsistencies
• Query Integration
• Ranking of Query Results

Selection of Best Data Sources
The user query is mapped through the local/global schemas to the data sources involved in the query; the QMD and the quality user priorities are then used to rank these sources.
1. The quality user priorities are given by the user.
2. The ranking of the best data sources involved in the query is produced before execution.
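One way to realise this step is a weighted average of per-source quality scores, with the user's priorities as weights. This is a swapped-in technique for illustration, not necessarily the one the DQM will use; the QMD values below are made up and assumed normalised so that 1.0 is best, and involved_sources and rank_sources are hypothetical names.

```python
# Hypothetical quality scores per source, normalised so that 1.0 is best.
# Illustrative values only, not measurements.
QMD = {
    "DS 1": {"accuracy": 0.9, "completeness": 0.5},
    "DS 2": {"accuracy": 0.6, "completeness": 1.0},
    "DS 3": {"accuracy": 0.8, "completeness": 0.9},
}


def involved_sources(query_attributes, coverage):
    """Sources whose local/global mapping covers at least one queried attribute."""
    wanted = set(query_attributes)
    return [source for source, attributes in coverage.items() if attributes & wanted]


def rank_sources(sources, qmd, priorities):
    """Order sources best first by the priority-weighted average of their scores."""
    total_weight = sum(priorities.values())

    def score(source):
        return sum(w * qmd[source][c] for c, w in priorities.items()) / total_weight

    return sorted(sources, key=score, reverse=True)


if __name__ == "__main__":
    coverage = {name: {"Employee_number", "Full_name_employee", "Salary"} for name in QMD}
    sources = involved_sources(["Full_name_employee", "Salary"], coverage)
    print(rank_sources(sources, QMD, {"accuracy": 3, "completeness": 1}))
```

With priorities accuracy 3 and completeness 1, the illustrative scores are 0.8 (DS 1), 0.7 (DS 2) and 0.825 (DS 3), so DS 3 is ranked first; if the user only cares about completeness, DS 2 moves to the top.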

Query Planning
The user query is partitioned into subqueries (Query A, Query B, Query C). Using the QMD and the quality user priorities, the candidate query plans (Plan 1, Plan 2, Plan 3, …, Plan N) are ranked and the top-ranking query plan is selected.
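A brute-force sketch of quality-driven plan ranking: each plan assigns one candidate source to every subquery, and plans are ordered by the mean quality score of the sources they use. The function name rank_plans, the candidate lists and the scores are hypothetical, and using the mean is an assumption (a minimum or a weighted sum would also be reasonable). Exhaustive enumeration grows with the product of the candidate counts, so it only suits a handful of sources.

```python
from itertools import product


def rank_plans(subqueries, candidates, source_score):
    """Return (score, plan) pairs, best first.

    subqueries:   subquery names, e.g. ["Query A", "Query B"]
    candidates:   subquery -> list of sources able to answer it
    source_score: source -> quality score under the user's priorities
    """
    plans = []
    for choice in product(*(candidates[q] for q in subqueries)):
        plan = dict(zip(subqueries, choice))  # one source per subquery
        score = sum(source_score[s] for s in choice) / len(choice)
        plans.append((score, plan))
    plans.sort(key=lambda pair: pair[0], reverse=True)
    return plans


if __name__ == "__main__":
    subqueries = ["Query A", "Query B"]
    candidates = {"Query A": ["DS 1", "DS 2"], "Query B": ["DS 3"]}
    scores = {"DS 1": 0.8, "DS 2": 0.7, "DS 3": 0.825}
    best_score, best_plan = rank_plans(subqueries, candidates, scores)[0]
    print(best_plan, round(best_score, 4))
```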

Data Fusion
Executing the query plan returns partial results (Result X, Result Y, Result Z). Data inconsistency detection identifies an inconsistent query result, and data fusion, guided by the QMD and the quality user priorities, turns it into a consistent query result.
Because the DQM stores where the data comes from, decisions can be made at data fusion time.
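A minimal sketch of detection followed by fusion, assuming rows in the global schema that carry a source provenance field and a per-source quality score derived from the QMD under the user's priorities. The rule shown (take, attribute by attribute, the non-null value from the best-scored source) is one simple illustrative policy, not the authors' method; fuse, EXAMPLE_ROWS and SCORES are hypothetical, and the scores are made up.

```python
# Rows for employee 123987 from the introduction, tagged with their source.
EXAMPLE_ROWS = [
    {"Employee_number": 123987, "Full_name_employee": "Alastair Freich", "Salary": 14000, "source": "DS 1"},
    {"Employee_number": 123987, "Full_name_employee": "A. Freich", "Salary": 20000, "source": "DS 2"},
    {"Employee_number": 123987, "Full_name_employee": "Al. Freich", "Salary": None, "source": "DS 3"},
]
SCORES = {"DS 1": 0.8, "DS 2": 0.7, "DS 3": 0.825}  # illustrative quality scores


def fuse(rows, key, source_score):
    """Merge rows sharing the same key into one row.

    Returns (fused_rows, conflicting_keys)."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)

    fused, conflicting = [], set()
    for k, group in groups.items():
        # Best-scored source first, according to the user's quality priorities.
        ranked = sorted(group, key=lambda r: source_score[r["source"]], reverse=True)
        merged = {key: k}
        for attribute in group[0]:
            if attribute in (key, "source"):
                continue
            values = [r.get(attribute) for r in ranked if r.get(attribute) is not None]
            if len(set(values)) > 1:  # inconsistency detection
                conflicting.add(k)
            merged[attribute] = values[0] if values else None  # fusion rule
        fused.append(merged)
    return fused, conflicting


if __name__ == "__main__":
    print(fuse(EXAMPLE_ROWS, "Employee_number", SCORES))
```

Here DS 3 has the highest illustrative score, so the fused row takes its name, "Al. Freich"; DS 3 has no salary, so the salary comes from the next-best source, DS 1 (14000).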

Ranking of Query Results
The consistent query results produced by data fusion (Result J, Result K, Result L) go through query integration and query result ranking, using the QMD and the quality user priorities.
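Result ranking can follow the same weighting idea as source ranking. In this sketch each result inherits, per criterion, the score of its weakest contributing source (an assumption), and results are ordered by the priority-weighted average. Names and QMD values are hypothetical and illustrative only.

```python
# Hypothetical normalised quality scores per source (1.0 is best); illustrative only.
QMD = {
    "DS 1": {"accuracy": 0.9, "completeness": 0.5},
    "DS 2": {"accuracy": 0.6, "completeness": 1.0},
    "DS 3": {"accuracy": 0.8, "completeness": 0.9},
}


def result_quality(sources, qmd, criterion):
    # Assumption: a result is only as good as its weakest contributing source.
    return min(qmd[s][criterion] for s in sources)


def rank_results(results, qmd, priorities):
    """results maps a result name to the sources it was built from; returns names best first."""
    total_weight = sum(priorities.values())

    def score(name):
        return sum(w * result_quality(results[name], qmd, c)
                   for c, w in priorities.items()) / total_weight

    return sorted(results, key=score, reverse=True)


if __name__ == "__main__":
    results = {"Result J": ["DS 1", "DS 3"], "Result K": ["DS 2"], "Result L": ["DS 3"]}
    print(rank_results(results, QMD, {"accuracy": 1, "completeness": 1}))
```

With equal priorities, Result J scores lowest because DS 1 drags its completeness down to 0.5, even though DS 1 is the most accurate source.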

Conclusion
Using the Data Quality Manager we can:
• Approach data-value-level inconsistencies during the information integration process, using data quality properties.
• Let users demand different quality priorities at query time.
• Manage user quality priorities together with data quality properties to deliver a query result of the quality the user expects.
What we need to do now:
• Identify tools for measurement and assessment, and develop a QMD.
• Store the quality of the data sources involved in the heterogeneous system.
• Identify techniques for:
  – Ranking of data sources and plans involved in the query
  – Inconsistency detection
  – Data fusion using data-source-level and data-level properties
  – Ranking of query results

Questions?

Thanks!!