Chapter 1 Introduction to Data Quality


Data Quality Characteristics

Data quality affects several attributes associated with data:
  • Accuracy – Is it realistic or believable?
  • Integrity – Is it structured and managed?
  • Consistency – Is it consistently defined and maintained?
  • Validity – Is the data valid, based on business or industry rules and standards?

What Causes Poor Data Quality?

These factors can contribute to poor data quality:
  • Business rules do not exist or there are no standards for data capture.
  • Standards may exist but are not enforced at the point of data capture.
  • Inconsistent data entry (incorrect spelling, use of nicknames, middle names, or aliases) occurs.
  • Data entry mistakes (character transposition, misspellings, and so on) happen.
  • Integration of data from systems with different data standards is present.
  • Data quality issues are perceived as time-consuming and expensive to fix.

Primary Sources of Data Quality Problems

[Chart not preserved.]
Source: The Data Warehousing Institute, Data Quality and the Bottom Line, 2002

How Is Clean Data Achieved?

Clean data is the result of a combination of efforts:
  • making sure that data entered into the system is clean
  • cleaning up problems after the data is accepted.
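
The first effort, keeping bad data out at the point of entry, can be sketched as a small validation step. This is a minimal illustration only: the field names, the list of state codes, and the e-mail pattern are hypothetical stand-ins for whatever business rules an organization actually defines.

```python
import re

# Hypothetical rules for illustration; real rules come from the business.
STATE_CODES = {"NC", "SC", "VA"}
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_contact(record):
    """Return a list of rule violations; an empty list means the record is accepted."""
    errors = []
    if not record.get("name", "").strip():
        errors.append("name is required")
    if record.get("state") not in STATE_CODES:
        errors.append("state must be a known two-letter code")
    email = record.get("email")
    if email and not EMAIL_PATTERN.match(email):
        errors.append("email is not well formed")
    return errors
```

A record that fails any rule would be rejected or routed for correction before it is stored, which leaves less to clean up after the data is accepted.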

Typical Data Quality Issues

The most common processes in a data quality initiative are
  • Data Analysis and Standardization
      – consistency analysis
      – standardization schemes
      – gender analysis
      – entity analysis
      – data parsing and casing.

continued...

Typical Data Quality Issues

The most common processes in a data quality initiative are
  • Matching and Merging
      – de-duplication
      – householding
  • Address Verification
      – against a CASS-certified database
  • Geocoding
      – data enrichment using third-party data elements.

Analysis and Standardization Example

Who is the biggest supplier?

  Anderson Construction    $ 2,333.50
  Briggs, Inc              $ 8,200.10
  Brigs Inc.               $12,900.79
  Casper Corp.             $27,191.05
  Caspar Corp              $ 6,000.00
  Solomon Industries       $43,150.00
  The Casper Corp          $11,500.00
  ...

Standardization Scheme

  Name as entered      Standardized name
  Briggs, Inc          Briggs Inc.
  Brigs Inc.           Briggs Inc.
  Casper Corp.         Casper Corp.
  Caspar Corp          Casper Corp.
  The Casper Corp      Casper Corp.
  ...
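
To make the effect of the scheme concrete, the sketch below applies it to the supplier amounts from the example and totals spending per standardized name. The data comes from the slides; the function and variable names are illustrative, and names without an entry in the scheme are assumed to pass through unchanged.

```python
from decimal import Decimal

# Amounts from the supplier example, keyed by the name as entered.
payments = [
    ("Anderson Construction", Decimal("2333.50")),
    ("Briggs, Inc", Decimal("8200.10")),
    ("Brigs Inc.", Decimal("12900.79")),
    ("Casper Corp.", Decimal("27191.05")),
    ("Caspar Corp", Decimal("6000.00")),
    ("Solomon Industries", Decimal("43150.00")),
    ("The Casper Corp", Decimal("11500.00")),
]

# Standardization scheme: name as entered -> standardized name.
scheme = {
    "Briggs, Inc": "Briggs Inc.",
    "Brigs Inc.": "Briggs Inc.",
    "Casper Corp.": "Casper Corp.",
    "Caspar Corp": "Casper Corp.",
    "The Casper Corp": "Casper Corp.",
}


def total_by_supplier(rows, name_scheme):
    """Sum amounts per supplier after mapping each name through the scheme."""
    totals = {}
    for name, amount in rows:
        standard = name_scheme.get(name, name)  # unmapped names pass through
        totals[standard] = totals.get(standard, Decimal("0")) + amount
    return totals


raw_totals = total_by_supplier(payments, {})      # no standardization
clean_totals = total_by_supplier(payments, scheme)

print(max(raw_totals, key=raw_totals.get))        # Solomon Industries
print(max(clean_totals, key=clean_totals.get))    # Casper Corp.
print(clean_totals["Casper Corp."])               # 44691.05
```

Without standardization, Solomon Industries looks like the biggest supplier at $43,150.00. Once the three Casper variants are combined, Casper Corp. totals $44,691.05 and takes first place.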

Supplier Spending

[Bar chart not preserved. Axis: $ Spent, 0 to 50,000. Bars: Casper Corp., Solomon Ind., Briggs Inc., Anderson Cons.]

Data Matching Example

Operational System of Records:
  • Mark Carver, SAS Campus Drive, Cary, N.C.
  • Mark W. Craver, Mark.Craver@sas.com
  • Mark Craver, Systems Engineer, SAS
  ...

Data Warehouse:
  01  Mark Carver, SAS Campus Drive, Cary, N.C.
  02  Mark W. Craver, Mark.Craver@sas.com
  03  Mark Craver, Systems Engineer, SAS
  ...
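
One way to detect that these three records probably describe the same person is to compare normalized names with a string-similarity score. The sketch below uses `difflib.SequenceMatcher` from the Python standard library. It is not the matching method of any particular data quality product, and the 0.85 threshold is an arbitrary assumption that would need tuning on real data.

```python
import re
from difflib import SequenceMatcher


def normalize_name(name):
    """Lowercase, strip punctuation, and drop single-letter tokens such as middle initials."""
    tokens = re.sub(r"[^a-z\s]", "", name.lower()).split()
    return " ".join(t for t in tokens if len(t) > 1)


def name_similarity(a, b):
    """Similarity in [0, 1] between two names after normalization."""
    return SequenceMatcher(None, normalize_name(a), normalize_name(b)).ratio()


def is_probable_match(a, b, threshold=0.85):
    """Flag two names as a likely match; the threshold is an assumed cutoff."""
    return name_similarity(a, b) >= threshold
```

With this normalization, "Mark W. Craver" and "Mark Craver" compare as identical, and the transposed "Mark Carver" still scores above the threshold against "Mark Craver". A production matcher would normally compare more than the name (address, e-mail, employer) before merging records.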

Data Quality Process

Operational System of Records:
  • Mark Carver, SAS Campus Drive, Cary, N.C.
  • Mark W. Craver, Mark.Craver@sas.com
  • Mark Craver, Systems Engineer, SAS
  ...

        DQ

Data Warehouse:
  01  Mark Craver, Systems Engineer, SAS Campus Drive, Cary, N.C. 27513, Mark.Craver@sas.com
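
The merge step can be sketched as follows, assuming the three records have already been identified as one entity. The field names and the survivorship rules are illustrative assumptions: the name is the most frequent form after dropping middle initials, and every other field takes the first non-empty value found. The ZIP code 27513 in the merged warehouse record does not appear in any source record, so it presumably comes from address verification, which this sketch does not attempt.

```python
from collections import Counter

# The three operational records, with hypothetical field names.
records = [
    {"name": "Mark Carver", "address": "SAS Campus Drive", "city": "Cary", "state": "N.C."},
    {"name": "Mark W. Craver", "email": "Mark.Craver@sas.com"},
    {"name": "Mark Craver", "title": "Systems Engineer", "company": "SAS"},
]


def drop_middle_initial(name):
    """Remove single-letter tokens such as 'W.' from a name."""
    return " ".join(p for p in name.split() if len(p.rstrip(".")) > 1)


def merge_records(cluster):
    """Build one surviving record from a cluster of matched records."""
    names = Counter(drop_middle_initial(r["name"]) for r in cluster)
    merged = {"name": names.most_common(1)[0][0]}
    for record in cluster:
        for field, value in record.items():
            if field != "name" and value:
                merged.setdefault(field, value)  # keep the first non-empty value
    return merged


merged = merge_records(records)
print(merged["name"])   # Mark Craver
```

Two of the three records agree on "Mark Craver" once the initial is dropped, so that spelling survives and the transposed "Carver" is discarded, while the address, e-mail, and job title are collected from whichever record supplies them.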