DATA QUALITY PROBLEMS AND THEIR ROOT CAUSES
DAMA COLUMBUS, OH CHAPTER MEETING – JANUARY 2015

Consequences of Poor Data Quality
• We have multiple names for the same suppliers and items in our system, so we cannot correctly roll up spend by supplier and by item. As a result we cannot shrink our supplier base, negotiate more volume discounts, or control spend.
• We cannot identify whether the same customer has auto, property, and life insurance with our organization, which limits our ability to cross-sell and upsell to them.
• We are losing millions of dollars in postage and collateral costs by mailing separately to customers who live in the same household.
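The supplier roll-up problem above can be illustrated with a minimal Python sketch. The suffix list, the normalization rules, and the sample invoices are all assumptions made for illustration; real supplier matching usually needs fuzzy matching and manual review on top of this.

```python
import re
from collections import defaultdict

# Hypothetical legal-form suffixes to ignore when comparing supplier names.
LEGAL_SUFFIXES = {"inc", "incorporated", "corp", "corporation",
                  "co", "company", "llc", "ltd", "limited"}

def normalize_supplier(name: str) -> str:
    """Reduce a supplier name to a crude matching key."""
    # Lowercase, then turn every run of non-alphanumeric characters into a space.
    tokens = re.sub(r"[^a-z0-9]+", " ", name.lower()).split()
    # Drop trailing legal-form tokens such as "Inc" or "Ltd".
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)

def spend_by_supplier(invoices):
    """Sum invoice amounts per normalized supplier key."""
    totals = defaultdict(float)
    for name, amount in invoices:
        totals[normalize_supplier(name)] += amount
    return dict(totals)

# Made-up sample data: three spellings of one supplier, plus a second supplier.
invoices = [
    ("Acme Corp.", 1200.0),
    ("ACME Corporation", 800.0),
    ("Acme, Inc", 500.0),
    ("Globex Ltd", 300.0),
]
totals = spend_by_supplier(invoices)
print(totals)
```

Without normalization the three Acme spellings would appear as three small suppliers; with it, their spend rolls up under one key.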

Consequences of Poor Data Quality
• We are unable to identify active accounts receivable for the same B2B customer across our different lines of business. Customers with an outstanding balance of $500,000 are required to have their credit reviewed per SEC guidelines.
• We need to measure credit risk exposure for the top 5 customer accounts across the organization. The biggest challenge is figuring out parent-child relationships across affiliates.
• We have lost millions of dollars in procuring suppliers because we cannot correctly identify and reconcile suppliers on a global basis.
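As a rough sketch of the parent-child problem above, the following Python rolls receivable balances up to each account's ultimate parent before ranking exposure. The account names, the `parent_of` mapping, and the reading of the threshold as "$500,000 or more" are assumptions for illustration only.

```python
from collections import defaultdict

REVIEW_THRESHOLD = 500_000  # assumed reading of the $500,000 rule

def ultimate_parent(account, parent_of):
    """Follow parent links until reaching an account with no parent."""
    seen = set()
    while parent_of.get(account) is not None:
        if account in seen:
            raise ValueError(f"cycle in hierarchy at {account}")
        seen.add(account)
        account = parent_of[account]
    return account

def top_exposures(balances, parent_of, n=5):
    """Total balances per ultimate parent and return the n largest."""
    totals = defaultdict(float)
    for account, balance in balances.items():
        totals[ultimate_parent(account, parent_of)] += balance
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]

# Hypothetical affiliates: two lines of business bill the same parent company.
parent_of = {"A-US": "A-GLOBAL", "A-EU": "A-GLOBAL", "B-US": None}
balances = {"A-US": 300_000.0, "A-EU": 250_000.0, "B-US": 400_000.0, "C": 100_000.0}

ranked = top_exposures(balances, parent_of)
needs_review = [parent for parent, total in ranked if total >= REVIEW_THRESHOLD]
print(ranked, needs_review)
```

In this made-up data no single affiliate reaches the threshold, but the combined parent does, which is the kind of exposure that stays hidden when affiliates are not linked.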

Examples of Data Quality Problems
• Data glitches
  - Typos, multiple formats, multiple scales, missing/default values
• Business logic embedded in natural keys
• Data values that do not conform to the business rules
• Information buried in free-form/flex fields
• Mistakes, misspellings, incorrect data types, lack of standards, etc.
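Two of the glitches listed above, missing/default values and multiple formats, can be surfaced with simple column profiling. The sketch below is one possible approach; the list of placeholder values and the sample dates are assumptions, not values from any real system.

```python
import re
from collections import Counter

# Hypothetical placeholder values that often stand in for "unknown".
DEFAULT_VALUES = {"", "n/a", "na", "none", "unknown", "9999", "1900-01-01"}

def is_missing(value) -> bool:
    """True for None or for a value that looks like a default placeholder."""
    return value is None or value.strip().lower() in DEFAULT_VALUES

def shape(value: str) -> str:
    """Map digits to 9 and letters to A so same-format values share a shape."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", value))

def profile(values):
    """Count missing/default values and tally the formats of the rest."""
    missing = sum(1 for v in values if is_missing(v))
    formats = Counter(shape(v) for v in values if not is_missing(v))
    return {"missing_or_default": missing, "formats": dict(formats)}

# Made-up date column mixing two formats and several placeholders.
dates = ["2015-01-15", "01/15/2015", "2015-02-01", "N/A", None, "1900-01-01"]
report = profile(dates)
print(report)
```

More than one entry under `formats` signals a multiple-format problem, and a high `missing_or_default` count signals that defaults are masking gaps.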

Traditional Definition of Data Quality
• Accuracy: Does the data accurately represent reality or a verifiable source?
• Integrity: Is the structure of data and relationships among entities and attributes maintained consistently?
  - Example: policy data that is not tied to a valid customer
• Consistency: Are data elements consistently defined and understood?
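The integrity example above, policy data not tied to a valid customer, amounts to an orphan-record check. A minimal sketch follows; the field names `customer_id` and `policy_id` and the sample records are hypothetical.

```python
def orphan_policies(policies, customers):
    """Return policies whose customer_id matches no known customer."""
    known_ids = {c["customer_id"] for c in customers}
    return [p for p in policies if p.get("customer_id") not in known_ids]

# Made-up sample records.
customers = [
    {"customer_id": 1, "name": "Pat Lee"},
    {"customer_id": 2, "name": "Sam Roy"},
]
policies = [
    {"policy_id": "P-100", "customer_id": 1, "line": "auto"},
    {"policy_id": "P-101", "customer_id": 7, "line": "life"},         # no such customer
    {"policy_id": "P-102", "customer_id": None, "line": "property"},  # customer never recorded
]

orphans = orphan_policies(policies, customers)
print([p["policy_id"] for p in orphans])
```

In a relational database the same rule is normally enforced with a foreign key constraint; a check like this is useful when auditing data that was loaded without one.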

Example of Data Inconsistency

Traditional Definition of Data Quality
• Completeness: Is all necessary data present?
  - Field three is Revenue
    - In dollars or cents?
    - In dollars or euros?
  - Field four is Product Sales
    - In units or cases?

Traditional Definition of Data Quality
• Validity: Do data values fall within acceptable ranges defined by the business?
• Timeliness: Is data available when needed?
• Accessibility: Is the data easily accessible, understandable, and usable?
• Relevance
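A validity check of the kind described above can be expressed as a table of per-field rules. This is only a sketch: the fields, the age range, and the allowed policy types are illustrative assumptions, and in practice the business defines the acceptable ranges.

```python
# Hypothetical business rules: each field maps to a predicate returning True when valid.
RULES = {
    "age": lambda v: isinstance(v, int) and 0 <= v <= 120,
    "policy_type": lambda v: v in {"auto", "property", "life"},
    "premium": lambda v: isinstance(v, (int, float)) and v > 0,
}

def invalid_fields(record, rules=RULES):
    """List the fields that are absent or fail their validity rule."""
    return [field for field, is_valid in rules.items()
            if field not in record or not is_valid(record[field])]

# Made-up records.
good = {"age": 42, "policy_type": "auto", "premium": 420.0}
bad_age = {"age": 134, "policy_type": "auto", "premium": 420.0}
no_premium = {"age": 30, "policy_type": "life"}

print(invalid_fields(good), invalid_fields(bad_age), invalid_fields(no_premium))
```

Keeping the rules in one table makes it easier for the business owners of the data to review and change them without touching the checking logic.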

Another Perspective on Data Quality
• Data is of adequate quality when it is sufficient to support the business purpose for which it needs to be used, whether by people or by applications.

Major Causes of Poor Data Quality
• Poorly designed business processes
• No central management of business processes
• No ownership or stewardship for data
• Weakness of the tool(s) used to manage business processes
• Multiple instances of a central tool for managing business processes

Major Causes of Poor Data Quality
• Third-party interfaces
• Data conversion from legacy systems
• Human error
• Lack of sufficient training on the tools used to manage business processes

Case Study of a Manufacturing Organization
• Multiple heterogeneous ERP systems used by different divisions of the organization
• Centralized ERP system used to share Master Data
• Master Data published in a “one-to-many” way to the other ERP systems
• Comprehensive documentation on the usage of the centralized ERP system and acceptable domain values

Analysis of Case Study
• No specification of who had to enter the data, or when
• Same business task supported differently by divisional ERP instances
• Considerable lag between the physical presence of incoming goods and their visibility in the ERP system

Analysis of Case Study (Continued)
• Multiple customizations of divisional ERP systems
• Weakness of the GUI of the central ERP system used for master data entry
• Central ERP system did not prevent duplicate entries from being made

Analysis of Case Study (Continued)
• Weak search engine in the ERP system
• Difficult to correct errors in the ERP system
• Data conversion from legacy applications
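The analysis notes that the central ERP system did not prevent duplicate entries and had a weak search. The sketch below shows one way an entry screen could guard against both, using a normalized matching key. The class and method names are hypothetical and are not taken from any ERP product.

```python
class DuplicateEntryError(ValueError):
    """Raised when a new record matches an existing master data entry."""

class MasterDataRegistry:
    """Toy in-memory master data store that rejects duplicate names."""

    def __init__(self):
        self._entries = {}  # matching key -> name as first entered

    @staticmethod
    def _key(name: str) -> str:
        # Ignore case and collapse runs of whitespace when matching.
        return " ".join(name.lower().split())

    def find(self, name: str):
        """Return the stored name matching the query, or None."""
        return self._entries.get(self._key(name))

    def add(self, name: str) -> str:
        """Store a new name, refusing anything that matches an existing entry."""
        key = self._key(name)
        if key in self._entries:
            raise DuplicateEntryError(
                f"{name!r} matches existing entry {self._entries[key]!r}")
        self._entries[key] = name
        return key

registry = MasterDataRegistry()
registry.add("Hex Bolt M8 x 40")
try:
    registry.add("  hex bolt  M8 x 40 ")  # same item, different spacing and case
    rejected = False
except DuplicateEntryError:
    rejected = True
print(rejected, registry.find("HEX BOLT M8 X 40"))
```

Using the same key for search and for insertion means a user who searches before entering data finds the existing record, and one who skips the search is still stopped at save time.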

Lessons Learnt
• Processing delays, divisional idiosyncrasies, or operational errors cause data quality issues
• Business processes must be both aligned with the organization and oriented towards the customer

Lessons Learnt (Continued)
• Multiple instances of the same tool, customized by different people from various institutions with divergent interests or work standards, can cause issues
• Multiple instances of the same tool also cause issues because no one has an enterprise-wide view

How to Prevent Data Quality Issues
• Best practices for handling separate ways to enter data in an IT system
  - Declare one application as the “reference” or “master” system
  - Develop a set of common definitions and procedures

Example of Sample Governance Process

How to Prevent Data Quality Issues (Cont.)
• Avoid bias in producing data
• Avoid distributed architectures

How to Prevent Data Quality Issues (Cont.)
• Assign responsibility for data quality issues

Typical Data Governance Org Chart

How to Prevent Data Quality Issues (Cont.)
• Design information chains first

Questions?
• Contact me at: nmohanty@navmp.com

References
• Presentation on Data Quality and Data Cleaning from Rutgers University
• http://dataqualityaccuracy.blogspot.com/