Module 6 Data Quality Assurance and Quality Control

Module 6 Data Quality Assurance and Quality Control (QA/QC)

QA/QC Topics
• Definitions
  ◦ Quality assurance
  ◦ Quality control
  ◦ Data contamination
• Error Types
• Error Handling
• QA/QC best practices
  ◦ Before data collection
  ◦ During data collection/entry
  ◦ After data collection/entry

Data Management Plans

Learning Objectives
After completing this lesson, the participant will be able to:
  ◦ Define data quality control
  ◦ Define data quality assurance
  ◦ Perform quality control and quality assurance on their data at all stages of the research cycle (before data collection, during data collection/entry, and afterward)

The Data Life Cycle
Managing the quality of data during data collection and entry is important, but data quality is monitored and managed throughout the data life cycle.
[Figure: data life cycle diagram with the stages Collect, Assure, Analyze, Integrate, Discover, Describe, Deposit, Preserve]

Definition: Data Contamination
• Process or phenomenon, other than the one of interest, that affects the variable value
• Erroneous values

Types of Errors
• Errors of Commission
  ◦ Incorrect or inaccurate data
  ◦ Causes: malfunctioning instrument, mistyped data
• Errors of Omission
  ◦ Data or metadata not recorded
  ◦ Incomplete data record
  ◦ Causes: inadequate documentation, human error, anomalies in the field
• Logic Errors
  ◦ Improbable values or value combinations
  ◦ Usually detected during data review by a human who is a subject matter expert

Definition: Quality Assurance & Quality Control
Preventing bad data from contaminating a data set
• Quality assurance
  ◦ Activities that ensure quality of data before collection
  ◦ Verifying the quality of data obtained from others before use
• Quality control
  ◦ Monitoring and maintaining the quality of data during the research life cycle

QA/QC Before Collection
• Create a suitable structure to store the data
  ◦ Database or well-designed spreadsheet
• Define & enforce standards
  ◦ Metadata
  ◦ Formats
  ◦ Codes
  ◦ Measurement units
• Assign responsibility for data quality
  ◦ Be sure the assigned person is educated in QA/QC
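
One way to define and enforce standards before collection is to write them down in machine-readable form and check every record against them. The Python sketch below is only an illustration under stated assumptions: the field names, units, and code list (`site_code`, `water_temp`, `count`, `A1`, and so on) are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldStandard:
    """Agreed standard for one column (all names here are hypothetical)."""
    name: str
    unit: str                                # measurement unit kept with the metadata
    kind: type                               # expected Python type of the value
    allowed_codes: frozenset = frozenset()   # empty set means "no code list"


# Hypothetical standards, agreed on before any data are collected.
STANDARDS = [
    FieldStandard("site_code", unit="none", kind=str,
                  allowed_codes=frozenset({"A1", "A2", "B1"})),
    FieldStandard("water_temp", unit="degC", kind=float),
    FieldStandard("count", unit="individuals", kind=int),
]


def check_record(record: dict) -> list[str]:
    """Return a list of problems found in one record; empty if it conforms."""
    problems = []
    for field in STANDARDS:
        if field.name not in record:
            problems.append(f"{field.name}: missing")
            continue
        value = record[field.name]
        if not isinstance(value, field.kind):
            problems.append(f"{field.name}: expected {field.kind.__name__}")
        elif field.allowed_codes and value not in field.allowed_codes:
            problems.append(f"{field.name}: code {value!r} not in code list")
    return problems
```

A record that follows the standards yields an empty list, so the same function can be run at entry time or later during review.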

QA/QC During Data Entry
• Double-entry
  ◦ Data keyed in by two independent people and then checked for agreement with computer verification
• Use a text-to-speech program to read data back
  ◦ Serves as a ‘second person’ to help when one is not available
• Use a properly designed database
  ◦ Atomize data: each value is stored (changed) in only one place
  ◦ Minimize errors using column, row, and relationship validation
  ◦ Use consistent terminology
• Document all changes to data
  ◦ Avoids duplicate error checking
  ◦ Allows undo if necessary
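
The computer verification step of double-entry can be sketched in a few lines of Python. This is a minimal illustration, assuming each keyed version is a list of dictionaries that share an identifier column named `ID`; the function name and the report format are hypothetical.

```python
def compare_double_entry(first: list[dict], second: list[dict],
                         key: str = "ID") -> list[str]:
    """Report every disagreement between two independently keyed versions."""
    by_id_first = {row[key]: row for row in first}
    by_id_second = {row[key]: row for row in second}
    report = []
    for record_id in sorted(by_id_first.keys() | by_id_second.keys()):
        a = by_id_first.get(record_id)
        b = by_id_second.get(record_id)
        if a is None or b is None:
            # One person keyed a record that the other did not.
            missing_from = "first" if a is None else "second"
            report.append(f"{record_id}: missing from {missing_from} entry")
            continue
        for column in sorted(a.keys() | b.keys()):
            if a.get(column) != b.get(column):
                report.append(
                    f"{record_id}.{column}: {a.get(column)!r} != {b.get(column)!r}"
                )
    return report
```

Each reported disagreement is then resolved against the original data sheet rather than by picking one of the two keyed values.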

QA/QC After Data Entry
Data Review and Certification for Use
It is important to review the data for quality and to certify it for use. Certification allows others to use the data knowing it meets a predetermined level of quality and completeness. Errors that are found need to be corrected, and each correction documented by annotating the original data sheets. A reviewer familiar with the kind of data being reviewed is essential, because some errors are cryptic and can be recognized only as a logical inconsistency (e.g., incorrect equipment indicated for the type of parameter measured).

QA/QC After Data Entry
Example of Illegal Data Filter
Table 1 from Edwards (2000). An illegal-data filter, written in SAS (the data set "All" exists prior to this DATA step, containing the data to be filtered, variable names Y1, Y2, etc., and an observation identifier variable ID).

    Data Checkum; Set All;
    message=repeat(" ", 39);
    If Y1<0 or Y1>1 then do; message="Y1 is not on the interval [0, 1]"; output; end;
    If Floor(Y2) NE Y2 then do; message="Y2 is not an integer"; output; end;
    If Y3>Y4 then do; message="Y3 is larger than Y4"; output; end;
    /* add as many such statements as desired... */
    If message NE repeat(" ", 39);
    keep ID message;
    Proc Print Data=Checkum;
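
For readers who do not use SAS, the same three rules can be sketched in Python. This is an illustrative analogue, not a translation endorsed by Edwards (2000): it assumes each observation is a dictionary with numeric, non-missing values under the keys `ID`, `Y1`, `Y2`, `Y3`, and `Y4`, and the function name is hypothetical.

```python
import math


def illegal_data_filter(rows):
    """Return one (ID, message) pair for every rule that a row violates."""
    findings = []
    for row in rows:
        if row["Y1"] < 0 or row["Y1"] > 1:
            findings.append((row["ID"], "Y1 is not on the interval [0, 1]"))
        if math.floor(row["Y2"]) != row["Y2"]:
            findings.append((row["ID"], "Y2 is not an integer"))
        if row["Y3"] > row["Y4"]:
            findings.append((row["ID"], "Y3 is larger than Y4"))
        # Add as many such checks as desired.
    return findings
```

As in the SAS version, a row that breaks several rules produces several messages, and rows that pass every check produce none.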

QA/QC After Data Entry
Look for outliers
• Outliers: extreme values for a variable given the statistical model being used
• Goal is not to eliminate outliers but to identify potential data contamination and verify true values

QA/QC After Data Entry
Methods to look for outliers
  ◦ Graphical: normal probability plots, regression, scatter plots, maps
  ◦ Statistical
Be sure to transform data when looking for outliers graphically
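
As one simple example of a statistical screen, values can be flagged by how many sample standard deviations they lie from the mean. The slides do not prescribe this method; the sketch below is an assumption chosen for brevity, and the function name and the default threshold of 3 are hypothetical. Flagged values are returned for verification, not deleted.

```python
from statistics import mean, stdev


def flag_outliers(values, threshold=3.0):
    """Return (index, value) pairs lying more than `threshold` sample
    standard deviations from the mean, for a human to verify."""
    center = mean(values)
    spread = stdev(values)   # sample standard deviation; needs >= 2 values
    if spread == 0:
        return []            # all values identical, nothing to flag
    return [
        (index, value)
        for index, value in enumerate(values)
        if abs(value - center) / spread > threshold
    ]
```

In small samples a single extreme value inflates the standard deviation, so a screen like this can miss outliers; it complements the graphical methods rather than replacing them.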

Record QA/QC Activities Performed on Shared Datasets
• Document your QA/QC activities to certify data for use
  ◦ Don’t waste anyone’s time by forcing them to re-check data that were already QA/QC’d
• Record all changes made to the data
  ◦ All changes from the original record need to be documented to be defensible
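
Recording every change can be as simple as appending an entry to a log whenever a value is corrected. The following Python sketch assumes records are dictionaries with an `ID` key; the function names and the log fields (`old`, `new`, `reason`, `who`, `when`) are hypothetical choices for illustration.

```python
from datetime import datetime, timezone


def apply_correction(record, field, new_value, reason, who, log):
    """Change one value and append a log entry describing the change."""
    log.append({
        "id": record["ID"],
        "field": field,
        "old": record.get(field),   # original value, kept so the change is defensible
        "new": new_value,
        "reason": reason,
        "who": who,
        "when": datetime.now(timezone.utc).isoformat(),
    })
    record[field] = new_value


def undo_last(records_by_id, log):
    """Revert the most recent logged change and remove it from the log."""
    entry = log.pop()
    records_by_id[entry["id"]][entry["field"]] = entry["old"]
```

Because the log keeps the original value alongside the reason for the change, it supports both undoing a correction and showing later users which checks have already been done.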

Summary
• Data contamination results from a process or phenomenon that adversely affects data integrity or allows erroneous values to enter a dataset
• Quality assurance and quality control are strategies to:
  ◦ prevent errors from entering a data set
  ◦ ensure quality of data
  ◦ monitor and maintain the quality of data
• It is important to define and enforce quality assurance and quality control standards before, during, and after the collection and entry of data

References
Edwards, D, 2000. Data Quality Assurance. In Ecological Data: Design, Management and Processing. WK Michener and JW Brunt, Eds. Blackwell Science. p. 70-91. www.ecoinformatics.org/pubs
Cook, RB, RJ Olson, P Kanciruk, and LA Hook, 2001. Best practices for preparing ecological data sets to share and archive. Bulletin of the Ecological Society of America 82(2): 138-141.
Chapman, AD, 2005. Principles of Data Quality. Report for the Global Biodiversity Information Facility, 2004, Copenhagen. http://www.gbif.org/communications/resources/print-and-onlineresources/download-publications/bookelets/
Grubbs’ Test for outliers. Wikipedia entry, accessed November 18 2010. http://en.wikipedia.org/wiki/Grubbs%27_test_for_outliers
Vanderbilt, K. Quality Assurance & Quality Control. Presentation.
