DATA QUALITY & VALIDATION
Catherine Bauer-Martinez, Indiana University
Alvaro Andres Alvarez, Stanford
Heather Eng, University of Pittsburgh
New York City, Tuesday, August 15, 2017

1. Before Data Collection Starts
2. During Data Collection (internal)
3. During Data Collection (external)

1. Before Data Collection Starts
WHAT IS METADATA? Data [information] that provides information about other data

GOAL: Design data collection forms that meet study needs and yield complete, correct data (Data Dictionary and Project Setup). Good data quality starts with a good database design…
BENEFITS!
o Reduce the number of issues during the data capture phase.
o Reduce the REDCap administrator's future support burden.
o Reduce time spent on the data cleaning process.
o Data sharing.
What are your recommendations and good practices before moving a project to production mode?

INCONSISTENCIES IN CODING FOR YES/NO QUESTIONS.
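
A minimal sketch of how such inconsistencies could be spotted in a data dictionary. The field names and the "code, label | code, label" choice layout are assumptions made for illustration, not taken from a real project.

```python
from collections import defaultdict

# Hypothetical data dictionary rows: (field name, raw choice string).
# The "code, label | code, label" layout is an assumption for this sketch.
FIELDS = [
    ("smoker", "1, Yes | 0, No"),
    ("diabetic", "1, Yes | 2, No"),
    ("consented", "1, Yes | 0, No"),
]


def parse_choices(raw):
    """Turn 'code, label | code, label' into {lowercased label: code}."""
    mapping = {}
    for part in raw.split("|"):
        code, label = part.split(",", 1)
        mapping[label.strip().lower()] = code.strip()
    return mapping


def yes_no_codings(fields):
    """Group yes/no fields by the (yes code, no code) pair they use."""
    groups = defaultdict(list)
    for name, raw in fields:
        choices = parse_choices(raw)
        if set(choices) == {"yes", "no"}:
            groups[(choices["yes"], choices["no"])].append(name)
    return dict(groups)


codings = yes_no_codings(FIELDS)
# More than one distinct (yes, no) pair means the project mixes codings.
inconsistent = len(codings) > 1
```

Here "diabetic" codes No as 2 while the other fields code it as 0, so the project would be flagged.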

FORMS NOT ASSIGNED TO AN EVENT.

LOGIC FIELDS… Calculated Fields. Branching Logic. Automated Survey Invitation (ASI) Logic. Survey Queue.

THE PROJECT IS SUFFICIENTLY TESTED. We recommend creating at least three test records and running at least one export in development mode. This lets you preview the type of results to expect from the project. We also highly recommend reviewing the project's design with a statistician before entering production mode, to ensure your data capture is configured properly.

MOST COMMON ISSUES WE FOUND AT STANFORD

QUALITY CONTROL BEFORE GOING TO PRODUCTION (STANFORD)
• Inconsistencies in coding for positive/negative questions.
• Date format inconsistencies.
• “99” or “98” as the recommended coding of “other”, “unknown” or similar values in dropdown lists, radio buttons or checkboxes.
• Presence of the "My First Instrument" form name.
• If research, PI name and last name.
• If research, IRB information.
• % of validated fields.
• Forms with more fields than recommended.
• Calculations using "Today".
• No fields tagged as identifiers.
Agree? Which other recommendations would you add to the list?

WHY NOT AUTOMATE THIS? … We created a tool for this. Demo time!
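
A rough sketch of what automating a few of the pre-production checks could look like. This is not the Stanford tool; the row layout and key names below are hypothetical stand-ins for a data dictionary.

```python
# Hypothetical data dictionary rows; the key names are assumptions for this
# sketch and would need mapping to a real data dictionary's columns.
ROWS = [
    {"field_name": "record_id", "form_name": "my_first_instrument",
     "field_type": "text", "validation": "", "identifier": "", "calculation": ""},
    {"field_name": "dob", "form_name": "demographics",
     "field_type": "text", "validation": "date_ymd", "identifier": "", "calculation": ""},
    {"field_name": "age", "form_name": "demographics",
     "field_type": "calc", "validation": "", "identifier": "",
     "calculation": 'datediff([dob], "today", "y")'},
    {"field_name": "weight", "form_name": "demographics",
     "field_type": "text", "validation": "number", "identifier": "", "calculation": ""},
]


def qc_report(rows):
    """Run a handful of the checklist items against the data dictionary."""
    text_fields = [r for r in rows if r["field_type"] == "text"]
    validated = [r for r in text_fields if r["validation"]]
    pct = 100.0 * len(validated) / len(text_fields) if text_fields else 0.0
    return {
        # Default form name left in place
        "default_form_name": any(r["form_name"] == "my_first_instrument" for r in rows),
        # Share of text fields that carry a validation type
        "pct_validated_text_fields": round(pct, 1),
        # True when nothing is tagged as an identifier
        "no_identifiers_tagged": not any(r["identifier"] == "y" for r in rows),
        # Calculated fields whose equation mentions "today"
        "calcs_using_today": [
            r["field_name"] for r in rows
            if r["field_type"] == "calc" and "today" in r["calculation"].lower()
        ],
    }


report = qc_report(ROWS)
```

Each entry in the report maps to one line of the checklist, so new checks can be added as extra keys.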

2. During Data Collection (internal)

DON’T UNDERESTIMATE HOW IMPORTANT IT IS TO

IT PAYS TO BE PATIENT….

DATA VALIDATION IN REDCAP

DATA QUALITY TOOL
REDCap has 8 pre-defined data quality rules that you can execute following data entry.
• Missing values (excluding missing values due to branching logic)
• Missing values for required fields only
• Incorrect data type
• Out-of-range values
• Outliers for numerical fields
• Hidden fields that contain values
• Multiple choice fields with invalid values
• Incorrect values for calculated fields
You can create customized rules as well.
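
To make three of these rules concrete (missing required values, incorrect data type, out-of-range values), here is a small sketch run on exported records outside REDCap. The field names, limits and records are invented for illustration; REDCap's own rules run inside the application.

```python
# Hypothetical per-field rules and exported records (all values as strings).
RULES = {
    "age": {"required": True, "min": 18, "max": 90},
    "weight_kg": {"required": False, "min": 30, "max": 250},
}
RECORDS = [
    {"record_id": "1", "age": "45", "weight_kg": "80"},
    {"record_id": "2", "age": "", "weight_kg": "400"},
    {"record_id": "3", "age": "17", "weight_kg": ""},
]


def find_discrepancies(records, rules):
    """Return (record_id, field, problem) tuples for every rule violation."""
    issues = []
    for rec in records:
        for field, rule in rules.items():
            value = rec.get(field, "")
            if value == "":
                # Blank values only count when the field is required.
                if rule["required"]:
                    issues.append((rec["record_id"], field, "missing required value"))
                continue
            try:
                number = float(value)
            except ValueError:
                issues.append((rec["record_id"], field, "incorrect data type"))
                continue
            if not rule["min"] <= number <= rule["max"]:
                issues.append((rec["record_id"], field, "out of range"))
    return issues


issues = find_discrepancies(RECORDS, RULES)
```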

DATA EXPORTS, REPORTS AND STATS
• Create reports to view all your data in a spreadsheet without having to export from the system.
• Serves as the search engine of the REDCap project
• Use reports to check your data quality
• Queries database in real time and displays results in table format.
• Choose selected variables
• Use filters to create reports
• Reports are saved in left navigation panel
• Updates every time you click on defined report
• Edit reports as needed


BEST PRACTICES
• Avoid “free” text fields
• Define data type for each variable
• Use standard measures and codes
• Do not mix data types (e.g., “428.0 heart failure patient had pneumonia”); put code and comment in separate fields
• Use REDCap validation rules (set minimum and maximum values)
• Reduce the amount of missing data (!)
• Avoid blanks
• Be consistent throughout the study by using the same codes
• Set up your database with the end in mind
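
The mixed-type example above shows why cleanup gets expensive later. A small sketch of separating such a value after the fact, assuming the code is always a leading number (an assumption that will not hold for every coding system):

```python
import re

# Assumed pattern: a leading numeric code such as "428.0", optionally
# followed by whitespace and a free-text comment.
CODE_THEN_COMMENT = re.compile(r"^\s*(\d+(?:\.\d+)?)(?:\s+(.*\S))?\s*$")


def split_code_and_comment(value):
    """Return (code, comment); code is None when no leading code is found."""
    match = CODE_THEN_COMMENT.match(value)
    if match is None:
        return None, value.strip()
    return match.group(1), match.group(2) or ""


code, comment = split_code_and_comment("428.0 heart failure patient had pneumonia")
```

Capturing the code and the comment in separate fields at entry time avoids needing this kind of parsing at all.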

3. During Data Collection (external)

USING ANALYSIS SOFTWARE FOR COMPLEX DATA QUALITY PROGRAMS
Automated overnight process -> SAS Research Repository
• cURL+API export: form-specific .CSV files from REDCap
• “DBLOAD.sas” import: form-specific SAS datasets
• Additional external data (lab, specimen tracking, EMR)
• Other related REDCap projects
• Relate by keys (ID, date, timepoint, …)
• “EDITS.sas” quality control programs
• “REPORTS.sas” administrative reports
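
The form-specific export step can be sketched in Python instead of cURL. The URL, token and form name are placeholders, and the parameter names reflect our understanding of the REDCap record-export API; check them against your own instance's API documentation before relying on them.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_export_request(api_url, token, form_name):
    """Build (but do not send) a POST request for one form's records as CSV."""
    payload = {
        "token": token,          # project-specific API token (placeholder here)
        "content": "record",
        "format": "csv",
        "type": "flat",
        "forms[0]": form_name,   # restrict the export to a single form
    }
    data = urlencode(payload).encode("utf-8")
    return Request(api_url, data=data, method="POST")


# Hypothetical endpoint, token and form name, for illustration only.
request = build_export_request(
    "https://redcap.example.edu/api/", "HYPOTHETICAL_TOKEN", "demographics"
)
```

Passing the request to `urllib.request.urlopen` would perform the export; looping over form names gives one CSV file per form, as in the overnight process.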

USING ANALYSIS SOFTWARE FOR COMPLEX DATA QUALITY PROGRAMS
“EDITS.sas” quality control programs
• Confirm REDCap point-of-entry validations
• Complex longitudinal checks
• Logical checks between multiple REDCap projects
• Consistency checks with non-REDCap data, e.g.
  • laboratory specimen tracking
  • self-reported medications vs EHR
• Reports emailed to coordinators for correction in REDCap
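
A sketch of one such consistency check, written in Python rather than SAS: REDCap visit data compared against a specimen tracking log, related by the keys (ID, timepoint). All field names and records are hypothetical.

```python
# Hypothetical REDCap export: one row per participant visit.
REDCAP_VISITS = [
    {"id": "001", "timepoint": "baseline", "blood_drawn": "1"},
    {"id": "001", "timepoint": "month6", "blood_drawn": "1"},
    {"id": "002", "timepoint": "baseline", "blood_drawn": "0"},
]
# Hypothetical lab specimen tracking log (non-REDCap data).
SPECIMEN_LOG = [
    {"id": "001", "timepoint": "baseline"},
    {"id": "002", "timepoint": "baseline"},
]


def specimen_mismatches(visits, specimens):
    """Flag visits where the form and the specimen log disagree."""
    received = {(s["id"], s["timepoint"]) for s in specimens}
    issues = []
    for visit in visits:
        key = (visit["id"], visit["timepoint"])
        drawn = visit["blood_drawn"] == "1"
        if drawn and key not in received:
            issues.append((*key, "draw recorded, no specimen received"))
        elif not drawn and key in received:
            issues.append((*key, "specimen received, no draw recorded"))
    return issues


mismatches = specimen_mismatches(REDCAP_VISITS, SPECIMEN_LOG)
```

Each mismatch names the participant and timepoint, which is what a coordinator needs to correct the record in REDCap.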

USING ANALYSIS SOFTWARE FOR COMPLEX DATA QUALITY PROGRAMS
“REPORTS.sas”
• High-level administrative reports
• Accrual and retention
• Forms and visit completeness
• Summary of outstanding QC issues
• Reports emailed to PIs and posted on study website

USING ANALYSIS SOFTWARE FOR COMPLEX DATA QUALITY PROGRAMS
“LogScanner.sas”
• Opens log file before cURL+API export
• Closes log file after reports are emailed and posted
• Scans log file for errors, warnings, unexpected events
• Sends email to DM each morning:
  • Errors found … <details>
  • All is well!
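
The scanning step could look roughly like this in Python. This is a sketch, not the presenters' program: SAS logs prefix lines with "ERROR:" and "WARNING:", but the "unexpected event" phrases and the sample log lines below are assumptions chosen for illustration.

```python
import re

# ERROR/WARNING prefixes follow the usual SAS log convention; the
# "unexpected" phrases are hypothetical examples of site-specific checks.
PATTERNS = {
    "error": re.compile(r"^ERROR( \d+-\d+)?:"),
    "warning": re.compile(r"^WARNING( \d+-\d+)?:"),
    "unexpected": re.compile(r"uninitialized|has 0 observations", re.IGNORECASE),
}


def scan_log(lines):
    """Return (line number, kind, text) for each line matching a pattern."""
    findings = []
    for number, line in enumerate(lines, start=1):
        for kind, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((number, kind, line.strip()))
                break  # report each line once, under the first matching kind
    return findings


def morning_summary(findings):
    """Build the body of the daily email to the data manager."""
    if not findings:
        return "All is well!"
    details = "\n".join(f"line {n} [{kind}]: {text}" for n, kind, text in findings)
    return "Errors found:\n" + details


# Invented sample log lines.
LOG = [
    "NOTE: The data set WORK.DEMOG has 120 observations and 14 variables.",
    "WARNING: Variable visit_date already exists on file WORK.VISITS.",
    "ERROR: File WORK.LABS.DATA does not exist.",
    "NOTE: Variable bmi is uninitialized.",
]
findings = scan_log(LOG)
summary = morning_summary(findings)
```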

THANK YOU! Breakout Session New York City Tuesday, August 15, 2017