INTRODUCTION TO THE DATA QUALITY OBJECTIVES PROCESS Course

Course Objectives At the conclusion of this course, participants will understand: • The Agency's Quality System and the elements of the DQO Process • How the DQO Process applies to EPA programs • How to interpret the consequences of potential decision errors.

Systematic Planning • Agency policy requires the use of a systematic planning process to develop performance criteria • DQO Process defines performance and acceptance criteria for decision making • EPA recommends the DQO Process

What is the DQO Process? The DQO Process is a systematic planning process for generating environmental data that will be sufficient for their intended use.

What are DQOs? DQOs are quantitative and qualitative criteria that: • Clarify study objectives • Define appropriate types of data to collect • Specify the tolerable levels of potential decision errors

DQO Process • Planning Tool for Managing Decision Errors • Improves: Planning Effectiveness, Design Efficiency, Defensibility of results/decisions • Generates appropriate data: Type, Quality, Quantity

DQO Process Designed to answer: • What do you need? • Why do you need it? • How will you use it? • What is your tolerance for errors?

DQO Process: Underlying Principles 1. All collected data have error. 2. Nobody can afford absolute certainty. 3. The DQO Process defines tolerable error rates. 4. Absent DQOs, decisions are uninformed. 5. Uninformed decisions tend to be conservative and expensive.

DQOs Strike a Balance

DQOs in the Context of the Project Life Cycle

The DQO Process 1. State the Problem. 2. Identify the Decision. 3. Identify the Inputs to the Decision. 4. Define the Boundaries of the Study. 5. Develop a Decision Rule. 6. Specify Tolerable Limits on Decision Errors. 7. Optimize the Design.

Repeated Application of the DQO Process

Data Quality Objectives: Outputs from Each Step of the Process

The DQO Process Promotes Communication

A Quality Planning Model

The DQO Process Encourages Efficient Planning • Clearly stated objectives • A framework for organizing complex issues • Limits on decision errors specified • Efficient resource expenditure

DATA QUALITY OBJECTIVES

Seven Steps of the DQO Process 1. State the problem to be resolved. 2. Identify the decision to be made. 3. Identify the inputs to the decision. 4. Define the boundaries of the study. 5. Develop a decision rule. 6. Specify the tolerable limits on decision errors. 7. Optimize the design for obtaining the data.

Stating the Problem Who should participate on the planning team? • Risk Assessor • Scientist/Engineer • Statistician/Data Analyst • Data User/Decision Maker • Lab and Field Personnel • QA Specialist What is the problem? What resources are available? What time is available? What important social/political issues have an impact on the decision?

Wood Preserving Site: Background • U.S. State-led investigation of possible soil contamination problem • Creosoting of timbers • Soil contaminated with creosote • Contains Polyaromatic Hydrocarbons (PAHs) • Early Sampling Results: – Soil PAH concentration in low activity area 0-80 mg/kg – Soil PAH concentration in high activity area 80-140 mg/kg – Off site: Not detected – Future land use will be residential

Wood Preserving Site: Background The Team: • Decision Maker • Chemist • Field Sampling Technician • QA Specialist • Risk Assessor/Toxicologist • Environmental Scientist with Statistical Training

Wood Preserving Site: Problem Statement The Problem: Obvious creosote contamination in the soil may pose a danger to human health or the environment. Information is necessary to determine the extent of danger. Resources: Measurement Budget = $100,000 Time Limit: Remediate in 1 year Socio-political: Future land use is residential

Identifying the Decision • Identify the principal study question. Clarify the main issue to be resolved. • Specify the alternative actions that would result from each resolution. Associate a course of action with each possible answer. • Define the decision statement that must be resolved to address the problem. Combine the principal study question and the alternative actions into a specific decision statement.

Wood Preserving Site: Identifying the Decision Study Question: – Does creosote contamination in the soil pose an unacceptable danger to human health or the environment? Alternative Actions: – Remediate the soil – Do not remediate the soil (no action) Decision Statement: – Determine whether the creosote contamination in soil poses a danger that requires remediation.

Identifying Inputs for the Decision • Focus on what information is needed for the decision. • Identify the variables/characteristics to be measured. • Identify the information needed to establish the action level.

Wood Preserving Site: Inputs Needed for Decision Variable of Interest: PAHs. Some PAHs are carcinogens that are dangerous to human health. Action Level: 50 ppm (mg/kg), set by a toxicologist using a relevant site-specific exposure assessment.

Defining the Boundaries • Define the spatial boundary for the decision Define the geographical area within which decisions apply Define the media of concern Divide each medium into homogeneous strata • Define the temporal boundary of the decision Determine the time frame to which the study results apply Determine when to study • Define a scale of decision making • Identify practical constraints on data collection

Wood Preserving Site: Spatial Boundaries • Define the geographical area within which decisions apply: The property boundary (No PAHs detected off site) • Specify the characteristics that define the population of interest: PAHs in surface soil to 15 cm depth • Divide each medium into homogeneous strata: The site has been divided into two areas: 1) Area of high activity where the concentration is expected to be high 2) Area of low activity where the concentration is expected to be low

Wood Preserving Site: Temporal Boundaries • Determine the time frame to which the study results apply: The results will represent future conditions at the site. (Future lifetime exposure for residents) • Determine when data should be collected: Sampling begins in 3 months. Remediation completed within 1 year. Sampling results will not vary depending on weather conditions

Wood Preserving Site: Defining the Boundaries Scale of Decision Making: • Decisions will be made for each residential lot-sized area (based on future land use) Practical Constraints: • Existing structures and debris may limit sampling locations

Develop a Decision Rule Develop an "if/then" statement that incorporates: • The population parameter of interest (e.g., mean, maximum, percentile) • The scale of decision making (e.g., residential lot size) • The action-triggering value • The alternative actions

Wood Preserving Site: Decision Rule Use average (mean) PAH concentrations to identify lots that pose a health threat. – If the true mean PAH concentration within a residential lot is greater than 50 mg/kg, then the soil will be remediated. – If not, then the soil will be left in situ.
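The if/then rule above can be written as a short function. This is only a minimal sketch: the function name, the lot identifiers, and the measurement values are hypothetical, and the sample mean stands in for the true mean without any allowance for sampling error, which Steps 6 and 7 of the process address.

```python
from statistics import mean

ACTION_LEVEL_MG_PER_KG = 50.0  # action level from the case study


def decide_lot(pah_mg_per_kg, action_level=ACTION_LEVEL_MG_PER_KG):
    """Return the alternative action for one residential lot.

    The rule is stated for the true mean; using the sample mean in its
    place is an illustrative simplification.
    """
    if mean(pah_mg_per_kg) > action_level:
        return "remediate"
    return "leave in situ"


# Hypothetical measurements (mg/kg) for two lots, not data from the site.
lots = {
    "lot_A": [62.0, 48.0, 71.0, 55.0],
    "lot_B": [12.0, 30.0, 41.0, 25.0],
}
decisions = {name: decide_lot(values) for name, values in lots.items()}
print(decisions)
```

Applying the rule lot by lot mirrors the scale of decision making chosen in Step 4.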

Specify Limits on Decision Errors • Determine the possible range of the parameter of interest • Determine baseline condition (null hypothesis) • Determine consequences of each decision error. Consequences may include: Health risks Ecological risks Political risks Social risks Resource risks

Specifying Limits on Decision Error • Specify the gray region - a range of possible parameter values where the consequences of decision errors are relatively minor (too close to call) – Bounded on one side by the action level – Bounded on the other side by the parameter value where the consequences of making a decision error begin to be significant • Set quantitative limits on false rejection and false acceptance errors by considering the consequences of these potential decision errors.

Statistical Error Types • Rejecting the baseline condition when it is true is a False Rejection error, F(r). Decision (with a baseline of "hazardous"): Not hazardous when it actually is hazardous • Accepting the baseline condition when it is false is a False Acceptance error, F(a). Decision (with a baseline of "hazardous"): Hazardous when it actually is not hazardous

Decision Errors: Synonyms and Plain English If the baseline assumption is that the program or site is in compliance, then: False Rejection Error F(r), Type I Error, False Positive • Deciding program or site not in compliance when it is • An overreaction to a situation • Wasted resources, unnecessary expenditure False Acceptance Error F(a), Type II Error, False Negative • Deciding program or site is in compliance when it is not • A missed opportunity for correction • Allowing a hazard to public health or the ecosystem

False Rejection and False Acceptance Baseline Condition: True mean level equal to or below standard Alternative: True mean level above standard

Decision Errors: Synonyms and Plain English If the baseline assumption is that the program or site is NOT in compliance, then: False Rejection Error F(r), Type I Error, False Positive • Deciding program or site is in compliance when it is not • A missed opportunity for correction • Allowing a hazard to public health or the ecosystem False Acceptance Error F(a), Type II Error, False Negative • Deciding program or site not in compliance when it is • An overreaction to a situation • Wasted resources, unnecessary expenditure

The Probability of Making False Rejection Decision Errors If the true mean is much greater than the action level, few low readings will occur. So, there is a small chance of reaching a wrong conclusion. If the true mean is close to the action level, many low readings will occur. Erroneous conclusions are much more likely.
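The effect described above can be illustrated numerically. The sketch below assumes independent, normally distributed readings; the standard deviation (20 mg/kg) and the sample count (9) are illustrative assumptions, not values from the course, and `prob_mean_below` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def prob_mean_below(action_level, true_mean, sd, n):
    """P(sample mean < action_level) for n independent normal readings."""
    standard_error = sd / sqrt(n)
    return NormalDist(mu=true_mean, sigma=standard_error).cdf(action_level)


action_level = 50.0  # mg/kg, from the wood preserving example
sd = 20.0            # assumed standard deviation, illustrative only
n = 9                # assumed number of samples, illustrative only

# The chance of a misleadingly low sample mean shrinks as the true mean
# moves further above the action level.
for true_mean in (52.0, 60.0, 80.0):
    p = prob_mean_below(action_level, true_mean, sd, n)
    print(f"true mean {true_mean:5.1f} mg/kg -> P(sample mean < 50) = {p:.4f}")
```

Plotting this probability against the true mean gives the kind of performance curve that appears on the Decision Performance Goal Diagram.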

Specify Limits on Decision Error (Construct a "What If" Table) Assign probability values to points above and below the action level that reflect the tolerable probabilities for decision errors.

Decision Performance Goal Diagram

Optimize the Design • Develop general data collection design alternatives – Simple random sampling with compositing – Stratified random sampling • For each design, develop cost formula, select a proposed method of data analysis, develop method for estimating sample size to correspond to method for data analysis • Select the most resource-effective design – consider cost, human resources, other constraints – consider performance of design

Decision Performance Goal Diagram with Performance Curve

DQO Process Output • Qualitative and Quantitative Framework for a study • Feeds directly into the Quality Assurance Project Plan which is mandatory for EPA environmental data collection activities

DATA QUALITY OBJECTIVES: Cadmium Contaminated Fly Ash Example

Case Study Introduction Case study - Cadmium contaminated fly ash waste • Output from a DQO case study • Shows how the steps of the DQO process aid in developing a sampling design • Illustrates decisions that could be made within the Resource Conservation and Recovery Act (RCRA) Program • Not intended to represent the policies of the RCRA Program

Cadmium Contaminated Fly Ash Waste: Background Information • Municipal incinerator • Fly ash dumped in municipal landfill • Company calls ash "Non-hazardous"

Background Information New waste stream: • Contains cadmium − Toxic effects: inhalation and ingestion exposure − Short term and chronic effects • The new ash will be tested using Toxicity Characteristic Leaching Procedure (TCLP). • Waste will be classified as hazardous if the cadmium concentration in the TCLP > 1 mg/liter.

Background Information • Pilot study - to determine the variability of the cadmium concentration in ash • Results: – Relatively constant variability within containers – Relatively high variability between containers

The DQO Process: State the Problem • Members of Planning Team – Plant Manager – Plant Engineer Manager – Statistician/Data Analyst – Chemist – Quality Assurance • The Problem – To determine which loads of ash should be sent to a RCRA facility and which can be dumped in the municipal landfill • Available resources – The difference in cost between municipal and RCRA disposal is $6,750. • Project constraints – Cost (Budget approximately $3,000 for sampling)

The DQO Process: Identify the Decision • Define the alternative actions. – The waste fly ash could be disposed of in a RCRA landfill. – The waste fly ash could be disposed of in a municipal landfill. • Form alternatives into a decision statement. – Determine if the cadmium concentration in the TCLP leachate exceeds RCRA regulatory standards.

The DQO Process: Identify the Inputs to the Decision • Identify key information. – Concentration of cadmium in fly ash – Fly ash samples subjected to the TCLP test and analyzed for cadmium • Identify information to establish the Action Level. – RCRA standard (1.0 mg/l using the TCLP method) • Confirm that appropriate analytical methods exist. – Cadmium is a metal that has a detection limit well below the RCRA standard.

The DQO Process: Define the Boundaries of the Study • Identify the spatial boundaries. – Fly ash in containerized bins; at least 70% capacity • Identify temporal boundaries. – The ash does not present an exposure hazard and will not degrade; no sampling time constraints are necessary. • Define the scale of decision making. – A decision will be made about each container. • Identify practical considerations that may interfere with the study. – Physically obtaining samples from the containers

The DQO Process: Develop a Decision Rule • The Parameter of Interest – The average concentration of cadmium • Specify the Action Level for the study. – The RCRA standard for cadmium (1.0 mg/l) in TCLP leachate • Develop a Decision Rule. – If the average cadmium concentration in a bin is more than 1.0 mg/l, then the ash will be disposed of in a RCRA facility. – If the average cadmium concentration in a bin is less than 1.0 mg/l, then the ash will be disposed of in a municipal landfill.

The DQO Process: Specify Limits on Decision Errors • Determine baseline condition – Null hypothesis = "hazardous" (RCRA requirement) mean > 1.0 mg/l • Identify decision errors – False rejection: Decide mean < 1.0 mg/l when mean > 1.0 mg/l – False acceptance: Decide mean > 1.0 mg/l when mean < 1.0 mg/l • Identify limits on decision errors & gray region
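One way to act on a "hazardous" baseline is to reject it only when an upper confidence limit on the mean falls below the action level. The sketch below is an assumption-laden illustration: the false rejection limit of 0.05, the function name, and the TCLP readings are hypothetical, and a normal quantile is used in place of the t quantile that would be more usual for small samples.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def classify_container(tclp_mg_per_l, action_level=1.0, alpha=0.05):
    """Decide disposal for one container under a 'hazardous' baseline.

    alpha is an assumed false rejection limit. The baseline is rejected
    only if an approximate one-sided upper confidence limit on the mean
    is below the action level.
    """
    n = len(tclp_mg_per_l)
    z = NormalDist().inv_cdf(1.0 - alpha)
    upper_limit = mean(tclp_mg_per_l) + z * stdev(tclp_mg_per_l) / sqrt(n)
    if upper_limit < action_level:
        return "municipal landfill"
    return "RCRA facility"


# Hypothetical TCLP cadmium results (mg/l), not data from the case study.
clearly_low = [0.42, 0.55, 0.48, 0.61, 0.50, 0.44]
near_standard = [0.90, 1.20, 0.80, 1.10, 0.95, 1.05]
print(classify_container(clearly_low))
print(classify_container(near_standard))
```

With this baseline, ambiguous data default to the RCRA facility, so the burden of proof falls on showing the ash is not hazardous.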

The DQO Process: Tolerable Limits of Decision Error

Optimize the Design • Develop general data collection design alternatives – Simple random sampling with compositing – Sequential random sampling • For each design, develop cost formula, select a proposed method of data analysis, develop method for estimating sample size to correspond to method for data analysis • Select the most resource-effective design – consider cost, human resources, other constraints – consider performance of design

The DQO Process: Optimize the Design Elements of the Design: • Hypothesis Test • Statistical Model • Design Description/Option • Sample Location • Sample Cost • Sample Size • Design Performance

Design Options: Simple Random Sampling • Simple Random Sample – Simplest type of probability sampling – Every point in the sampling medium has an equal chance of being selected. • Application – Small variance – Inexpensive sampling and analysis

Design Options: Composite Sampling • Physically combining multiple samples then drawing one or more sub-samples for analysis • Application: – When an average concentration is sought and there is no need to detect peak concentrations – Large variance (allows the researchers to sample a larger number of locations) – Reduces total cost when analytical costs are higher than sample collection costs
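The cost argument for compositing can be checked with simple arithmetic. The sketch below uses the $10 collection and $150 TCLP analysis costs from the fly ash cost slide; the function name and the choice of 32 field samples in composites of 4 are illustrative assumptions.

```python
def compositing_cost(n_field_samples, samples_per_composite,
                     collection_cost, analysis_cost):
    """Total cost when field samples are combined before analysis.

    Every field sample is collected, but only one analysis is run per
    composite (a simplifying assumption: one sub-sample per composite).
    """
    if n_field_samples % samples_per_composite != 0:
        raise ValueError("field samples must divide evenly into composites")
    n_composites = n_field_samples // samples_per_composite
    return n_field_samples * collection_cost + n_composites * analysis_cost


# Same 32 sampled locations, analyzed individually vs. in composites of 4.
individual = compositing_cost(32, 1, collection_cost=10, analysis_cost=150)
composited = compositing_cost(32, 4, collection_cost=10, analysis_cost=150)
print(individual, composited)
```

The saving comes entirely from running fewer analyses, which is why compositing pays off when analysis is much more expensive than collection.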

Design Options: Sequential Sampling • Conduct several rounds of sampling and analysis; perform statistical test between each round to make one of three decisions: – Accept null hypothesis – Reject null hypothesis – Collect more samples • Application – When sampling and analysis costs are high – When information about sampling or measurement variability is lacking – When the waste is stable over time frame of the sampling effort
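The three-way decision above can be sketched with Wald's sequential probability ratio test, a classical sequential test chosen here for illustration; the course does not say which sequential test it has in mind. The sketch assumes normal readings with a known standard deviation and two point hypotheses, and every numeric value in the example (mu1, sd, alpha, beta, and the readings) is hypothetical.

```python
from math import log


def sprt_decision(readings, mu0, mu1, sd, alpha, beta):
    """Wald SPRT for a normal mean with known sd.

    mu0 is the mean under the null hypothesis, mu1 the mean under the
    alternative. alpha and beta are the tolerated false rejection and
    false acceptance rates.
    """
    upper = log((1.0 - beta) / alpha)   # evidence needed to reject the null
    lower = log(beta / (1.0 - alpha))   # evidence needed to accept the null
    midpoint = (mu0 + mu1) / 2.0
    # Log likelihood ratio of the alternative to the null.
    llr = (mu1 - mu0) / sd ** 2 * sum(x - midpoint for x in readings)
    if llr >= upper:
        return "reject null hypothesis"
    if llr <= lower:
        return "accept null hypothesis"
    return "collect more samples"


# Null: mean at the 1.0 mg/l standard. Alternative: an assumed 0.75 mg/l.
settings = dict(mu0=1.0, mu1=0.75, sd=0.3, alpha=0.05, beta=0.2)
print(sprt_decision([0.50, 0.60], **settings))
print(sprt_decision([0.50, 0.60, 0.55, 0.45], **settings))
print(sprt_decision([1.20, 1.10, 1.30], **settings))
```

Calling the function again after each round of analysis reproduces the accept, reject, or keep-sampling loop described on the slide.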

Sample/Analysis/Disposal Costs • Sample collection costs from each container: $10/sample • TCLP cost: $150/analysis • 15 tons of ash per container • $500/ton RCRA landfill ($7,500 per container) • $50/ton municipal landfill ($750 per container)
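The per-container figures follow directly from the unit costs, as the short calculation below shows. The constant and function names are hypothetical, and `sampling_cost` assumes every collected sample is analyzed individually (no compositing).

```python
TONS_PER_CONTAINER = 15
RCRA_COST_PER_TON = 500
MUNICIPAL_COST_PER_TON = 50
COLLECTION_COST_PER_SAMPLE = 10
TCLP_COST_PER_ANALYSIS = 150

rcra_per_container = TONS_PER_CONTAINER * RCRA_COST_PER_TON
municipal_per_container = TONS_PER_CONTAINER * MUNICIPAL_COST_PER_TON
# Extra cost of sending one container to a RCRA facility.
disposal_difference = rcra_per_container - municipal_per_container


def sampling_cost(n_samples):
    """Cost of collecting and individually analyzing n samples."""
    return n_samples * (COLLECTION_COST_PER_SAMPLE + TCLP_COST_PER_ANALYSIS)


print(rcra_per_container, municipal_per_container, disposal_difference)
print(sampling_cost(20))
```

Comparing `sampling_cost(n)` with `disposal_difference` shows how quickly sampling costs can approach the saving that a "not hazardous" finding would deliver.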

Decision Performance Goal Diagram with Performance Curve: Simple Random Sampling

Decision Performance Goal Diagram with Performance Curve: Relaxed Decision Error Constraints

Decision Performance Goal Diagram with Performance Curve: Increased Gray Region Width

Decision Performance Goal Diagram with Performance Curve: Simple Random Sampling with Compositing

Compare Overall Efficiency • Simple Random Sampling*: $5,920 • Simple Random Sampling with Relaxed Decision Error Constraints: $3,200 • Simple Random Sampling with Increased Gray Region Width: $2,080 • Simple Random Sampling with Compositing*: $3,040 (* Used original Decision Error Limits)

Contamination of Tarheel County's Sole Drinking Water Source/System

Drinking Water Problem Week 1: Quarterly monitoring of drinking water did not detect any contaminants above drinking water standards. Week 2: Groundwater is the drinking water source for Tarheel County. Atrazine was discovered in surface waters (that are hydraulically connected to groundwater) at levels up to 500 ppb, which is well above the maximum contaminant level (MCL) of 3 ppb. Week 3: Source of contamination has not been identified. Week 4 (Present): Citizens are concerned about the threat to public health and demand that State and Local officials ensure that the water is safe to drink.

Tarheel County Water Supply System • 6 wells in wellfield • Water company operates water system • System capacity: 8.6 million gallons/day (MGD) • System demand: 3-5 MGD • System serves 25,000 residents • Minimal Treatment (chlorination only) • Centralized above-ground storage holds water from all wells • Capacity is nearly 10 gallons to ensure 4-hour residence time for chlorination

Tarheel County Water Supply System Assignment: Decide whether the level of atrazine in drinking water exceeds the MCL and requires corrective action.

Data Quality Objectives Decision Error Feasibility Trials Software (DQO/DEFT)

The Purpose of DEFT • DEFT determines the feasibility of DQOs based on sample size and cost for several sampling designs • DQOs are feasible if at least one sampling design can satisfy the DQOs (decision error limits, cost constraints, time limitations, etc.).

Uses of DEFT • Aids in iterations between steps 6 and 7 of the DQO process • That is, it provides a smooth transition between the specific DQOs and the development of a data collection design • As a learning tool, facilitates understanding and communication

What DEFT Cannot Do DEFT should not be used to decide on a final data collection design or sample size. It cannot account for differences between: • Media • Contaminants • Spatial boundaries • Temporal boundaries

How DEFT Works • Utilizes outputs of the DQO process • Evaluates several basic collection designs • Estimates the number of samples • Estimates costs of data collection designs

What DQO Outputs are Necessary as DEFT Inputs? • Limits on decision errors • Action level • Possible range of parameter (minimum, maximum) • Cost of sample collection and analysis per sample • Location and width of gray region • Estimated standard deviation • Null hypothesis (H0)
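To show how these inputs combine, the sketch below uses a widely cited normal-approximation formula for the number of samples needed to test a mean under simple random sampling: n = sd^2 (z_{1-alpha} + z_{1-beta})^2 / width^2 + 0.5 z_{1-alpha}^2. Whether DEFT uses exactly this formula is an assumption, the function names are hypothetical, and the standard deviation, gray region width, and error limits in the example are illustrative values the slides do not state.

```python
from math import ceil
from statistics import NormalDist


def estimate_sample_size(sd, gray_region_width, alpha, beta):
    """Approximate sample count for a one-sample test of the mean.

    alpha: false rejection limit; beta: false acceptance limit at the
    far edge of the gray region.
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    z_beta = NormalDist().inv_cdf(1.0 - beta)
    n = (sd ** 2) * (z_alpha + z_beta) ** 2 / gray_region_width ** 2
    n += 0.5 * z_alpha ** 2
    return ceil(n)


def estimate_cost(n_samples, cost_per_sample):
    """Total collection and analysis cost for n individually analyzed samples."""
    return n_samples * cost_per_sample


# Assumed inputs: sd 0.6 mg/l, gray region 0.75 to 1.0 mg/l, limits 0.05 and 0.2.
n = estimate_sample_size(sd=0.6, gray_region_width=0.25, alpha=0.05, beta=0.2)
# $160 per sample = $10 collection + $150 TCLP analysis from the fly ash example.
print(n, estimate_cost(n, cost_per_sample=160))
```

With these assumed inputs the sketch gives 37 samples and $5,920, which matches the simple random sampling cost in the fly ash comparison, although the slides do not state the inputs behind that figure. Widening the gray region or relaxing the error limits lowers the sample count, which is the trade-off iterated between Steps 6 and 7.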

Analysis of DEFT Allows user to: • Determine effect or change DQOs • View Decision Performance Goal Diagram • Change sampling design – Simple Random Sampling – Composite Random Sampling – Stratified Random Sampling • Set sample size • Save DQOs, design information, and decision performance goal diagram to a file


DEFT in the Project Life Cycle

Beyond the DQO Process

The Project Life Cycle

What Is A QA Project Plan? • Mandatory planning document • Part of mandatory Agency-wide Quality System • Description of how data will be collected, assessed, and analyzed • Project Blueprint - who, what, where, when, why • Living document that is revised to reflect significant changes

QA Project Plans (QAPPs) • QAPPs must be approved prior to the start of data collection • QAPPs are required when environmental data operations occur in: – Intramural projects – Contracts, work assignments, delivery orders – Grants, cooperative agreements – Interagency agreements (when negotiated) – State-EPA agreements – Responses to statutory or regulatory requirements and to consent agreements

What Does A QA Project Plan Do For You? When you are asked: − "What did you do?" − "How did you do it?" − "Why did you do it?" − "Did you do it correctly?" The QA Project Plan has the answer.

Elements of a QA Project Plan Group A. Project Management Group B. Data Generation and Acquisition Group C. Assessment and Oversight Group D. Data Validation and Usability

Group A: Project Management Elements 1. Title and Approval Sheet 2. Table of Contents 3. Distribution List 4. Project/Task Organization 5. Problem Definition/Background 6. Project/Task Description 7. Quality Objectives and Criteria 8. Special Training Requirements/Certification 9. Documentation and Records

Group B: Data Generation & Acquisition Elements 1. Sampling Process Design (Experimental Design) 2. Sampling Methods Requirements 3. Sample Handling and Custody Requirements 4. Analytical Methods Requirements 5. Quality Control Requirements 6. Instrument/Equipment Testing, Inspection, and Maintenance Requirements 7. Instrument Calibration and Frequency 8. Inspection/Acceptance Requirements for Supplies and Consumables 9. Data Acquisition Requirements (Non-Direct Measurements) 10. Data Management

Elements in Group C & Group D Group C: Assessment & Oversight Elements 1. Assessments and Response Actions 2. Reports to Management Group D: Data Validation & Usability Elements 1. Data Review, Validation, and Verification Requirements 2. Validation and Verification Methods 3. Reconciliation with User Requirements

Data Quality Assessment (DQA) • A process to determine if data are adequate for their intended use – scientific and statistical evaluation – determine if data are of the right type, quality, and quantity • Sample data are used to make decisions during DQA • Do the data provide "sufficient evidence" to draw conclusions?

Data Quality • Data quality is meaningful only when "data quality" relates to intended use of data • Some data are of adequate quality for some purposes but not for others • Need to determine if the data are of the right type, quality, and quantity for their intended use

Data Quality Assessment Can Answer: – Do the data violate the conceptual site model or test assumptions? – Did I collect enough data? – What is my conclusion? Cannot Answer: – Did I make a decision error? (good decision -- bad outcome) – What are the "true" conditions? – Do I need different types of data?

DQA is a Joint Effort Decision maker's contribution: − Inspection of data for scientific anomalies − Responsibility for transcription errors − Assessment of effect of QA and QC deviations − Professional contextual judgment

DQA is a Joint Effort Statistician's contribution: − Graphical display of data and trends − Statistical analysis required by the DQO − Investigation of assumption violations − Identification of potential outliers − Providing direction for data improvement

The 5 Steps of Data Quality Assessment 1. Review the DQOs and Sampling Design 2. Conduct a Preliminary Data Review 3. Select the Statistical Test 4. Verify the Assumptions of the Statistical Test 5. Draw Conclusions from the Data

Guidance for Data Quality Assessment: Practical Methods for Data Analysis (G-9) • Written for non-statisticians • Supplements Agency guidance • Does not replace statistical texts • Regular supplements – Current examples – Shared information

DataQUEST • A PC-based software package that performs baseline Data Quality Assessment • Provides simple tools to a wide audience • Implements statistical methods described in guidance (G-9) • Supplements guidance so description of statistical tools is not contained in the User's Guide

Advantages • Menu-based System - no special language or commands, unlike statistical packages • Does not treat data as discrete numbers in graphs, as spreadsheets do • More standard statistical graphs than spreadsheets

QA Guidance www.epa.gov/quality Guidance for the Data Quality Objectives Process (G-4) − Planning process that ties data collection designs to user-defined decision error tolerances Guidance for QA Project Plans (G-5) − Utilizes outputs of DQO Process for detailing data collection operations, the "blueprint" of data collection Guidance for Data Quality Assessment (G-9) − Assessment of data to establish if they meet user-defined decision error limits