Ocean Observatories Initiative Data Quality
Ed Chapman, OOI Chief Systems Engineer
NSF Data Management Review, April 28, 2014
Goal
Address areas for recommendations #2, “Data Policy, Data Quality Protocols and Procedures,” and #4, “Data sampling rate strategy development and management.”
Specific topics: “Shoreside & at-sea instrument and subsystem quality/calibration procedures/protocols, automated thresholds/flags, manual data QC, exception management, and long term time-series data sampling rate management.”
Shoreside & at-sea instrument and subsystem quality/calibration procedures/protocols
Pre-Deployment Procedures
1. Incoming Inspection
• Completed for all instruments and platforms
• Verifies configuration and state as delivered
2. Calibration Records
• Records for each instrument or platform are archived in Vault
3. Quality Conformance Tests (QCT)
• Completed for all instruments and platforms
• Confirms basic functionality (“bench test”) and detects failures or damage
4. Requirements Verification
• Completed for each instrument type or class
• Validates the first article against requirements and specifications
5. Platform Integration and Test
• Platform operation verified using the platform controller
• End-to-end communication verified, from instrument to shore station
The data pipeline starts with the instruments: pre-deployment procedures (examples from Pioneer-1)
Instrument acceptance
• Visual inspection and inventory
• Bench test for basic function
• Verify against requirements/specs
• Archive calibration information
Platform build
• Construct according to the TDP
• Correct problems if necessary; document changes
The data pipeline starts with the instruments: pre-deployment procedures
Platform burn-in
• Operate in a benign environment (e.g., LOSOS high bay)
• Operate in a representative environment (e.g., WHOI dock)
Instrument burn-in
• Verify plausible values (e.g., winds of about 10 m/s from the east)
• Compare like instruments (e.g., two BP instruments on the tower)
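Comparing like instruments during burn-in amounts to checking that two co-located sensors agree within some tolerance. The sketch below is hypothetical: it assumes BP denotes barometric pressure, that the two series are already time-aligned, and that a single absolute tolerance is an adequate acceptance criterion. The function name, readings, and tolerance are illustrative, not OOI values.

```python
from statistics import mean


def compare_like_instruments(a, b, tolerance):
    """Compare two co-located sensors measuring the same quantity.

    Returns (mean difference, largest absolute difference, within tolerance?).
    Assumes both series are time-aligned, i.e. a[i] and b[i] share a timestamp.
    """
    if len(a) != len(b) or not a:
        raise ValueError("series must be non-empty and time-aligned (same length)")
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)                      # systematic offset between sensors
    max_abs = max(abs(d) for d in diffs)    # worst single disagreement
    return bias, max_abs, max_abs <= tolerance


# Hypothetical barometric pressure readings (hPa) from two sensors on one tower
bp_1 = [1013.2, 1013.4, 1013.1, 1012.9]
bp_2 = [1013.0, 1013.3, 1013.2, 1012.8]
bias, worst, ok = compare_like_instruments(bp_1, bp_2, tolerance=0.5)
print(f"bias={bias:.3f} hPa, max |diff|={worst:.2f} hPa, within tolerance: {ok}")
```

Reporting the bias separately from the worst-case difference helps distinguish a constant offset (a calibration question) from intermittent disagreement (a possible fault).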
At-Sea Procedures: Pioneer-1
Platform monitoring
• Full platform function available when within Wi-Fi range
• Communication with shore station when out of range
Shipboard underway sampling
• Meteorology time series from the Knorr bow mast
• Thermosalinograph time series from the Knorr system
• Bathymetry from echosounder and multibeam
Shipboard CTD profiles
• Post-deployment casts at each of the 3 Pioneer-1 sites
• Seabird 9-11 with DO, fluorometer, beam transmissometer, turbidity, PAR
Physical samples
• Post-deployment casts at each of the 3 Pioneer-1 sites
• Salinity and oxygen analyzed onboard
• Nitrate/nitrite, chlorophyll, and carbon system analyzed in shore labs
* 1102-00300 Protocols and Procedures for OOI Data Products: QA, QC, Calibration and Physical Samples
At-sea protocols: deployment and post-deployment procedures
Deployment documentation
• Pre-deployment checklists
• Mooring deployment logs
Post-deployment data assessment
• Adjacent CTD cast(s) (temp, sal, oxy, chl, turb)
• Shipboard systems (met, surface temp/sal, ADCP)
• Water samples and lab analysis (sal, oxy, chl, etc.)
At-sea procedures: post-deployment procedures
Deployment documentation
• Pre-deployment checklists
• Mooring deployment logs
Post-deployment assessments
• Adjacent CTD cast(s) (temp, sal, oxy, chl, turb)
• Shipboard systems (met, surface temp/sal, ADCP)
• Water samples and lab analysis (sal, oxy, chl, etc.)
• Quick-look report
• Lessons learned
Automated QC Thresholds and Flags (L1b and L2b)
[Figure: automated QC data flow. Labels: Instrument Driver and Agent; Permanent storage; Calibration Table; Data Product Algorithm; secondary post-deployment and post-recovery calibration values, each with a POLYVAL Algorithm; Interpolation; Lookup Tables; QC algorithms (range, spike, stuck, gradient, trend, combined); QC Flags; User.]
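The diagram shows two POLYVAL evaluations (post-deployment and post-recovery calibration values) followed by interpolation. One plausible reading is sketched below: evaluate both secondary calibration polynomials on an L1a value and blend the results linearly in time between deployment and recovery. The time weighting, the linear blend, and all names and coefficients are assumptions, not the documented OOI scheme. The coefficient order (highest degree first) mirrors `numpy.polyval`, but the evaluation is written out with Horner's method to keep the block dependency-free.

```python
def polyval(coeffs, x):
    """Evaluate a polynomial whose coefficients are ordered highest degree first
    (the ordering numpy.polyval uses), using Horner's method."""
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result


def secondary_calibration(value, t, t_deploy, t_recover, coeffs_deploy, coeffs_recover):
    """Apply the post-deployment and post-recovery secondary calibrations to one
    L1a value, then interpolate linearly in time between the two results."""
    if not t_deploy <= t <= t_recover:
        raise ValueError("sample time must lie between deployment and recovery")
    y_deploy = polyval(coeffs_deploy, value)
    y_recover = polyval(coeffs_recover, value)
    w = (t - t_deploy) / (t_recover - t_deploy)  # 0 at deployment, 1 at recovery
    return (1.0 - w) * y_deploy + w * y_recover


# Hypothetical linear corrections: identity at deployment, +0.02 offset at recovery
coeffs_deploy = [1.0, 0.0]
coeffs_recover = [1.0, 0.02]
# Halfway through a 100-day deployment, half of the drift correction is applied
l1b = secondary_calibration(10.0, t=50.0, t_deploy=0.0, t_recover=100.0,
                            coeffs_deploy=coeffs_deploy, coeffs_recover=coeffs_recover)
```

Because a polynomial is linear in its coefficients, blending the two outputs gives the same result as blending the coefficients first when both polynomials have the same degree.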
Automated QC Checks
• Seven QC checks:
– Global Range Test
– Local Range Test
– Spike Test
– Stuck Value Test
– Trend Test
– Gradient Test
– Combined QC Flags
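The exact OOI formulations of these checks are not given in this deck. As a rough illustration only, here is a minimal plain-Python sketch of four of them (global range, spike, stuck value, gradient) plus flag combination. The function names, parameters (`vmin`, `vmax`, `threshold`, `reso`, `num`, `max_rate`), the median-based spike rule, and the 1 = pass / 0 = fail convention are all assumptions. A local range test could reuse the global range comparison with limits that vary by site or season. The trend test is omitted.

```python
from statistics import median


def global_range_test(data, vmin, vmax):
    """Flag 1 (pass) where vmin <= value <= vmax, else 0 (fail)."""
    return [1 if vmin <= x <= vmax else 0 for x in data]


def spike_test(data, threshold, half_window=2):
    """Flag 0 where a value differs from the median of its neighbours
    (up to half_window samples on each side) by more than threshold."""
    flags = []
    for i, x in enumerate(data):
        neighbours = data[max(0, i - half_window):i] + data[i + 1:i + 1 + half_window]
        flags.append(0 if neighbours and abs(x - median(neighbours)) > threshold else 1)
    return flags


def stuck_value_test(data, reso, num):
    """Flag 0 for every sample in a run of at least num consecutive values
    that stay within reso of the run's first value."""
    flags = [1] * len(data)
    run_start = 0
    for i in range(1, len(data) + 1):
        # A run ends at the end of the series or when the value moves by more than reso
        if i == len(data) or abs(data[i] - data[run_start]) > reso:
            if i - run_start >= num:
                flags[run_start:i] = [0] * (i - run_start)
            run_start = i
    return flags


def gradient_test(data, times, max_rate):
    """Flag 0 where the rate of change from the previous sample exceeds max_rate."""
    flags = [1] * len(data)
    for i in range(1, len(data)):
        rate = abs(data[i] - data[i - 1]) / (times[i] - times[i - 1])
        if rate > max_rate:
            flags[i] = 0
    return flags


def combine_flags(*flag_lists):
    """Combined flag: 0 if any individual test failed for that sample."""
    return [min(flags) for flags in zip(*flag_lists)]


# Hypothetical temperature series (deg C) sampled once per second
temps = [10.0, 10.1, 10.2, 10.2, 10.2, 10.2, 10.3, 16.0, 10.4, 10.5, 10.6]
times = list(range(len(temps)))
combined = combine_flags(
    global_range_test(temps, vmin=-2.0, vmax=35.0),
    spike_test(temps, threshold=1.0),
    stuck_value_test(temps, reso=0.01, num=4),
    gradient_test(temps, times, max_rate=2.0),
)
```

Here the four repeated 10.2 readings fail the stuck value test, the 16.0 reading fails the spike test, and the gradient test flags both the jump up and the jump back down. This shows why a combined flag is useful alongside the individual ones.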
When?
• QC checks are run:
– on a periodic basis, when data from the uncabled instruments are ingested
– continuously, for data from the cabled instruments
• QC flags are stored.
Automated QC actions
– PSs create lookup tables, and the values are uploaded through the UI as CSV files
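Lookup-table upload amounts to parsing CSV rows into QC parameters keyed by what they apply to. The layout below is hypothetical: the column names (`instrument`, `parameter`, `qc_test`, `min`, `max`), the instrument ID, and the limits are illustrative and do not reproduce the actual OOINet CSV format. The sketch uses only Python's standard `csv.DictReader`.

```python
import csv
import io

# Hypothetical lookup-table CSV; column names and instrument ID are illustrative only.
LOOKUP_CSV = """instrument,parameter,qc_test,min,max
CTD-01,temperature,global_range,-2.0,35.0
CTD-01,salinity,global_range,0.0,42.0
"""


def load_lookup_table(text):
    """Parse lookup rows into {(instrument, parameter, qc_test): (min, max)}.

    A later row with the same key replaces an earlier one.
    """
    table = {}
    for row in csv.DictReader(io.StringIO(text)):
        key = (row["instrument"], row["parameter"], row["qc_test"])
        table[key] = (float(row["min"]), float(row["max"]))
    return table


table = load_lookup_table(LOOKUP_CSV)
vmin, vmax = table[("CTD-01", "temperature", "global_range")]
```

Keying on (instrument, parameter, test) lets each check look up its own limits without knowing the file layout.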
Automated QC Updates
• If new values are uploaded for any of the QC checks, those values overwrite the original values.
• OOINet reruns the QC checks for all data products, then creates and stores new QC flags.
• QC is “value added,” so prior flags are not retained.
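The update rule above (overwrite the values, rerun the checks, discard the old flags) can be sketched as a small store. This is a hypothetical illustration: the `QCStore` class, its methods, and the use of a global range test alone are assumptions chosen to keep the example short, not OOINet's design.

```python
class QCStore:
    """Sketch of the update semantics: uploaded limits overwrite the old ones,
    every product is re-checked, and previous flags are discarded."""

    def __init__(self, products):
        self.products = products  # {product name: list of values}
        self.limits = {}          # {product name: (vmin, vmax)}
        self.flags = {}           # {product name: list of 0/1 flags}

    def upload_limits(self, new_limits):
        self.limits.update(new_limits)  # overwrite; original values are not kept
        self.rerun_all()

    def rerun_all(self):
        new_flags = {}
        for name, (lo, hi) in self.limits.items():
            new_flags[name] = [1 if lo <= x <= hi else 0 for x in self.products[name]]
        self.flags = new_flags  # prior flags are replaced, not archived


store = QCStore({"temperature": [5.0, 12.0, 40.0]})
store.upload_limits({"temperature": (-2.0, 35.0)})
first = list(store.flags["temperature"])
store.upload_limits({"temperature": (-2.0, 10.0)})
second = store.flags["temperature"]
```

Because flags are recomputed from the current limits, the store never needs to reconcile old and new flags. The tradeoff, as the slide notes, is that earlier flag states cannot be recovered.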
Human in the Loop QC (L1c and L2c)
[Figure: human-in-the-loop QC data flow. Labels: Instrument Driver and Agent; Permanent storage; Calibration Table; Data Product Algorithm; secondary post-deployment and post-recovery calibration values, each with a POLYVAL Algorithm; Interpolation; Lookup Tables; QC algorithms (range, spike, stuck, gradient, trend, combined); Human in the loop (L1c, L2c); User.]
Human in the Loop QC Actions
• PS periodically downloads an L1 or L2 product
• PS performs HITL QC locally on the PS's machine
• PS uploads L1c or L2c values and HITL metadata (provenance, etc.) into OOINet
• A user who downloads an L1 or L2 product to which HITL QC has been applied will see L1c or L2c variables in the downloaded time series
– only for the time range to which the HITL QC was applied
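The last point above (L1c values appear only for the reviewed time range) can be sketched as building an extra column aligned with the L1 time series, with NaN where no HITL review exists. The `HitlUpload` record, its fields, and the use of NaN as the missing-value marker are hypothetical, not the OOINet data model.

```python
import math
from dataclasses import dataclass, field


@dataclass
class HitlUpload:
    """Hypothetical record of one HITL upload: the reviewed time range, the
    corrected values keyed by sample time, and free-form provenance metadata."""
    start: float
    end: float
    values: dict
    provenance: dict = field(default_factory=dict)


def l1c_column(times, upload):
    """L1c values aligned with `times`; NaN outside the reviewed time range."""
    return [
        upload.values.get(t, math.nan) if upload.start <= t <= upload.end else math.nan
        for t in times
    ]


times = [0, 1, 2, 3, 4]
upload = HitlUpload(start=1, end=3, values={1: 7.1, 2: 7.0, 3: 7.2},
                    provenance={"reviewer": "PS", "method": "manual"})
col = l1c_column(times, upload)
```

Keeping the L1c column next to the unmodified L1 values, rather than replacing them, lets a user see both the automated product and the human-reviewed one.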
Human in the Loop QC Updates
• If new HITL values are uploaded for a time period that has already been uploaded, those values overwrite the original values.
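This overwrite rule can be sketched as interval replacement: a new upload for a period first clears every stored value in that period, then inserts the new ones. It is an assumption that the whole period is cleared (rather than only the timestamps present in the new upload). The function name and the dict-of-times representation are also hypothetical.

```python
def apply_hitl_upload(stored, start, end, new_values):
    """Replace every stored HITL value with start <= t <= end by the new upload.

    Values outside [start, end] are kept unchanged.
    """
    kept = {t: v for t, v in stored.items() if not start <= t <= end}
    kept.update({t: v for t, v in new_values.items() if start <= t <= end})
    return kept


stored = apply_hitl_upload({}, 0, 4, {0: 1.0, 1: 1.1, 2: 1.2, 3: 1.3, 4: 1.4})
stored = apply_hitl_upload(stored, 2, 3, {2: 2.2, 3: 2.3})
```

Clearing the whole period means a re-upload that omits a timestamp also removes the old value there. An alternative design would overwrite only the timestamps actually supplied.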
Relationship of QC levels a, b, and c
[Figure: relationship of QC levels. Labels: Database; L0; L1 Data Product Algorithm; L2 Data Product Algorithm; Primary Calibration Function (L1a); Secondary Calibration Functions (L1b, L2b); QC Algorithms (L1b and QC flags); Human In The Loop (L1c, L2c); GUI; User.]
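One possible reading of this figure, for the L1 path only, is a chain: L0 goes through primary calibration to L1a, secondary calibration to L1b, automated QC adds flags, and an optional human review produces L1c. The sketch below encodes that reading. The function names, the toy calibrations, the single range check, and the reviewer rule are all hypothetical, and the L2 path (L2b, L2c) is not shown.

```python
import math


def process_l1(l0, primary_cal, secondary_cal, vmin, vmax, hitl_review=None):
    """Carry one L0 series through the a, b, c levels (one reading of the figure)."""
    l1a = [primary_cal(x) for x in l0]                       # primary calibration -> L1a
    l1b = [secondary_cal(x) for x in l1a]                    # secondary calibration -> L1b
    qc_flags = [1 if vmin <= x <= vmax else 0 for x in l1b]  # automated QC on L1b
    # L1c exists only if a human reviewer has supplied it
    l1c = hitl_review(l1b, qc_flags) if hitl_review is not None else None
    return {"L0": l0, "L1a": l1a, "L1b": l1b, "qc_flags": qc_flags, "L1c": l1c}


def counts_to_celsius(counts):
    """Hypothetical primary calibration from raw counts."""
    return counts / 100.0 - 5.0


def drift_correction(value):
    """Hypothetical secondary calibration (a constant offset)."""
    return value + 0.5


def reviewer(values, flags):
    """Hypothetical HITL step: blank out values that failed automated QC."""
    return [v if f else math.nan for v, f in zip(values, flags)]


out = process_l1([1000, 2000, 9000], counts_to_celsius, drift_correction,
                 vmin=-2.0, vmax=35.0, hitl_review=reviewer)
```

Keeping every level in the output mirrors the idea that a, b, and c are additions alongside each other rather than replacements of the earlier levels.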
Exception management
Long-term time-series data sampling rate management
Questions?
Specific topics: “Shoreside & at-sea instrument and subsystem quality/calibration procedures/protocols, automated thresholds/flags, manual data QC, exception management, and long term time-series data sampling rate management.”