Data management and QA/QC

  • Slides: 29
Data management and QA/QC
Quality Assurance
• Real-time data:
  – Data ingest
  – Data storage
  – Metadata
  – Data display/access
  – External repositories
• QA/QC:
  – Astoria field team
  – AUV team
  – SATURN data analysis
  – Forecasts

Quality Assurance / Quality Control / Quality Assessment
• http://water.epa.gov/type/rsl/monitoring/132.cfm
• http://www.ct.gov/dep/lib/dep/site_clean_up/guidance/qaqc/final_dqa_due.pdf
Quality Assurance (QA) involves planning, implementation, assessment, reporting, and quality improvement to establish the reliability of laboratory data. Quality Control (QC) procedures are the specific tools that are used to achieve this reliability. QC procedures measure the performance of an analytical method in relation to the QC criteria specified in the analytical method. QC information documents the quality of the analytical data.

Seven Data Management Laws
1. Every real-time observation distributed to the ocean community must be accompanied by a quality descriptor. [raw, preliminary, verified]
2. All observations should be subject to some level of automated real-time quality test. [checksum, valid values]
3. Quality flags and quality test descriptions must be sufficiently described in the accompanying metadata. [not yet accomplished]
4. Observers should independently verify or calibrate a sensor before deployment. [done, also post-deployment checks, mid-deployment checks in development]
5. Observers should describe their method/calibration accuracy in the real-time metadata. [not yet accomplished]
6. Observers should quantify the level of calibration accuracy and the associated expected error bounds. [?]
7. Manual checks on the automated procedures, the real-time data collected and the status of the observing system must be provided by the observer on a time scale appropriate to ensure the integrity of the observing system. [real-time flagging system, monthly manual QA for physical variables, six-monthly water sample testing]
(QARTOD V: http://nautilus.baruch.sc.edu/twiki/pub/Main/WebHome/QARTODVReport_Final2.pdf)
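Law 2 names a checksum as one of the automated real-time tests, but the slides do not say which checksum scheme or line format the serial feeds use. The sketch below assumes an NMEA-0183-style line (`$payload*HH`, where `HH` is the XOR of the payload characters in hex) purely for illustration; the function names and the sample line are hypothetical.

```python
from functools import reduce


def xor_checksum(payload: str) -> int:
    """XOR of all character codes in the payload (NMEA 0183 convention, assumed here)."""
    return reduce(lambda acc, ch: acc ^ ord(ch), payload, 0)


def make_line(payload: str) -> str:
    """Build a '$payload*HH' line; used only to create sample input."""
    return f"${payload}*{xor_checksum(payload):02X}"


def line_passes(line: str, n_fields: int) -> bool:
    """Checksum and line-format check for one line of a serial feed.

    Returns True only if the line looks like '$payload*HH', the checksum
    matches, and the payload has the expected number of comma-separated fields.
    """
    line = line.strip()
    if not line.startswith("$") or "*" not in line:
        return False
    payload, _, hex_sum = line[1:].rpartition("*")
    try:
        expected = int(hex_sum, 16)
    except ValueError:
        return False
    if xor_checksum(payload) != expected:
        return False
    return len(payload.split(",")) == n_fields


good = make_line("CTD,12.5,31.2")          # hypothetical instrument line
corrupt = good.replace("12.5", "12.6")     # payload changed, checksum not updated
print(line_passes(good, 3), line_passes(corrupt, 3))
```

A line that fails either check would simply not be promoted past the unchecked archive; what happens to it after that is not specified in the slides.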

CMOP 1 vs. CMOP 2
• Shake out period for instrument deployments
• Development phase for interfaces and data management
• Basis from which to develop stronger QA procedures
• Value of relative data

Quality control levels and flags
• 4 levels of quality control
  – Complete (no quality checks, all data from serial feeds written to ASCII files)
  – Raw (checksum and line format checks applied prior to this stage)
  – Preliminary (various manual and automated checks and corrections applied)
  – Verified (all checks and corrections applied)
• 3 flags: good, suspect, bad
• Metadata describing what tests and corrections have been applied (not yet available in interface)
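One way to picture the levels and flags listed above is as two small enumerations attached to every observation. This is a minimal sketch, not the CMOP data model: the class names, field names, and the station identifier are hypothetical, and treating the levels as ordered stages is an assumption.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum


class Level(IntEnum):
    """Quality control levels, ordered by how much checking has been applied."""
    COMPLETE = 0     # no quality checks
    RAW = 1          # checksum and line format checks passed
    PRELIMINARY = 2  # various manual and automated checks and corrections
    VERIFIED = 3     # all checks and corrections applied


class Flag(Enum):
    GOOD = "good"
    SUSPECT = "suspect"
    BAD = "bad"


@dataclass(frozen=True)
class Observation:
    station: str               # hypothetical identifier
    variable: str
    value: float
    level: Level
    flag: Flag
    tests_applied: tuple = ()  # names of tests/corrections, for the metadata


def descriptor(obs: Observation) -> str:
    """Quality descriptor to ship with the observation, e.g. 'raw/good'."""
    return f"{obs.level.name.lower()}/{obs.flag.value}"


def at_least(observations, level: Level):
    """Keep only observations that have reached the given level or higher."""
    return [o for o in observations if o.level >= level]


obs = [
    Observation("station-a", "salinity", 31.2, Level.RAW, Flag.GOOD),
    Observation("station-a", "salinity", 31.1, Level.VERIFIED, Flag.GOOD, ("range",)),
]
print([descriptor(o) for o in obs])
```

Carrying `tests_applied` alongside each value is one possible route to the metadata item in the last bullet, which the slide notes is not yet available in the interface.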

Automated tests and processing
• Valid value range
• Phycoerythrin correction for turbidity
• Not currently using:
  – rate of change
  – outlier tests (need to be developed for each variable)
• Not currently using:
  – In-situ clear water offset correction (planned for near future)
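The valid value range test, and the rate-of-change test that the slide lists as not yet in use, could look roughly like the sketch below. The limits, the variable names, and the choice to mark out-of-range values "bad" and fast changes "suspect" are all assumptions made for illustration; the slides give no thresholds.

```python
import math

# Hypothetical limits per variable; real limits would come from the QA manual.
VALID_RANGES = {
    "salinity": (0.0, 40.0),       # PSU
    "temperature": (-2.0, 35.0),   # degrees C
}


def range_flag(variable: str, value) -> str:
    """Valid value range test: 'good' inside the limits, otherwise 'bad'."""
    lo, hi = VALID_RANGES[variable]
    if value is None or math.isnan(value):
        return "bad"
    return "good" if lo <= value <= hi else "bad"


def rate_of_change_flags(times, values, max_rate):
    """Rate-of-change test: mark a sample 'suspect' if it moved faster than
    max_rate (value units per second) since the previous sample."""
    flags = ["good"] * len(values)
    for i in range(1, len(values)):
        dt = times[i] - times[i - 1]
        if dt <= 0 or abs(values[i] - values[i - 1]) / dt > max_rate:
            flags[i] = "suspect"
    return flags


print(range_flag("salinity", 31.2), range_flag("salinity", 55.0))
print(rate_of_change_flags([0, 60, 120], [10.0, 10.1, 15.0], max_rate=0.01))
```

As the slide notes, outlier tests would need separate development for each variable, so a single shared threshold like `max_rate` is only a starting point.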

Real-time flagging of data streams
• If an instrument is manually identified as producing bad data, the data stream can be flagged as bad
• Data continues to be collected in the ‘raw’ data archive
• Data is excluded from the ‘preliminary’ and ‘best’ data sets.
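The routing rule described above (flagged streams keep flowing into the raw archive but are left out of the downstream data sets) can be sketched as follows. The registry class, the stream identifiers, and the use of in-memory lists in place of real archives are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class StreamRegistry:
    """Tracks which data streams an operator has manually flagged as bad."""
    bad_streams: set = field(default_factory=set)

    def flag_bad(self, stream_id: str) -> None:
        self.bad_streams.add(stream_id)

    def clear(self, stream_id: str) -> None:
        self.bad_streams.discard(stream_id)


def route(record: dict, registry: StreamRegistry, raw_archive: list, preliminary: list) -> None:
    """Always archive the record as raw; pass it downstream only if its stream is not flagged."""
    raw_archive.append(record)
    if record["stream"] not in registry.bad_streams:
        preliminary.append(record)


registry = StreamRegistry()
registry.flag_bad("station-a.ctd")  # hypothetical stream id

raw_archive, preliminary = [], []
route({"stream": "station-a.ctd", "value": 31.2}, registry, raw_archive, preliminary)
route({"stream": "station-b.ctd", "value": 30.8}, registry, raw_archive, preliminary)
print(len(raw_archive), len(preliminary))
```

Because the raw archive is never filtered, clearing a flag later loses no data; only the derived data sets need to be regenerated.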

Communities of users
• QA: Sarah, Michael, me, Joe and Tawnya
• Coordinator’s meeting
• All-hands meeting

Key points within data transfer system
[Diagram. Labels recoverable from the figure: Base station; SP; Serial port reader; Config file; RV0 raw data file; amb6400; www.stccmop.org; Rsync; Web-browsable archive; RV1 data file; Local database; Monitoring scripts; Monitoring pages; Db2db transfer; Static plots; Metadata; Failure point; Monitoring; cdb02 database; Web caching scripts; Web plotting tools; Recent values; Data explorer; Station pages; NetCDF archive]

Real-time data collection with error checking
• Quick description of the data acquisition system
• Capability to flag data streams as bad

Offering schema unites diverse data sets and sources

Multi-level system for data display/exploration
• Watches for focused users
• Station pages for browsing
• Data explorer for analysis and exploration
• Commenting on data, sharing data images

Data explorer
• Online plot generation
• Saves sessions
• Get data
• Extensive planned enhancements for CMOP 2
  – Data analysis
  – Integration with external data sources (e.g. NANOOS, OpenDAP)
  – Biological data

Oxygen watch
• Serves a specific user need
• Tailored images (cut-off lines, oxygen oxic/suboxic/anoxic glider plots)
• Blog
• Model for further watches:
  – Salmon plume criteria
  – M. rubra

Redundant data storage
• ASCII, ASCII, DB, netCDF
• Metadata
• Transfer to external repositories
  – NDBC -> WOC

External data repositories
• NANOOS (Pacific Northwest)
• NDBC (National Data Buoy Center)
• BCO-DMO (Biological and Chemical Oceanography Data Management Office)

Data QA Procedures
• Year of intensive development
• QA Manual
• Procedures actively in implementation stage

Monthly QA checks
• For 03, 04, 05, 06
  – Secondary standards for calibration checks
  – Water sample collection
  – Cleaning
• Data QA for 01 and 02
  – Shorter deployments, difficult to access
  – Pre- and post-deployment checks
  – Eventually, intermittent water sample collection nearby from Forerunner or R/V CORIE

Preliminary data QA
• Intermediate levels of data checking
• Commenting system (intern)
• Visual and algorithmic identification of outliers
• Visual identification of invalid behavior
• Comparisons across instrument changes and instrument cleaning
• Comparison with water sample results

Data calibration
• Field water samples
• Laboratory calibrations

Data recovery of CMOP 1 data
• (maybe just as an appendix for answering questions)

AUV data management
• AUV data is primarily managed by Craig MacNeil’s group, including extensive QA
• (possible slide from Craig to go here)

Download data handling

Radars
Plume HF Radar
• Data management handled by Mike Kosro’s group
• Data processing to accurately characterize the high velocities of the near-field plume jet is actively in development.

Radars
River Rad
• Real-time data transfer using SWAP
• Full data stored at UW
• Processed data stored at OHSU

Glider data
• Real-time data transfer via Iridium satellite phone
• Post-deployment data download
• QA methods not yet developed
• Sensors given pre- and post-deployment calibration check
• Data displayed in NANOOS NVS

Bottom nodes
• Still in research and development phase
• CTD data from bottom nodes transferred manually after deployment
• Will eventually be real-time data transfer
• Will be subject to standard CT QA/QC

Cruise data

Fixed stations and buoys