- Slides: 8
Applying IOOS Data Management Techniques to OOI Data
IOOS DMAC 2020
Jessica Austin, Axiom Data Science
OOI (Ocean Observatories Initiative)
● 7 main sites (arrays) across the world
● 36 instrument types, 100+ primary data products
● Platform types: moored, seafloor, profilers, gliders
● Data transmission methods: telemetered (minutes), cabled (sub-second), and recovered
● ~5 years of historic data, ~10 TB netCDF files
In a word… complex
Challenge: how to help users understand this complex dataset, and extract what they need for their work
Understanding and applying OOI data
● Goal of new Data Explorer tool:
  ○ Connect researchers with OOI data
● Approach:
  ○ Discovery: As a Researcher, I want to find the data that I'm interested in, without having to invest large amounts of time learning the OOI jargon and systems.
  ○ Data Exploration: As a Researcher, I want to efficiently visualize the data I've found, and determine whether or not it is useful for my work.
  ○ Data Access: As a Researcher, once I determine the data is useful, I want to quickly extract it from the OOI systems, so that I can run my own analysis.
● How community standards can help:
  ○ Data discovery: use metadata standards
    ■ IOOS metadata profile, CF standard names
  ○ Data exploration: use QC standards to flag data
    ■ QARTOD and ioos_qc
  ○ Data Access: use standard file formats and community-adopted services
    ■ netCDF, ERDDAP, THREDDS
Interoperability win: IOOS Glider DAC
Data Discovery
As a Researcher, I want to find the data that I'm interested in, without having to invest large amounts of time learning the OOI jargon and systems.
● Challenges: metadata, param names varied across sites
● Approach: map to metadata standards
● CF Standard names
  ○ OOI already has standard names for many data products
  ○ Worked to match up more parameters with existing names
  ○ Some names still need to be proposed
● Apply IOOS metadata profile to datasets in ERDDAP
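The "map to metadata standards" step can be pictured as a lookup from site-specific parameter names to CF standard names. The following is a minimal sketch, not the OOI mapping itself: the left-hand parameter names and the `map_parameters` helper are hypothetical, while the right-hand values are existing CF standard names. Names without a match are collected separately, mirroring the "some names still need to be proposed" case.

```python
# Hypothetical lookup table: the keys are invented examples of site-specific
# parameter names; the values are existing CF standard names.
CF_STANDARD_NAMES = {
    "seawater_temperature": "sea_water_temperature",
    "ctd_temp": "sea_water_temperature",
    "practical_salinity": "sea_water_practical_salinity",
    "sal": "sea_water_practical_salinity",
    "seawater_pressure": "sea_water_pressure",
}


def map_parameters(param_names):
    """Split parameter names into those with a CF standard name and those without.

    Matching is case-insensitive and ignores surrounding whitespace.
    Returns (mapped, unmapped): a dict of original name -> CF standard name,
    and a list of names that have no entry in the lookup table.
    """
    mapped = {}
    unmapped = []
    for name in param_names:
        key = name.strip().lower()
        if key in CF_STANDARD_NAMES:
            mapped[name] = CF_STANDARD_NAMES[key]
        else:
            unmapped.append(name)
    return mapped, unmapped


mapped, unmapped = map_parameters(["CTD_Temp", "sal", "optode_raw_counts"])
```

Here `unmapped` holds the candidates that would need either a manual match or a new standard name proposal.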
Data Exploration
As a Researcher, I want to efficiently visualize the data I've found, and determine whether or not it is useful for my work.
● OOI is working to apply QARTOD to their data, using the ioos_qc library
● QC rollup flags are visualized and available in the netCDF files in an IOOS-standard way
● No existing standards for annotations/manual QC?
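To make "QC rollup flags" concrete, here is a small pure-Python sketch of a QARTOD-style gross range test and a rollup across several tests. It is an illustration only and does not call the ioos_qc library. The flag values (1 good, 2 unknown, 3 suspect, 4 fail, 9 missing) follow the QARTOD convention; the function names, the spans, and the rollup precedence (fail over suspect over good over unknown over missing) are assumptions made for this example.

```python
# QARTOD flag values.
GOOD, UNKNOWN, SUSPECT, FAIL, MISSING = 1, 2, 3, 4, 9

# Assumed rollup precedence, lowest to highest: a later flag wins over an
# earlier one when several tests are combined for the same data point.
ROLLUP_PRIORITY = [MISSING, UNKNOWN, GOOD, SUSPECT, FAIL]


def gross_range_flags(values, fail_span, suspect_span):
    """Flag each value against an outer (fail) and inner (suspect) range.

    None is treated as a missing observation.
    """
    flags = []
    for v in values:
        if v is None:
            flags.append(MISSING)
        elif not (fail_span[0] <= v <= fail_span[1]):
            flags.append(FAIL)
        elif not (suspect_span[0] <= v <= suspect_span[1]):
            flags.append(SUSPECT)
        else:
            flags.append(GOOD)
    return flags


def rollup(*flag_vectors):
    """Combine per-test flag lists into one flag per data point."""
    return [
        max(flags_at_point, key=ROLLUP_PRIORITY.index)
        for flags_at_point in zip(*flag_vectors)
    ]


temperatures = [10.2, 31.0, 45.0, None]
range_flags = gross_range_flags(temperatures, fail_span=(-2, 40), suspect_span=(0, 30))
other_test_flags = [GOOD, GOOD, GOOD, GOOD]  # stand-in for a second QC test
rollup_flags = rollup(range_flags, other_test_flags)
```

A single rollup value per point is what lets a plot shade or hide suspect and failed data without the user having to inspect every individual test.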
Data Access: Challenges
As a Researcher, once I determine the data is useful, I want to quickly extract it from the OOI systems, so that I can run my own analysis.
● Conducted dozens of user interviews, and every person wanted something different here
  ○ Wide range of technical ability: "I want to click a button and download a CSV" to "I'll write my own Python scripts to periodically pull the data I need"
  ○ Dataset depends on use case: "I want all the CTD data for this particular instrument in one file", "I want all the Salinity data off the Oregon shelf during fall 2018", "I want to find the raw ADCP data", etc.
  ○ Some people love ERDDAP, others hate it, many hadn't heard of it
● Other challenges
  ○ Very large, high-res datasets
  ○ netCDF files exported from existing OOI systems are split up by deployment and data transmission method
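For the "I'll write my own Python scripts" end of that range, a request such as "salinity during fall 2018" can be expressed as an ERDDAP tabledap URL that returns CSV. The sketch below only builds the URL and makes no network call. The server address and dataset ID are placeholders, not real OOI endpoints, and `tabledap_csv_url` is a hypothetical helper; the `tabledap/<datasetID>.csv?<variables>&<constraints>` layout is the standard ERDDAP request form. It handles numeric and time constraints only (string constraints need extra quoting).

```python
from urllib.parse import quote


def tabledap_csv_url(server, dataset_id, variables, constraints):
    """Build an ERDDAP tabledap CSV request URL.

    server      -- base ERDDAP URL, e.g. "https://host/erddap"
    dataset_id  -- ERDDAP dataset ID
    variables   -- list of variable names to return
    constraints -- list of (variable, operator, value) tuples,
                   e.g. ("time", ">=", "2018-09-01T00:00:00Z")
    """
    query = ",".join(variables)
    for name, op, value in constraints:
        # Percent-encode the operator characters (< and >) and the value.
        query += "&" + name + quote(op, safe="=") + quote(str(value), safe="")
    return f"{server.rstrip('/')}/tabledap/{dataset_id}.csv?{query}"


url = tabledap_csv_url(
    "https://erddap.example.org/erddap",  # placeholder server
    "hypothetical_ctd_dataset",           # placeholder dataset ID
    ["time", "sea_water_practical_salinity"],
    [("time", ">=", "2018-09-01T00:00:00Z"), ("time", "<", "2018-12-01T00:00:00Z")],
)
```

The same URL works for the "click a button and download a CSV" user, since it can sit behind a download link.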
Data Access: Approach
● Data is stored in netCDF files and served via THREDDS and ERDDAP
● Serve two main data products:
● Original, full-resolution netCDF files ("Gold Copy")
  ○ Access via ERDDAP (one dataset per deployment)
  ○ Access via THREDDS (OpenDAP or direct file download)
● Simplified single timeseries per instrument
  ○ e.g., merge all deployments, and merge telemetered and delayed-mode
  ○ If high-res, binned on time (1 minute) and depth (1 meter) to reduce file size
  ○ Great for new users, or those who quickly want a "slice" of the data
  ○ Follows IOOS metadata profile 1.2
  ○ Access via ERDDAP or direct CSV download (powered by ERDDAP)
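The "merge deployments, then bin on time and depth" idea for the simplified product can be sketched in plain Python. This is an illustration under stated assumptions, not the production pipeline: the slides do not say which statistic is used per bin, so the mean is assumed here, and `bin_mean` with its `(epoch_seconds, depth_m, value)` sample layout is hypothetical.

```python
import math
from collections import defaultdict


def bin_mean(samples, time_bin_s=60, depth_bin_m=1.0):
    """Average samples into (time, depth) bins.

    samples -- iterable of (epoch_seconds, depth_m, value) tuples
    Returns a list of (bin_start_seconds, bin_top_depth_m, mean_value),
    sorted by time and then depth. Defaults give 1-minute by 1-meter bins.
    """
    totals = defaultdict(lambda: [0.0, 0])  # bin key -> [sum, count]
    for t, depth, value in samples:
        key = (
            math.floor(t / time_bin_s) * time_bin_s,
            math.floor(depth / depth_bin_m) * depth_bin_m,
        )
        totals[key][0] += value
        totals[key][1] += 1
    return [
        (t_bin, d_bin, total / count)
        for (t_bin, d_bin), (total, count) in sorted(totals.items())
    ]


# Two hypothetical deployments of the same instrument, merged by
# concatenation before binning.
deployment_1 = [(0, 10.2, 4.0), (30, 10.8, 6.0)]
deployment_2 = [(61, 10.5, 8.0), (65, 11.1, 10.0)]
binned = bin_mean(deployment_1 + deployment_2)
```

Binning trades resolution for size: four raw samples become three rows here, and the reduction grows quickly for sub-second cabled data.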