DATA UPLOAD TOOL | Bob Fratantonio | RPS | IOOS Annual DMAC Meeting, May 21-23, 2018
OUTLINE • Background • Bridging metadata standards • Challenges • User interface features • Enhancements
THE PROBLEM • Data is a valuable resource, BUT inaccessible data has NO value • Data providers want to share, visualize and compare data • Easier said than done, even with an ever-growing toolbox of open-source software • Many data providers are: • Not familiar with common data formats like netCDF • Not familiar with metadata standards • Not familiar with controlled vocabularies • And have no desire to get familiar! • Routinely performing the same tasks: • Convert data • Apply metadata • Serve data • Provide data services (aggregate, subset, visualize, compare, archive) • Can we automate?
DATA UPLOAD TOOL • Goals • Improve the accessibility of data through metadata standards • Make collaboration easier • Open collaboration across institutions, sectors and domains • Generate compliant data/metadata in a common format (netCDF) • Use lessons learned from DMAC best practices
BACKGROUND • OECD Principles and Guidelines for Access to Research Data from Public Funding (2007) • “rapidly expanding body of research data represents source of knowledge to address the myriad challenges facing humanity” • Principles for access: openness, efficiency, flexibility, accountability, interoperability, sustainability, quality
USE CASE #1 • US Army Corps of Engineers • Field Research Facility Data Integration Framework (FDIF) • 35+ years of met-ocean data: currents, winds, waves, weather, water quality • Survey data: Duck, NC DEM, beach profiles, lidar • Different file formats: MATLAB (.mat), ASCII (CSV)
FRF DATA WORKFLOW • Create metadata templates (.YML) for each sensor • ACDD, CF • Collect and store data • MATLAB-based data acquisition system • Convert incoming data to netCDF • Generic MATLAB function to complement the current process • Yearly folders, monthly files • Dynamically apply metadata templates per station • Push data to CHL server • Incoming folder handles basic error checking and routing • Data moves to THREDDS Data Server • Accessible • Aggregated in time (NcML) • Metadata catalog • ncISO -> Geoportal
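The per-station template step and the "yearly folders, monthly files" layout can be sketched in Python. This is a minimal illustration under assumptions, not the FRF code, which is MATLAB. Plain dicts stand in for the per-sensor .YML templates so that no YAML parser is needed. The names SENSOR_TEMPLATE, STATION_OVERRIDES, global_attributes and monthly_file_path, the attribute values, and the file-naming pattern are all hypothetical.

```python
from datetime import datetime
from pathlib import PurePosixPath

# Hypothetical stand-ins for a per-sensor .YML template and per-station values.
SENSOR_TEMPLATE = {
    "Conventions": "CF-1.6, ACDD-1.3",
    "featureType": "timeSeries",
    "institution": "Example institution",
}
STATION_OVERRIDES = {
    "station_a": {"title": "Station A waves", "platform": "buoy"},
}


def global_attributes(station: str) -> dict:
    """Merge the sensor template with per-station values; station values win."""
    attrs = dict(SENSOR_TEMPLATE)  # copy so the shared template is never mutated
    attrs.update(STATION_OVERRIDES.get(station, {}))
    return attrs


def monthly_file_path(root: str, station: str, when: datetime) -> PurePosixPath:
    """Yearly folder, one file per month: root/station/YYYY/station_YYYYMM.nc."""
    return PurePosixPath(root, station, f"{when:%Y}", f"{station}_{when:%Y%m}.nc")
```

For example, monthly_file_path("/data", "station_a", datetime(2018, 5, 21)) gives /data/station_a/2018/station_a_201805.nc. Because station values override the sensor template, shared attributes live in one place and each station can still have its own title.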
USACE UPLOAD TOOL • Extend the system across other branches of the USACE Coastal Hydraulics Laboratory • Requirements: • Accept generic ASCII file formats • Time series and profile data • “TurboTax”-style metadata editor • Export metadata templates • Export conversion scripts in Python and MATLAB • Get data onto CHL THREDDS for inclusion in OceansMap • CAC access only
USE CASE #2 • Coastal Ocean Model Testbed (COMT) • The COMT serves as a conduit between the federal operational and research communities and allows sharing of numerical models, observations and software tools. • Project-based datasets: • Chesapeake Bay Hypoxia • Gulf of Mexico Hypoxia • The US West Coast Model Intercomparison Project • Puerto Rico/US Virgin Islands Storm Surge & Waves • Coastal Waves, Surge, and Inundation in the Gulf of Maine • Coastal Waves, Surge, and Inundation in the Gulf of Mexico • Large datasets • Forcing files/input files • Unstructured grids • Different types of aggregations: • Union (‘horizontally’, by variable) • Join Existing (‘vertically’, by time) • NERACOOS • 1 model dataset per state • [Figure: Simulated surface salinity and bottom water oxygen concentration in 2015 in the small ROMS domain]
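Both aggregation types can be expressed in NcML, the XML dialect THREDDS uses for virtual datasets. A joinExisting aggregation concatenates files along an existing dimension such as time. A union aggregation merges variables from files that share coordinates. The sketch below builds such a document with Python's standard library. The function name ncml_aggregation and the file names are hypothetical. A production catalog would more likely use a scan element than a hand-written file list.

```python
import xml.etree.ElementTree as ET

NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
ET.register_namespace("", NCML_NS)  # serialize NcML elements in the default namespace


def ncml_aggregation(kind, locations, dim_name=None):
    """Return an NcML document that aggregates the given files.

    kind="joinExisting" concatenates along an existing dimension (dim_name, e.g. time);
    kind="union" merges the variables of files that share coordinates.
    """
    root = ET.Element(f"{{{NCML_NS}}}netcdf")
    agg = ET.SubElement(root, f"{{{NCML_NS}}}aggregation", type=kind)
    if kind == "joinExisting":
        if dim_name is None:
            raise ValueError("joinExisting needs the dimension to join along")
        agg.set("dimName", dim_name)
    for location in locations:
        ET.SubElement(agg, f"{{{NCML_NS}}}netcdf", location=location)
    return ET.tostring(root, encoding="unicode")
```

For example, ncml_aggregation("joinExisting", ["2015_01.nc", "2015_02.nc"], dim_name="time") describes a single virtual time series built from two monthly files. ncml_aggregation("union", ["salt.nc", "oxygen.nc"]) presents two variables from separate files as one dataset.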
USE CASE #3 • GliderDAC • NGDAC netCDF File Submission Process: • Data Provider Registration • New Deployment Registration • Submission of netCDF Files • Dataset Status • Dataset Archiving • NGDAC netCDF file format: • trajectoryProfile feature type • Mix of ACDD, CF, NCEI, IMOS, IOOS standards/suggestions • No validation or restriction on incoming data • Can be a steep learning curve • Upload tool: • Raw data conversion • Metadata form validation • Dropdown selections to prevent spelling mistakes • Downloadable Python scripts to convert the data
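Dropdowns prevent most misspellings in the form, but metadata that arrives inside uploaded files still needs a server-side check. A minimal sketch follows. The REQUIRED tuple and CONTROLLED map are an illustrative subset, not the actual NGDAC requirements, and validate_form is a hypothetical name. It uses difflib from the Python standard library to suggest the closest allowed value.

```python
import difflib

# Illustrative subset only; the NGDAC file-format documentation defines the real rules.
REQUIRED = ("title", "summary", "institution", "featureType")
CONTROLLED = {"featureType": ("trajectoryProfile",)}


def validate_form(attrs):
    """Return a list of human-readable problems; an empty list means the form passes."""
    problems = []
    for name in REQUIRED:
        if not str(attrs.get(name, "")).strip():
            problems.append(f"missing required attribute: {name}")
    for name, allowed in CONTROLLED.items():
        value = attrs.get(name)
        if value and value not in allowed:
            msg = f"{name}={value!r} is not an allowed value"
            # Offer the closest controlled value, as a dropdown would have enforced.
            hint = difflib.get_close_matches(str(value), allowed, n=1)
            if hint:
                msg += f"; did you mean {hint[0]!r}?"
            problems.append(msg)
    return problems
```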
METADATA STANDARDS • ISO 19115: dataset- and collection-level metadata • CF: Climate and Forecast conventions for netCDF • ACDD: Attribute Convention for Data Discovery • NCEI netCDF Templates: building on CF and ACDD, these best practices capture NCEI's experience in long-term preservation, scientific quality control, product development, and re-use of data beyond its original intent • IOOS Metadata Profile for netCDF: metadata developed by the IOOS Program Office for distributing IOOS data through THREDDS or ERDDAP servers • Project Open Data: promotes the continual improvement of the Open Data Policy • Schema.org: creates, maintains, and promotes schemas for structured data on the Internet, on web pages, in email messages, and beyond • EZID: creates and manages long-term, globally unique identifiers for data and other sources
METADATA CROSSWALKS • Metadata standards are ever-evolving • ACDD 1.4? • Who can keep track of these things?! • Crosswalks between metadata standards let us focus on what we know (ACDD) while still producing metadata for other standards • https://project-open-data.cio.gov/v1.1/metadata-resources/#field-mappings • JSON Schema to define metadata • Parse and populate the metadata schemas on upload • Metadata crosswalks update as users add metadata
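A crosswalk can be as simple as a table of field mappings plus a few conversion rules. The sketch below maps an illustrative subset of ACDD global attributes onto Project Open Data v1.1 fields. The SIMPLE_FIELDS table and the acdd_to_pod function are hypothetical. The authoritative mappings are on the Project Open Data field-mappings page listed in the slide.

```python
# Illustrative subset of an ACDD -> Project Open Data v1.1 crosswalk (not the official table).
SIMPLE_FIELDS = {
    "title": "title",
    "summary": "description",
    "date_modified": "modified",
    "id": "identifier",
    "license": "license",
}


def acdd_to_pod(acdd):
    """Translate ACDD global attributes into a Project Open Data record."""
    # One-to-one renames for attributes that are present.
    pod = {pod_key: acdd[acdd_key]
           for acdd_key, pod_key in SIMPLE_FIELDS.items() if acdd_key in acdd}
    # ACDD keywords are one comma-separated string; POD "keyword" is an array.
    if "keywords" in acdd:
        pod["keyword"] = [k.strip() for k in acdd["keywords"].split(",") if k.strip()]
    # POD "temporal" is an ISO 8601 interval: start/end.
    start, end = acdd.get("time_coverage_start"), acdd.get("time_coverage_end")
    if start and end:
        pod["temporal"] = f"{start}/{end}"
    return pod
```

Keeping the simple renames in a table means a new standard or version mostly requires editing data rather than code. That matches the goal of letting crosswalks evolve as standards change.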
CHALLENGES • Make it easy to use • Prepopulate as many fields as possible • Required vs. suggested metadata • Need to strike a good balance • File formats • Try to be as generic as possible • Always growing • Standard names • Always a challenge to provide guidance on the correct standard name • Aggregations • Large files • Provide users secure FTP credentials to upload files to a single dataset • Anonymous FTP access to view all
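One way to prepopulate fields is to derive them from the data itself. The sketch below assumes the uploaded values are already parsed into timezone-aware datetimes and decimal-degree coordinates. Under that assumption, the ACDD coverage attributes can be filled in without any user input. The function name prepopulate is hypothetical.

```python
from datetime import datetime, timezone


def prepopulate(times, lats, lons):
    """Derive ACDD time and geospatial coverage attributes from the uploaded data.

    times must be timezone-aware datetimes; lats/lons are decimal degrees.
    """
    if not (times and lats and lons):
        raise ValueError("need at least one time, latitude and longitude")
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return {
        "time_coverage_start": min(times).astimezone(timezone.utc).strftime(fmt),
        "time_coverage_end": max(times).astimezone(timezone.utc).strftime(fmt),
        "geospatial_lat_min": min(lats),
        "geospatial_lat_max": max(lats),
        "geospatial_lon_min": min(lons),
        "geospatial_lon_max": max(lons),
    }
```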
USER INTERFACE FEATURES • “TurboTax”-style ease of use • Makes metadata painless!
USER INTERFACE FEATURES • Reuse metadata from previously loaded datasets
USER INTERFACE FEATURES • Validation of CF (Climate and Forecast) metadata for variables • Unit validation via UDUNITS • Standard name validation via the CF Standard Name Table
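Standard name validation is essentially a lookup in the CF Standard Name Table, with close matches offered when the lookup fails. The sketch below uses a hand-copied excerpt of the table; CF_EXCERPT is an assumption, and a real tool would load the full published table. It uses difflib for suggestions. It only reports the canonical units. Checking that a variable's declared units convert to them needs UDUNITS (for example through a binding such as cf-units), which this sketch does not attempt.

```python
import difflib

# Small excerpt of the CF Standard Name Table: standard_name -> canonical units.
CF_EXCERPT = {
    "air_temperature": "K",
    "sea_water_temperature": "K",
    "sea_surface_temperature": "K",
    "sea_water_practical_salinity": "1",
    "wind_speed": "m s-1",
    "wind_from_direction": "degree",
    "sea_surface_wave_significant_height": "m",
}


def check_standard_name(name):
    """Return (is_valid, canonical_units or None, suggested names)."""
    if name in CF_EXCERPT:
        return True, CF_EXCERPT[name], []
    # Unknown name: suggest the closest table entries, best match first.
    return False, None, difflib.get_close_matches(name, CF_EXCERPT, n=3)
```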
USER INTERFACE FEATURES • Autocomplete (standard name suggestions as you type)
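One simple ranking for type-ahead suggestions lists names that start with the typed text before names that merely contain it. This ranking is an assumption for illustration and is not necessarily the tool's. The autocomplete function is hypothetical. It converts spaces to underscores so that users can type "sea water" naturally.

```python
def autocomplete(typed, names, limit=5):
    """Suggest standard names: prefix matches first, then substring matches."""
    query = typed.strip().lower().replace(" ", "_")
    if not query:
        return []
    starts = sorted(n for n in names if n.startswith(query))
    contains = sorted(n for n in names if query in n and not n.startswith(query))
    return (starts + contains)[:limit]
```

For example, with a list of CF names, typing "temp" returns air_temperature, sea_surface_temperature and sea_water_temperature. None of them starts with "temp", so all three come from the substring pass.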
ASCII UPLOADS - DESCRIBE VARIABLES • Time • CSV upload accepts different date/time formats • Select time zone • Variables • Variable name • Units • Standard name • Short name
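To accept several date/time layouts, a parser can try each known format in turn and then convert the result to UTC. This is a minimal sketch under assumptions. The FORMATS tuple is hypothetical, and the user's time-zone choice is reduced to a fixed UTC offset in hours. The output is seconds since 1970-01-01T00:00:00Z, a common CF time encoding. The function name parse_time is also hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical accepted layouts; the first one that matches wins.
FORMATS = ("%Y-%m-%dT%H:%M:%S", "%Y-%m-%d %H:%M:%S", "%m/%d/%Y %H:%M", "%Y%m%d%H%M")
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_time(text, utc_offset_hours=0.0):
    """Parse a local date/time string; return seconds since 1970-01-01T00:00:00Z."""
    tz = timezone(timedelta(hours=utc_offset_hours))  # user-selected fixed offset
    for fmt in FORMATS:
        try:
            local = datetime.strptime(text.strip(), fmt).replace(tzinfo=tz)
        except ValueError:
            continue  # not this layout; try the next one
        return (local - EPOCH).total_seconds()
    raise ValueError(f"unrecognized date/time: {text!r}")
```

Format order matters. A string such as 03/04/2018 fits both month-first and day-first layouts, so only the first matching entry in FORMATS is used. A fixed offset also ignores daylight saving time, which a named time zone (for example via zoneinfo) would handle.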
EXPORT PRODUCTS • Metadata templates • Discovery metadata + variable attributes • Download conversion scripts • Convert data to netCDF • Zipped Python and MATLAB packages containing metadata templates, code, and documentation • Compliance Checker report • Details on how to improve your compliance with CF and ACDD standards
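One way to picture what a conversion script produces is to render the data plus a metadata template as CDL. CDL is the text form of netCDF, and Unidata's ncgen utility compiles it into a binary file (ncgen -o out.nc file.cdl). This is an illustrative alternative, not the exported Python/MATLAB packages themselves. The function to_cdl and its argument layout are hypothetical. It handles only a single time dimension with double-precision variables.

```python
def to_cdl(name, times, variables, global_attrs):
    """Render a time series as CDL text; `ncgen -o out.nc file.cdl` compiles it to netCDF.

    variables maps a variable name to (attribute dict, values aligned with times).
    """

    def attr_lines(var, attrs):
        # One CDL attribute line per entry; strings are quoted and escaped.
        out = []
        for key, value in attrs.items():
            if isinstance(value, str):
                value = '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
            out.append(f"\t\t{var}:{key} = {value} ;")
        return out

    for var, (_, values) in variables.items():
        if len(values) != len(times):
            raise ValueError(f"{var} has {len(values)} values but there are {len(times)} times")

    lines = [f"netcdf {name} {{", "dimensions:", f"\ttime = {len(times)} ;", "variables:"]
    lines.append("\tdouble time(time) ;")
    lines += attr_lines("time", {"standard_name": "time",
                                 "units": "seconds since 1970-01-01T00:00:00Z"})
    for var, (attrs, _) in variables.items():
        lines.append(f"\tdouble {var}(time) ;")
        lines += attr_lines(var, attrs)
    lines += ["", "// global attributes:"]
    lines += attr_lines("", global_attrs)  # global attributes have an empty variable name
    lines.append("data:")
    lines.append(" time = " + ", ".join(str(t) for t in times) + " ;")
    for var, (_, values) in variables.items():
        lines.append(f" {var} = " + ", ".join(str(v) for v in values) + " ;")
    lines.append("}")
    return "\n".join(lines)
```

Because the metadata template is just the attribute dicts, the same template can be reused for every file a station produces.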
VALUE ADDED • Collaborative integration framework • Connects • Integrates • Simplifies • Reduces effort for research & development and integration • Increases availability • Attracts users
ENHANCEMENTS • Visualization • ncWMS2/sci-wms integration • Map • Auto-aggregations • Intelligently “guess” the models/variables/grids/standard names • Advanced dataset searching • Apply QARTOD • REST endpoints to download metadata schema files • Incentive-based • Show users what they get if they provide more metadata • Better guidance for attribution
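QARTOD defines standard flag values (1 pass, 2 not evaluated, 3 suspect, 4 fail, 9 missing) and a set of tests. The gross range test below is a minimal sketch of one of them. It flags values outside the sensor's limits as fail and values outside optional user limits as suspect. The function name and limits are assumptions for illustration. For production use, the IOOS ioos_qc package provides maintained QARTOD implementations.

```python
# QARTOD flag values
PASS, NOT_EVALUATED, SUSPECT, FAIL, MISSING = 1, 2, 3, 4, 9


def gross_range_test(values, sensor_span, user_span=None):
    """QARTOD-style gross range test.

    FAIL outside the sensor limits, SUSPECT outside the optional user limits,
    MISSING for None/NaN, otherwise PASS.
    """
    sensor_lo, sensor_hi = sensor_span
    flags = []
    for v in values:
        if v is None or v != v:  # NaN is the only float not equal to itself
            flags.append(MISSING)
        elif not sensor_lo <= v <= sensor_hi:
            flags.append(FAIL)
        elif user_span is not None and not user_span[0] <= v <= user_span[1]:
            flags.append(SUSPECT)
        else:
            flags.append(PASS)
    return flags
```

For example, with sensor limits (-5, 40) and user limits (0, 30), the readings 10.0, 45.0 and 31.0 are flagged pass, fail and suspect.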
QUESTIONS? • Special thanks to the developers: Brian McKenna, Mike Christensen, Dalton Kell, Ryan Maciel, Jason Klas, Jake Polatty, Jamie Schicho
EXTRAS
• https://www.oecd.org/sti/sci-tech/38500813.pdf • https://github.com/ioosngdac/wiki/NGDAC-NetCDF-File-Submission-Process • https://project-open-data.cio.gov/v1.1/metadata-resources/#field-mappings
DOWNLOAD CONVERSION SCRIPTS • README