Sharing and publishing data using CUAHSI HIS
Outline • HIS data publication system • WaterML and WaterOneFlow web services • Observations data model (ODM) • Data loading • Data editing and quality control • Controlled vocabularies • HIS Central registration and tagging
HIS Data Publication System [Diagram labels] • Data sources: Sensors, Telemetry Network, Base Station Computer(s), Excel, Text • Loading: ODM Streaming Data Loader, ODM Data Loader • Storage: ODM Database • Query, visualize, and edit data using ODM Tools • WaterOneFlow Web Service (WaterML): GetSites, GetSiteInfo, GetVariableInfo, GetValues • HIS Central: Service Registry, Hydrotagger, Harvester, Water Metadata Catalog (contribute your ODM) • Discovery: Hydroseek • Access: HydroExcel, HydroGet, HydroLink, HydroObjects • Analysis: GIS, Matlab, Splus, R, IDL, Excel, Java, C++, VB
Steps in publishing data 1. Establish an HIS Server 2. Load observations into an ODM database 3. Provide access to data through web services (http://<your-server>/<your-network>/cuahsi_1_0.asmx?WSDL) 4. Index the resulting water data service at HIS Central (http://hiscentral.cuahsi.org)
Establishing an HIS Server • Windows server platform • Base software: Microsoft SQL Server and ArcGIS Server • HIS Server applications – WaterOneFlow web services – ODM + tools – DASH • HIS Data http://his.cuahsi.org/hisserver.html
Load Observations into an ODM Database • Observation types loaded into ODM: Groundwater levels, Streamflow, Precipitation & Climate, Water Quality, Soil moisture data, Flux tower data
Outline • HIS data publication system • WaterML and WaterOneFlow web services • Observations data model (ODM) • Data loading • Data editing and quality control • Controlled vocabularies • HIS Central registration and tagging
WaterML and WaterOneFlow • WaterML is an XML language for communicating water data • WaterOneFlow is a set of web services based on WaterML • [Diagram: Data Repositories (TCEQ Data, USGS Data, UT Data) → EXTRACT → TRANSFORM → WaterOneFlow Web Service (GetSiteInfo, GetVariableInfo, GetValues; WaterML) → LOAD → Client; queries by Locations, Variables, Time] Slide from David Valentine
WaterOneFlow Web Services • Web application: Data Portal • Your application: Excel, ArcGIS, Matlab; Fortran, C/C++, Visual Basic; hydrologic model; … • Your operating system: Windows, Unix, Linux, Mac • Internet • Web Services Library • Simple Object Access Protocol (SOAP) Slide from David Valentine
WaterOneFlow • Set of query functions • Returns data in WaterML • Data sources: NWIS Daily Values (discharge), NWIS Ground Water, NWIS Unit Values (real time), NWIS Instantaneous Irregular Data, EPA STORET, NCDC ASOS, DAYMET, MODIS, NAM 12K, USGS SNOTEL, ODM (multiple sites) Slide from David Valentine
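A call to one of these query functions can be sketched as a SOAP 1.1 request. This is a minimal sketch, not an official client: the namespace and the parameter names (location, variable, startDate, endDate, authToken) are assumed from WaterOneFlow 1.0 and should be checked against the WSDL of the service being called, and the site and variable codes are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
# Assumed WaterOneFlow 1.0 namespace; confirm it against the service WSDL.
WOF_NS = "http://www.cuahsi.org/his/1.0/ws/"


def build_get_values_request(location, variable, start_date, end_date, auth_token=""):
    """Return a SOAP 1.1 envelope (as bytes) for a GetValues call."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{WOF_NS}}}GetValues")
    # Parameter names are assumptions taken from WaterOneFlow 1.0.
    parameters = [
        ("location", location),
        ("variable", variable),
        ("startDate", start_date),
        ("endDate", end_date),
        ("authToken", auth_token),
    ]
    for name, text in parameters:
        ET.SubElement(call, f"{{{WOF_NS}}}{name}").text = text
    return ET.tostring(envelope, encoding="utf-8")


# Hypothetical network, site, and variable codes, for illustration only.
request_xml = build_get_values_request(
    "ExampleNetwork:SITE_CODE", "ExampleNetwork:VAR_CODE", "2007-10-01", "2007-10-02"
)
```

The resulting bytes would be sent by HTTP POST to the service's .asmx endpoint; the response body is a WaterML document.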
WaterML design principles • Goal - capture the semantics of hydrologic observation discovery and retrieval • Role - exchange schema for CUAHSI web services • Driven by – Hydrologists (community review) – ODM – USGS NWIS, EPA STORET, academic sources • Conformance with Open Geospatial Consortium standards (http://www.opengeospatial.org/) • For XSD pros, the WaterML schema is at http://his.cuahsi.org/wofws.html Slide from David Valentine
Point Observations Information Model • [Diagram: Data Source (Utah State University) → Network (Little Bear River; GetSites) → Sites (Little Bear River at Mendon Rd; GetSiteInfo) → Variables (Dissolved Oxygen; GetVariableInfo) → Values (9.78 mg/L, 1 October 2007, 6 PM; GetValues), each value reported as {Value, Time, Qualifier, Offset}] • A data source operates and provides data to an observation network • A network is a set of observation sites (stored in a single ODM instance) • A site is a point location where one or more variables are measured • A variable is a measured property (e.g. describing the flow or quality of water) • A value is an observation of a variable at a particular time • A qualifier is a symbol that provides additional information about the value • An offset allows specification of measurements at various depths in water
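The hierarchy defined on this slide can be sketched as nested data classes. This is an illustrative sketch only: the class and field names are chosen here for readability and are not the WaterML or ODM schema; the example instance uses the values shown on the slide.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Value:
    """An observation of a variable at a particular time."""
    value: float
    time: datetime
    qualifier: Optional[str] = None   # symbol giving extra information about the value
    offset: Optional[float] = None    # e.g. depth in water at which it was measured


@dataclass
class Variable:
    """A measured property, with its observed values."""
    name: str
    units: str
    values: List[Value] = field(default_factory=list)


@dataclass
class Site:
    """A point location where one or more variables are measured."""
    name: str
    variables: List[Variable] = field(default_factory=list)


@dataclass
class Network:
    """A set of observation sites operated by a data source."""
    name: str
    data_source: str
    sites: List[Site] = field(default_factory=list)


dissolved_oxygen = Variable(
    name="Dissolved Oxygen",
    units="mg/L",
    values=[Value(9.78, datetime(2007, 10, 1, 18, 0))],
)
network = Network(
    name="Little Bear River",
    data_source="Utah State University",
    sites=[Site("Little Bear River at Mendon Rd", [dissolved_oxygen])],
)
```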
Building Blocks of WaterML Responses • Response types – Sites (GetSiteInfo) – Variables (GetVariableInfo) – TimeSeries (GetValues) • Key elements – site – sourceInfo – seriesCatalog – variable – value – queryInfo Slide from David Valentine
Sites response • queryInfo • site – name, code, location • seriesCatalog – series: variables, how many, when (TimePeriodType) Slide from David Valentine
VariablesResponseType • variable – same as in series element • Code, name, units Slide from David Valentine
GetValues response - timeSeries • queryInfo • timeSeries – sourceInfo ("where") – variable ("what") – values Slide from David Valentine
Values • Each time series value is recorded in a value element • The timestamp, plus metadata for the value, is recorded in the element's attributes • [Figure labels: qualifier, ISO time, value] Slide from David Valentine
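Reading value elements from a response can be sketched as follows. The XML below is a simplified, hand-written fragment for illustration, not a real service response: it omits namespaces, the attribute names (dateTime, qualifiers) are assumed from WaterML 1.0, the qualifier code "P" is hypothetical, and the second reading is made-up sample data.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Simplified, illustrative fragment (no namespaces); not an actual response.
SAMPLE = """<timeSeries>
  <variable><variableName>Dissolved Oxygen</variableName></variable>
  <values>
    <value dateTime="2007-10-01T18:00:00" qualifiers="P">9.78</value>
    <value dateTime="2007-10-01T18:30:00">9.81</value>
  </values>
</timeSeries>"""


def parse_values(xml_text):
    """Return (timestamp, value, qualifier) tuples, one per value element."""
    root = ET.fromstring(xml_text)
    rows = []
    for elem in root.iter("value"):
        timestamp = datetime.fromisoformat(elem.get("dateTime"))  # ISO 8601 time
        qualifier = elem.get("qualifiers")  # None when the attribute is absent
        rows.append((timestamp, float(elem.text), qualifier))
    return rows


rows = parse_values(SAMPLE)
```

A real response carries an XML namespace, so the element lookups would need to be namespace-qualified.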
Outline • HIS data publication system • WaterML and WaterOneFlow web services • Observations data model (ODM) • Data loading • Data editing and quality control • Controlled vocabularies • HIS Central registration and tagging
Why an Observations Data Model • Syntactic heterogeneity (File types and formats) • Semantic heterogeneity – Language for observation attributes (structural) – Language to encode observation attribute values (contextual) • Publishing and sharing research data • Metadata to facilitate unambiguous interpretation • Enhance analysis capability
Scope • Focus on Hydrologic Observations made at a point • Exclude Remote sensing or grid data. These are part of a digital watershed but not suitable for an atomic database model and individual value queries • Primarily store raw observations and simple derived information to get data into its most usable form. • Limit inclusion of extensively synthesized information and model outputs at this stage.
What are the basic attributes to be associated with each single data value and how can these best be organized? • Value • DateTime • Variable • Location • Units • Interval (support) • Accuracy • Offset • OffsetType/Reference Point • Source/Organization • Censoring • Data Qualifying Comments • Method • Quality Control Level • Sample Medium • Value Type • Data Type
CUAHSI Observations Data Model • A relational database at the single observation level (atomic model) • Stores observation data made at points • Metadata for unambiguous data interpretation • Traceable heritage from raw measurements to usable information • Standard format for data sharing • Cross dimension retrieval and analysis • [Figure: inputs are Streamflow, Groundwater levels, Precipitation & Climate, Soil moisture, Flux tower, and Water Quality data; a data value v_i(s, t) is located along three axes: "Where" (Space, S), "When" (Time, T), and "What" (Variables, V)]
CUAHSI Observations Data Model http://www.cuahsi.org/his/odm.html
Site Attributes • SiteCode, e.g. NWIS:10109000 • SiteName, e.g. Logan River Near Logan, UT • Latitude, Longitude: geographic coordinates of site • LatLongDatum: spatial reference system of latitude and longitude • Elevation_m: elevation of the site • VerticalDatum: datum of the site elevation • Local X, Local Y: local coordinates of site • LocalProjection: spatial reference system of local coordinates • PosAccuracy_m: positional accuracy • State, e.g. Utah • County, e.g. Cache
Independent of, but can be coupled to Geographic Representation • [Diagram: the ODM Sites table (SiteID, SiteCode, SiteName, Latitude, Longitude, …) is linked one-to-one through a CouplingTable (SiteID, HydroID) to Arc Hydro features, each identified by HydroID: HydroPoint, Waterbody, Watershed, HydroEdge (Flowline, Shoreline), and HydroJunction within the HydroNetwork]
Variable attributes • VariableName, e.g. discharge • VariableCode, e.g. NWIS:0060 • Units, e.g. cubic meters per second (m3/s), of type Flow • SampleMedium, e.g. water • ValueType, e.g. field observation, laboratory sample • IsRegular, e.g. Yes for regular or No for intermittent • TimeSupport (averaging interval for observation) • DataType, e.g. Continuous, Instantaneous, Categorical • GeneralCategory, e.g. Climate, Water Quality • NoDataValue, e.g. -9999
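How site and variable attributes relate to individual data values can be sketched with a small relational schema. This is a heavily simplified stand-in, assuming SQLite in place of SQL Server and only a handful of columns; it is not the full ODM schema, and the data value and timestamp inserted below are made up for illustration.

```python
import sqlite3

# Simplified subset of ODM-style tables; the real ODM has many more columns and tables.
SCHEMA = """
CREATE TABLE Sites (
    SiteID INTEGER PRIMARY KEY,
    SiteCode TEXT NOT NULL UNIQUE,
    SiteName TEXT NOT NULL,
    Latitude REAL,
    Longitude REAL
);
CREATE TABLE Variables (
    VariableID INTEGER PRIMARY KEY,
    VariableCode TEXT NOT NULL UNIQUE,
    VariableName TEXT NOT NULL,
    SampleMedium TEXT,
    DataType TEXT,
    NoDataValue REAL
);
CREATE TABLE DataValues (
    ValueID INTEGER PRIMARY KEY,
    DataValue REAL NOT NULL,
    LocalDateTime TEXT NOT NULL,
    SiteID INTEGER NOT NULL REFERENCES Sites(SiteID),
    VariableID INTEGER NOT NULL REFERENCES Variables(VariableID)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO Sites (SiteID, SiteCode, SiteName) VALUES (1, ?, ?)",
    ("NWIS:10109000", "Logan River Near Logan, UT"),
)
conn.execute(
    "INSERT INTO Variables (VariableID, VariableCode, VariableName, SampleMedium, NoDataValue) "
    "VALUES (1, ?, ?, ?, ?)",
    ("NWIS:0060", "discharge", "water", -9999),
)
# Made-up observation, for illustration only.
conn.execute(
    "INSERT INTO DataValues (DataValue, LocalDateTime, SiteID, VariableID) VALUES (?, ?, 1, 1)",
    (3.2, "2008-01-01 00:00"),
)

# Each data value is interpreted through its site ("where") and variable ("what").
row = conn.execute(
    "SELECT s.SiteName, v.VariableName, d.DataValue "
    "FROM DataValues d "
    "JOIN Sites s ON s.SiteID = d.SiteID "
    "JOIN Variables v ON v.VariableID = d.VariableID"
).fetchone()
```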
Scale issues in the interpretation of data • The scale triplet: (a) Extent (b) Spacing (c) Support • [Figure: each panel plots quantity against length or time] From: Blöschl, G. (1996), Scale and Scaling in Hydrology, Habilitationsschrift, Wiener Mitteilungen Wasser Abwasser Gewässer, Wien, 346 p.
The effect of sampling for measurement scales not commensurate with the process scale • (a) spacing too large – noise (aliasing) • (b) extent too small – trend • (c) support too large – smoothing out From: Blöschl, G. (1996), Scale and Scaling in Hydrology, Habilitationsschrift, Wiener Mitteilungen Wasser Abwasser Gewässer, Wien, 346 p.
Discharge, Stage, Concentration and Daily Average Example
Data Types • Continuous (frequent sampling - fine spacing) • Sporadic (spot sampling - coarse spacing) • Cumulative • Incremental • Average • Maximum • Minimum • Constant over Interval • Categorical
15 min Precipitation from NCDC • Incomplete or inexact daily total occurring. Value is not a true 24-hour amount. One or more periods are missing and/or an accumulated amount has begun but not ended during the daily period.
Irregularly sampled groundwater level
OffsetValue • Distance from a datum or control point at which an observation was made • OffsetType defines the type of offset, e.g. distance below water level, distance above ground surface, or distance from bank of river
Water Chemistry from a profile in a lake
Groups and Derived From Associations
Stage and Streamflow Example
Daily Average Discharge Example Daily Average Discharge Derived from 15 Minute Discharge Data
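Deriving a daily average series from 15 minute data can be sketched as follows. This is a minimal sketch using synthetic values; it assumes complete records and does not handle no-data values, qualifiers, or the bookkeeping that links a derived series back to its source series.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def daily_average(records):
    """Average (timestamp, value) records by calendar day."""
    totals = defaultdict(lambda: [0.0, 0])  # day -> [sum of values, count]
    for timestamp, value in records:
        bucket = totals[timestamp.date()]
        bucket[0] += value
        bucket[1] += 1
    return {day: total / count for day, (total, count) in sorted(totals.items())}


# Synthetic 15 minute discharge: 96 readings per day for two days.
start = datetime(2008, 1, 1)
records = [
    (start + timedelta(minutes=15 * i), 10.0 if i < 96 else 20.0)
    for i in range(192)
]
averages = daily_average(records)
```

In ODM terms the derived series would be stored as its own variable with a DataType of Average and a one day time support.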
Methods and Samples • Method specifies the method whereby an observation is measured, e.g. streamflow using a V-notch weir, TDS using a Hydrolab, sample collected in auto-sampler • SampleID is used for observations based on the laboratory analysis of a physical sample and identifies the sample from which the observation was derived. This keys to a unique LabSampleID (e.g. bottle number) and the name and description of the analytical method used by a processing lab.
Water Chemistry from Laboratory Sample
ValueAccuracy • A numeric value that quantifies measurement accuracy, defined as the nearness of a measurement to the standard or true value. This may be quantified as an average or root mean square error relative to the true value. Since the true value is not known, this should be estimated based on knowledge of the method and measurement instrument. Accuracy is distinct from precision, which quantifies reproducibility but does not refer to the standard or true value. • [Figure labels: Accurate; Low accuracy, but precise]
Data Quality • Qualifier Code and Description provide qualifying information about the observations, e.g. Estimated, Provisional, Derived, Holding time for analysis exceeded • QualityControlLevel records the level of quality control that the data has been subjected to. - Level 0. Raw Data - Level 1. Quality Controlled Data - Level 2. Derived Products - Level 3. Interpreted Products - Level 4. Knowledge Products
Series of Observations • A "Data Series" is a set of all the observations of a particular variable at a site. • The SeriesCatalog is programmatically generated to provide users with the ability to do data discovery (i.e. what data is available and where) without formulating complex queries or hitting the DataValues table, which can get very large.
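Generating a series catalog from the data values can be sketched as a single GROUP BY query. This is a simplified illustration, assuming SQLite as a stand-in for SQL Server and only a few columns; it is not the procedure ODM itself uses, and the rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DataValues ("
    "SiteID INTEGER, VariableID INTEGER, LocalDateTime TEXT, DataValue REAL)"
)
# Made-up sample rows: two variables observed at one site.
conn.executemany(
    "INSERT INTO DataValues VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2007-10-01 18:00", 9.78),
        (1, 1, "2007-10-01 18:30", 9.81),
        (1, 2, "2007-10-01 18:00", 12.4),
    ],
)

# One catalog row per (site, variable) pair, summarizing that data series.
conn.execute(
    "CREATE TABLE SeriesCatalog AS "
    "SELECT SiteID, VariableID, "
    "MIN(LocalDateTime) AS BeginDateTime, "
    "MAX(LocalDateTime) AS EndDateTime, "
    "COUNT(*) AS ValueCount "
    "FROM DataValues GROUP BY SiteID, VariableID"
)
catalog = conn.execute(
    "SELECT SiteID, VariableID, BeginDateTime, EndDateTime, ValueCount "
    "FROM SeriesCatalog ORDER BY SiteID, VariableID"
).fetchall()
```

Discovery queries can then read the small SeriesCatalog table instead of scanning DataValues.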
Outline • HIS data publication system • WaterML and WaterOneFlow web services • Observations data model (ODM) • Data loading • Data editing and quality control • Controlled vocabularies • HIS Central registration and tagging
Loading data into ODM • Interactive OD Data Loader (OD Loader) – Loads data from spreadsheets and comma-separated tables in a simple format • Scheduled Data Loader (SDL) – Loads data from datalogger files on a prescribed schedule – Interactive configuration • SQL Server Integration Services (SSIS) – Microsoft application accompanying SQL Server, useful for programming complex loading or data management functions
Sensor Network • [Diagram: Remote Monitoring Sites → Radio Repeaters → Base Station Computer → Internet → Central Observations Database (ODM), loaded by the ODM Streaming Data Loader → Internet → Applications] • Data discovery, visualization, and analysis through Internet-enabled applications From Jeff Horsburgh
Loading the Little Bear Sensor Data Into ODM • Automate the data loading process via scheduled updates • Map datalogger files to the ODM schema and controlled vocabularies • The ODM Streaming Data Loader (ODM SDL) manages the periodic insertion of the streaming data into the ODM database using the mappings stored in the XML configuration file • [Diagram labels: Base Station Computer(s), Streaming Data Text Files, ODM SDL Mapping Wizard, ODM XML Config File, ODM SDL Import Application, ODM] From Jeff Horsburgh
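The idea of mapping datalogger columns onto ODM variables can be sketched as follows. This is an illustrative sketch only: the column names, site code, and variable codes are hypothetical, the sample readings are made up, and the dictionary stands in for the mappings that the ODM SDL keeps in its XML configuration file, whose actual format is not shown here.

```python
import csv
import io

# Hypothetical mapping from datalogger column names to variable codes.
COLUMN_TO_VARIABLE = {
    "WaterTemp_C": "EXAMPLE:WaterTemp",
    "DO_mgL": "EXAMPLE:DO",
}


def map_datalogger_rows(text, site_code, column_map=COLUMN_TO_VARIABLE,
                        time_column="TIMESTAMP"):
    """Turn wide datalogger rows into one record per (time, variable)."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        for column, variable_code in column_map.items():
            raw = row.get(column)
            if not raw:  # skip missing or empty readings
                continue
            records.append({
                "SiteCode": site_code,
                "VariableCode": variable_code,
                "LocalDateTime": row[time_column],
                "DataValue": float(raw),
            })
    return records


# Made-up datalogger file; the second row has no dissolved oxygen reading.
SAMPLE_FILE = (
    "TIMESTAMP,WaterTemp_C,DO_mgL\n"
    "2007-10-01 18:00,12.4,9.78\n"
    "2007-10-01 18:30,12.1,\n"
)
records = map_datalogger_rows(SAMPLE_FILE, "EXAMPLE:SITE")
```

Keeping the mapping as data rather than code is what allows a scheduled loader to be re-run unchanged as new rows arrive.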
Work from Out to In • At last … • And don't forget … • [Figure: numbered callouts 1 to 7 overlaid on the ODM schema diagram] CUAHSI Observations Data Model http://www.cuahsi.org/his/odm.html
Managing Data Within ODM Tools • Query and export – export data series and metadata • Visualize – plot and summarize data series • Edit – delete, modify, adjust, interpolate, average, etc.
Outline • HIS data publication system • WaterML and WaterOneFlow web services • Observations data model (ODM) • Data loading • Data editing and quality control • Controlled vocabularies • HIS Central registration and tagging
Syntactic Heterogeneity Multiple Data Sources With Multiple Formats Excel Files Text Files Access Files Data Logger Files ODM Observations Database From Jeff Horsburgh
Semantic Heterogeneity
General Description of Attribute | USGS NWIS (a) | EPA STORET (b)
Structural Heterogeneity
Code for location at which data are collected | "site_no" | "Station ID"
Name of location at which data are collected | "Site" OR "Gage" | "Station Name"
Code for measured variable | "Parameter" | ? (c)
Name of measured variable | "Description" | "Characteristic Name"
Time at which the observation was made | "datetime" | "Activity Start"
Code that identifies the agency that collected the data | "agency_cd" | "Org ID"
Contextual Semantic Heterogeneity
Name of measured variable | "Discharge" | "Flow"
Units of measured variable | "cubic feet per second" | "cfs"
Time at which the observation was made | "2008-01-01" | "2006-04-04 00:00"
Latitude of location at which data are collected | "41°44'36" | "41.7188889"
Type of monitoring site | "Spring, Estuary, Lake, Surface Water" | "River/Stream"
(a) United States Geological Survey National Water Information System (http://waterdata.usgs.gov/nwis/).
(b) United States Environmental Protection Agency Storage and Retrieval System (http://www.epa.gov/storet/).
(c) An equivalent to the USGS parameter code does not exist in data retrieved from EPA STORET.
From Jeff Horsburgh
Overcoming Semantic Heterogeneity • ODM Controlled Vocabulary System – ODM CV central database – Online submission and editing of CV terms – Web services for broadcasting CVs • Example, Variable Name: Investigators 1 to 4 use terms such as "Temperature, water", "Water Temperature", and "Temp."; all map to the ODM VariableNameCV term "Temperature" (the CV lists … Sunshine duration, Temperature, Turbidity …) From Jeff Horsburgh
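Mapping investigators' free-text labels onto a controlled vocabulary term can be sketched as a lookup. This is a minimal sketch: the three CV terms and three synonyms come from the slide, while the lookup table and function are hypothetical and stand in for the moderated master vocabulary.

```python
# Small excerpt of CV terms, as listed on the slide.
CONTROLLED_TERMS = {"Sunshine duration", "Temperature", "Turbidity"}

# Hypothetical synonym table built from the investigators' labels on the slide.
SYNONYMS = {
    "temperature, water": "Temperature",
    "water temperature": "Temperature",
    "temp.": "Temperature",
}


def to_cv_term(label):
    """Return the controlled vocabulary term for a free-text variable name."""
    key = label.strip().lower()
    if key in SYNONYMS:
        return SYNONYMS[key]
    for term in CONTROLLED_TERMS:
        if term.lower() == key:
            return term
    # Unknown labels are rejected rather than guessed.
    raise KeyError(f"No controlled vocabulary term for {label!r}")


mapped = [to_cv_term(name) for name in ("Temperature, water", "Water Temperature", "Temp.")]
```

Rejecting unknown labels mirrors the moderated approach: a new term is submitted for review rather than silently added.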
Dynamic controlled vocabulary moderation system • [Diagram labels: Master ODM Controlled Vocabulary, ODM Controlled Vocabulary Moderator, ODM Website, ODM Controlled Vocabulary Web Services (XML), Local Server, Local ODM Database, ODM Data Manager, ODM Tools] • http://his.cuahsi.org/mastercvreg.html From Jeff Horsburgh
Outline • HIS data publication system • WaterML and WaterOneFlow web services • Observations data model (ODM) • Data loading • Data editing and quality control • Controlled vocabularies • HIS Central registration and tagging
Registering Web Services with HIS Central • Listing of all public data services • Enables applications like Hydroseek to discover data
Tagging Variables for Data Discovery Through a Metadata Catalog • Ontology: a hierarchy of concepts • Each variable in your data is connected to a corresponding concept From Michael Piasecki
Tagging variables in Ontology • WATERS Network Information System • Steps 1. The WSDL for a set of ODM web services is registered in the WSDL Registry 2. The "harvester" jumps into action and trawls through the web services at the WSDL to find and identify new variables 3. It returns i) data updating information and ii) variable names used, and compares these to those used by HydroSeek. From Michael Piasecki 12/18/2021 Department of Civil, Architectural & Environmental Engineering
Mapping onto Ontology • Steps contd. 4. New variables are manually mapped onto the appropriate ontology concept. 5. The HydroSeek catalogue is updated. From Michael Piasecki
Hydroseek http://www.hydroseek.org • Supports search by location and type of data across multiple observation networks including NWIS, STORET, and university data
Summary • Generic method for publishing observational data – Supports many types of point observational data – Overcomes syntactic and semantic heterogeneity using a standard data model and controlled vocabularies – Supports a national network of observatory test beds but can grow! • Web services provide programmatic machine access to data – Work with the data in your data analysis software of choice • Internet-based applications provide user interfaces for the data and geographic context for monitoring sites