Lecture 4 Data Models Jeffery S Horsburgh Hydroinformatics

Lecture 4: Data Models
Jeffery S. Horsburgh
Hydroinformatics, Fall 2013
This work was funded by National Science Foundation Grants EPS 1135482 and EPS 1208732.

Objectives
• Identify and describe important entities and relationships to model data
• Describe important data models used in hydrology, such as the Observations Data Model (ODM), Arc Hydro, and NetCDF

What is a Data Model?
• Abstract model that documents and organizes data
• Explicitly defines the data and determines their structure
• Used as a plan and structure for developing applications that use the data

Data Models
• Define the “entity” types within a domain
[Figure: entity types: Values, Sites (where), Methods (how), Data Sources (who)]

Entities Associated with Observations
• Variables – the things you measure or observe
• Observers – who made the observation
• Samples – a bottle of water, a sediment core
• Offsets – distance below ground, below surface, etc.
• Versions – raw data, processed data, simulations
• Qualifiers – limitations to data use

Data Models
• Define the attributes of entities
Entity = Site
Site Name: Little Bear River near Wellsville
Site Code: USU-LBR-Wellsville
Latitude: 41.643457
Longitude: -111.917649
Elevation: 1365 m
State: Utah
County: Cache
Description: Attached to SR 101 bridge.
Site Type: Stream
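The Site entity and its attributes can be sketched as a small Python class. This is only an illustration of "entity = class, attribute = field"; the field names and types are assumptions, not the column names of any particular data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """One Site entity; each field is one attribute from the slide."""
    site_name: str
    site_code: str
    latitude: float     # decimal degrees (assumed)
    longitude: float    # decimal degrees (assumed)
    elevation_m: float  # meters
    state: str
    county: str
    description: str
    site_type: str


# The example site from the slide, expressed as one instance of the entity.
wellsville = Site(
    site_name="Little Bear River near Wellsville",
    site_code="USU-LBR-Wellsville",
    latitude=41.643457,
    longitude=-111.917649,
    elevation_m=1365.0,
    state="Utah",
    county="Cache",
    description="Attached to SR 101 bridge.",
    site_type="Stream",
)
```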

Data Models
• Define the relationships among entities
[Figure labels: Values, Variable and Method, Site, Source]
Example: Water temperature values in degrees Celsius measured in the Little Bear River at Mendon Road using a Hydrolab MS5 multiparameter sonde by Utah State University

Data Models
• Define the “business rules” for data
  – Observations are recorded at one and only one site
  – One or more variables are measured at a site
  – A site must have a name
  – A variable name must be chosen from a controlled vocabulary
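A rough sketch of how business rules like these might be checked in code follows. The vocabulary terms, the dictionary layout, and the function name are all hypothetical; a real data model would usually enforce such rules in the database schema itself.

```python
# Hypothetical, abbreviated controlled vocabulary of variable names.
VARIABLE_NAME_CV = {"Temperature", "Discharge", "Dissolved oxygen"}


def check_observation(obs: dict) -> list:
    """Return a list of business-rule violations (empty if none)."""
    problems = []
    site = obs.get("site")
    # Rule: an observation is recorded at one and only one site.
    if not isinstance(site, dict):
        problems.append("observation must reference exactly one site")
    # Rule: a site must have a name.
    elif not site.get("name"):
        problems.append("site must have a name")
    # Rule: the variable name must come from the controlled vocabulary.
    if obs.get("variable_name") not in VARIABLE_NAME_CV:
        problems.append("variable name is not in the controlled vocabulary")
    return problems


good = {"site": {"name": "Little Bear River near Wellsville"},
        "variable_name": "Temperature"}
bad = {"site": {"name": ""}, "variable_name": "Temp"}
```

Here `check_observation(good)` returns an empty list, while `check_observation(bad)` reports both the missing site name and the uncontrolled variable name.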

What are some types of data models?

Types of Data Models
• Relational data models
  – e.g., relational databases
[Figure: tables linked by one-to-many (1 to *) relationships]

Relational Data Models
• Great for data with many transactions
• Great in a multiple-user environment
• Powerful query language
  – Structured Query Language (SQL)
• Robust database servers and software tools available
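A minimal sketch of a relational data model follows, using Python's built-in sqlite3 module with an in-memory database. The two tables and their columns are a simplified illustration of a one-to-many relationship (one site, many data values), not a full ODM schema, and the data values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request
conn.executescript("""
CREATE TABLE Sites (
    SiteID   INTEGER PRIMARY KEY,
    SiteCode TEXT NOT NULL UNIQUE,
    SiteName TEXT NOT NULL
);
CREATE TABLE DataValues (
    ValueID       INTEGER PRIMARY KEY,
    SiteID        INTEGER NOT NULL REFERENCES Sites(SiteID),
    LocalDateTime TEXT NOT NULL,
    DataValue     REAL NOT NULL
);
""")

conn.execute(
    "INSERT INTO Sites VALUES (1, 'USU-LBR-Wellsville', "
    "'Little Bear River near Wellsville')"
)
conn.executemany(
    "INSERT INTO DataValues (SiteID, LocalDateTime, DataValue) VALUES (?, ?, ?)",
    [(1, "2013-09-01 00:00", 14.2), (1, "2013-09-01 00:30", 14.0)],  # made-up values
)

# SQL query across the relationship: number of values and their mean per site.
rows = conn.execute("""
    SELECT s.SiteCode, COUNT(*) AS n, AVG(v.DataValue) AS mean_value
    FROM Sites AS s
    JOIN DataValues AS v ON v.SiteID = s.SiteID
    GROUP BY s.SiteCode
""").fetchall()

# The foreign key acts as a business rule: a value must belong to an existing site.
try:
    conn.execute(
        "INSERT INTO DataValues (SiteID, LocalDateTime, DataValue) "
        "VALUES (99, '2013-09-01 01:00', 13.9)"
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```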

Types of Data Models
• File based data models
  – ESRI File Geodatabase
  – NetCDF
• Structured file or set of files that store data

File Based Data Models
• Usually tied to a tool or set of tools for reading, writing, etc.
• Can be portable across platforms
• Can be optimized for performance or compression (e.g., custom binary files)

Types of Data Models
• Extensible Markup Language (XML) schemas

XML Schemas
• Great for transporting data in a machine readable format
• Platform and programming language independent
• Special form of file based data model
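To illustrate the transport idea, the sketch below writes a site description to XML text and reads it back using Python's standard library. The element names are invented for this example and do not follow any published schema; the example also shows only an XML instance document, not a schema definition.

```python
import xml.etree.ElementTree as ET

# Build a small XML document describing one site (element names are hypothetical).
site = ET.Element("site")
ET.SubElement(site, "siteName").text = "Little Bear River near Wellsville"
ET.SubElement(site, "siteCode").text = "USU-LBR-Wellsville"
ET.SubElement(site, "latitude").text = "41.643457"

# Serialize to plain text that any platform or language can read.
xml_text = ET.tostring(site, encoding="unicode")

# A receiving program parses the text and pulls out what it needs.
parsed = ET.fromstring(xml_text)
site_code = parsed.findtext("siteCode")
latitude = float(parsed.findtext("latitude"))
```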

Types of Data Models
• Object models

Object Models
• A collection of objects or classes through which a computer program can manipulate data
• Objects have “properties” and “methods”
• Container that wraps data within a set of functions
  – Ensure that the data are used appropriately
  – Provide standardized, reusable functionality
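A small sketch of an object model in Python follows. The class name, its properties, and its methods are hypothetical; the point is only that the data are wrapped by functions that control how they are used.

```python
class TimeSeries:
    """Wraps a list of values behind properties and methods."""

    def __init__(self, variable, units):
        self.variable = variable   # property: what was measured
        self.units = units         # property: units of the values
        self._values = []          # data kept behind the methods below

    @property
    def count(self):
        """Number of values currently stored."""
        return len(self._values)

    def add_value(self, value):
        """Method that ensures only numeric values enter the series."""
        if not isinstance(value, (int, float)):
            raise TypeError("data values must be numeric")
        self._values.append(float(value))

    def mean(self):
        """Standardized, reusable functionality: the arithmetic mean."""
        if not self._values:
            raise ValueError("series is empty")
        return sum(self._values) / len(self._values)


ts = TimeSeries("Temperature", "degrees Celsius")
ts.add_value(14.0)
ts.add_value(15.0)
```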

Object Model
[Figure: a class/object with its properties and methods]

What are some common data models used in hydrology?

Some Data Models Commonly Used in Hydrology
• CUAHSI Observations Data Model (ODM)
• Arc Hydro Groundwater
• NetCDF

Observations Data Model (ODM)
[Figure labels: Streamflow, Precipitation & Climate, Water Quality, Groundwater levels, Soil moisture data, Flux tower data]
• A relational database at the single observation level
• Metadata for unambiguous interpretation
• Traceable heritage from raw measurements to usable information
• Promote syntactic and semantic consistency
• Cross dimension retrieval and analysis
Horsburgh, J. S., D. G. Tarboton, D. R. Maidment, and I. Zaslavsky (2008), A relational model for environmental and water resources data, Water Resources Research, 44, W05406, doi:10.1029/2007WR006392.

What are the basic attributes to be associated with each single data value and how can these best be organized?
[Figure: a data value vi(s, t) located along three axes: Space, S (“Where”); Time, T (“When”); Variables, V (“What”)]
Attribute labels in the figure: DateTime, Interval (support), Variable, Method, Quality Control Level, Sample Medium, Value Type, Data Type, Source/Organization, Units, Accuracy, Censoring, Qualifying comments, Location, Feature of interest

Data Series – A Time Series of Hydrologic Observations
Defined by unique combinations of:
• Site
• Variable
• Method
• Source
• Quality Control Level
[Figure: a data series on Space, Time, and Variables axes, running from Begin Date Time, t1, to End Date Time, t2, with Count, C]
There are C measurements of Variable Vi at Site Sj from time t1 to time t2
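The data series idea can be sketched by grouping individual values on the five identifying attributes and recording the begin time, end time, and count of each group. The records, the site code, and the function name below are made up for illustration.

```python
from collections import defaultdict

# Hypothetical records:
# (site, variable, method, source, quality control level, date time, value)
records = [
    ("USU-LBR-Mendon", "Temperature", "Hydrolab MS5", "USU", 0, "2013-09-01T00:00", 14.2),
    ("USU-LBR-Mendon", "Temperature", "Hydrolab MS5", "USU", 0, "2013-09-01T00:30", 14.0),
    ("USU-LBR-Mendon", "Temperature", "Hydrolab MS5", "USU", 1, "2013-09-01T00:00", 14.2),
]


def summarize_series(rows):
    """Group values into data series and report t1, t2, and C for each."""
    times_by_series = defaultdict(list)
    for site, variable, method, source, qc_level, when, _value in rows:
        key = (site, variable, method, source, qc_level)
        times_by_series[key].append(when)
    # ISO 8601 strings in one time zone sort in chronological order.
    return {
        key: {"begin": min(times), "end": max(times), "count": len(times)}
        for key, times in times_by_series.items()
    }


series = summarize_series(records)
```

The raw values (quality control level 0) and the checked value (level 1) fall into two separate series, even though site, variable, method, and source are the same.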

ODM 1.1.1
[Figure: ODM 1.1.1 schema: Values (+ when) linked to Sources (who), Sites (where), Variables (what), Methods (how), and Quality Control Levels]

Controlled Vocabularies

Controlled Vocabularies: Reducing Semantic Heterogeneity

Implementing ODM
• Relational database schemas exist for:
  – Microsoft SQL Server
  – MySQL

ODM Example: Water Quality from a Profile in a Lake

Linking Point Observations to Hydrologic Features

Arc Hydro: GIS for Water Resources
Published in 2002, now in revision for Arc Hydro II
• Arc Hydro
  – An ArcGIS data model for water resources
  – Arc Hydro toolset for implementation
  – Framework for linking hydrologic simulation models
The Arc Hydro data model and application tools are in the public domain

Real World Hydrologic Features

What are some important entities in a data model for surface water hydrology?

Arc Hydro Framework Input Data
[Figure: Watersheds, Waterbody, Streams, Hydro Points]

Arc Hydro Framework Data Model

What Can I Do with Arc Hydro?
• Arc Hydro defines flow lines and junctions and encodes flow directions
• Arc Hydro encodes relationships among watersheds, streams, and junctions
• Establishes hydrologic connectivity between catchments (polygons), stream reaches (lines), and junctions (points)

What Can I Do with Arc Hydro?
Network Tracing
• Select all streams above a point
• Select the downstream path for a point
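The two traces can be sketched on a toy network in which each junction records the next junction downstream. This is only an illustration of the idea; it does not use the Arc Hydro tools or any ArcGIS API, and the junction names are invented.

```python
# Hypothetical network: junction -> next junction downstream (None at the outlet).
#   A --\
#        C --\
#   B --/     E (outlet)
#        D --/
next_down = {"A": "C", "B": "C", "C": "E", "D": "E", "E": None}


def trace_downstream(start):
    """Follow the encoded flow direction from a junction to the outlet."""
    path = [start]
    while next_down[path[-1]] is not None:
        path.append(next_down[path[-1]])
    return path


def trace_upstream(point):
    """Collect every junction whose flow eventually passes through 'point'."""
    found = set()
    stack = [point]
    while stack:
        current = stack.pop()
        for junction, downstream in next_down.items():
            if downstream == current and junction not in found:
                found.add(junction)
                stack.append(junction)
    return found
```

With this network, `trace_downstream("A")` follows A, C, E, and `trace_upstream("C")` finds A and B.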

Arc Hydro Tools for ArcGIS
• Terrain analysis: preparing DEM derivatives
• Watershed processing: watershed delineation from DEMs
• Attribute tools: computing and populating attributes and identifiers
• Network tools: creating the hydro network
Focus: getting data into Arc Hydro and working with it once it is there.

Arc Hydro Time Series
• Variable: string describing what is being measured or calculated
• Units: string describing units
• IsRegular: boolean indicating whether the data are regularly spaced
• TSInterval: controlled vocabulary for time intervals
• DataType: statistic for value measured over interval
• Origin: indication of whether the values are measured or calculated

Arc Hydro Groundwater
Data model and tools for managing groundwater data in ArcGIS

What are important entities in a groundwater data model?

Arc Hydro GW Data Model

Arc Hydro GW Tools
• Groundwater Analyst
• MODFLOW Analyst
• Subsurface Analyst

NetCDF
• A platform independent format for representing multi-dimensional, array-oriented scientific data
• Continuous space-time data model
  – Both time and space are varying
• Especially useful for time-varying grids
  – Time varying precipitation fields (e.g., radar rainfall data)
• Used extensively in the weather and climate domains

NetCDF Characteristics
NetCDF (network Common Data Form)
• Self Describing - a netCDF file includes information about the data it contains
• Direct Access - a small subset of a large dataset may be accessed efficiently, without first reading through all the preceding data
• Sharable - one writer and multiple readers may simultaneously access the same netCDF file

Multidimensional Data
[Figure: stacked grids for Time = 1, Time = 2, Time = 3]
Source: http://www.unidata.ucar.edu

Multidimensional Data – Space and Time

The NetCDF File
NetCDF is a binary file
A NetCDF file consists of:
• Global Attributes: Describe the contents of the file
• Dimensions: Define the structure of the data (e.g., Time, Depth, Latitude, Longitude)
• Variables: Hold the data in arrays shaped by Dimensions
• Variable Attributes: Describe the contents of each variable
CDL (network Common Data form Language) description takes the following form:
netCDF name {
  dimensions: ...
  variables: ...
  data: ...
}
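The four parts of a NetCDF file can be mimicked with plain Python containers to show how dimensions shape a variable's array. This sketch does not read or write real NetCDF files and does not use any NetCDF library; the dataset contents, key names, and helper functions are invented for illustration.

```python
# A plain-Python stand-in for the parts of a NetCDF file (values are made up).
dataset = {
    "global_attributes": {"title": "Hypothetical radar rainfall grid"},
    "dimensions": {"time": 3, "latitude": 2, "longitude": 2},
    "variables": {
        "precipitation": {
            "dimensions": ("time", "latitude", "longitude"),
            "attributes": {"units": "mm"},
            "data": [
                [[0.0, 0.1], [0.2, 0.3]],   # time step 0
                [[1.0, 1.1], [1.2, 1.3]],   # time step 1
                [[2.0, 2.1], [2.2, 2.3]],   # time step 2
            ],
        }
    },
}


def shape_of(nested):
    """Measure the shape of a regular nested list."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0]
    return tuple(shape)


def expected_shape(ds, name):
    """Shape implied by the variable's dimensions."""
    return tuple(ds["dimensions"][d] for d in ds["variables"][name]["dimensions"])


precip = dataset["variables"]["precipitation"]["data"]
# Indexing by (time, latitude, longitude) picks one cell directly.
one_cell = precip[1][0][1]
```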

Considerations in Modeling Data
• Is there an existing data model that will work for my data?
• What are the top 20 queries or analyses I need to do with the data?
• What software do I want to use?
• How will I want to share the data?

Advantages of Formal Data Models
• Provide a high degree of structure to data
• Generally implemented in software that has robust querying, manipulation, and visualization capabilities (e.g., RDBMS or GIS)
• Facilitate software development
• Can help in capturing the semantics of data

Disadvantages
• Can be rigid and difficult to change
• Difficult to anticipate needs in the design stages
• Can be incompatible across organizations
• Can become complex

Summary (1)
• A data model provides a definition of a formal structure for data
• There are several flavors of data models, each with different strengths, weaknesses, and appropriate uses
• Data models can facilitate software development

Summary (2)
• Common data models used in hydrology
  – The CUAHSI Observations Data Model (ODM) provides an organizational structure for hydrologic time series data
  – Arc Hydro is a geographic data model for surface hydrologic features
  – Arc Hydro Groundwater adds subsurface hydrologic features, geology, borehole data, and hydrostratigraphy
  – NetCDF combines both geospatial and temporal domains into a continuous space-time data model

References and Credits
Horsburgh, J. S., and D. G. Tarboton (2012). CUAHSI Community Observations Data Model (ODM) Version 1.1.1 Design Specifications, CUAHSI, Washington, D.C., http://www.codeplex.com/Download?ProjectName=HydroServer&DownloadId=349176
Horsburgh, J. S., D. G. Tarboton, D. R. Maidment, and I. Zaslavsky (2008). A relational model for environmental and water resources data, Water Resources Research, 44, W05406, http://dx.doi.org/10.1029/2007WR006392
Maidment, D. R. (ed.) (2002). Arc Hydro: GIS for Water Resources, ESRI Press, Redlands, CA, 203 p.
Strassberg, G., N. L. Jones, and D. R. Maidment (2011). Arc Hydro Groundwater: GIS for Hydrogeology, ESRI Press, Redlands, CA, 160 p.
Credits: Arc Hydro slides used with permission from David Maidment, University of Texas at Austin. Arc Hydro Groundwater slides used with permission from Norm Jones, Brigham Young University/Aquaveo.