Data Integration Progress and Guiding Principles: Disciplines, generalization, and open-access

Data Integration Progress and Guiding Principles: Disciplines, generalization, and open-access.
David Blodgett – dblodgett@usgs.gov
USGS Office of Water Information, Center for Integrated Data Analytics
U.S. Department of the Interior, U.S. Geological Survey

Outline
· Data Integration Disambiguation
· Barriers to moving Forward.
· Anecdotes, everyone loves anecdotes!
· Principles to go Forward!

Disclosures
· I’m a water guy.
· I’m a millennial.
· I assume Internet.
· I’m a Badger.
· … Forward!

Data Integration – Disambiguated.
Integration is the act of combining multiple things into a whole.

Data Integration – Disambiguated.
· What makes something integrated?
· How different do things need to be to count?
· Do you just need to combine things?

What kind of data integration is needed for decisions?
Application / Decision Driven Model of Data Integration
· Integrated Search
· Visual Integration
· Multi-source Data Ingest
· Data Bundling
· Data Fusion
· Data Consolidation
· Warehouse in the Data Cloud?
Slide Credit: Jeff de La Beaujardiere, Jeff.deLaBeaujardiere@noaa.gov, 2014-05-12

A general model for data integration.
· Disciplinary Details
· Generalized Standards
· Free and Open Service Access

Service Orientation
· On local machines, we run software: list, introspect, summarize, transform, integrate. It can scan the entire domain of the data!
· A service may do any or all of these things.
· Software on the server can summarize the domain and range of its holdings (i.e., deliver dynamic metadata).
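
The idea of server-side software summarizing the domain and range of its holdings can be sketched roughly as follows. This is only an illustration, not the presenter's implementation: the `Observation` record, its fields, and the `summarize` function are hypothetical names introduced here, and the sample values are made up.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Observation:
    """One hypothetical record in a service's holdings."""
    lon: float
    lat: float
    when: date
    value: float


def summarize(holdings):
    """Return dynamic metadata: the domain (space/time extent) and the range (value extent)."""
    if not holdings:
        raise ValueError("no holdings to summarize")
    lons = [o.lon for o in holdings]
    lats = [o.lat for o in holdings]
    times = [o.when for o in holdings]
    values = [o.value for o in holdings]
    return {
        "bbox": (min(lons), min(lats), max(lons), max(lats)),  # west, south, east, north
        "time_extent": (min(times), max(times)),
        "value_range": (min(values), max(values)),
        "count": len(holdings),
    }


# Made-up holdings, for illustration only.
holdings = [
    Observation(-89.4, 43.1, date(2014, 1, 5), 2.5),
    Observation(-90.1, 44.0, date(2014, 3, 9), 7.0),
    Observation(-88.9, 42.7, date(2013, 11, 20), 4.25),
]
summary = summarize(holdings)
print(summary)
```

Because the summary is computed from the holdings at request time, it stays current as data are added, which is one way to read "dynamic metadata".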

Web Service – So what?
Software on the server can summarize the domain and range of its holdings.

Generalized Aspects of Data Services
· International Standards.
· Spatial/Temporal Extent
· Discipline specific linked to other disciplines.
· Various Communities’ Interchange
· Blob of Bits
· Attribute Extent
· Available Formats

Practical Barriers
· ‘I don’t know how to use the required software.’
· ‘The software I need is really expensive.’
· ‘The information I need is a big mess.’
· ‘The information I need is really big.’

Understanding Barriers
· ‘The information is in a language I don’t know.’
· ‘The information is in a format I’ve never seen.’
· ‘The taxonomy used doesn’t work with mine.’
· ‘I’m not sure if what I’m seeing is a data quality issue or real.’

Defensive Barriers
· ‘I collected this data and want to publish on it.’
· ‘People won’t interpret my data correctly.’
· ‘I don’t want to be liable for decisions made.’
· ‘This data’s quality is too low to stand behind.’

Square Pegs and Round Holes: Coverages and Features
A grid cell IS NOT a point measurement!!!
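
A small numeric sketch of why a grid cell is not a point measurement. The gauge names and rainfall totals are invented for illustration, and modeling the cell value as a plain mean of the gauges is an assumption made here, not something the slides state.

```python
from statistics import mean

# Hypothetical rainfall totals (mm) at four gauges that fall inside one grid cell.
gauge_mm = {"A": 0.0, "B": 2.0, "C": 10.0, "D": 4.0}

# A coverage stores one value per cell; here it is modeled as the areal mean,
# assuming the gauges sample the cell evenly (an illustrative assumption).
cell_mm = mean(gauge_mm.values())

# Error introduced by treating the cell value as if it had been measured at each gauge.
error_mm = {name: cell_mm - observed for name, observed in gauge_mm.items()}

print(cell_mm)
print(error_mm)
```

Only one of the four gauges happens to match the cell value; the others differ by up to 6 mm, which is the kind of mismatch that appears when coverage (gridded) and feature (point) data are combined without accounting for what each value represents.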

Scale Discontinuity

Anecdotes! ... Because they are instructive!

Water Quality Portal
http://www.waterqualitydata.us
USGS, EPA, USDA joint service providing water quality and other environmental monitoring data.

Integrated Ocean Observing System

Weather Underground: 42K current-conditions weather stations!

Geo Data Portal: Data Integration Framework
Common architecture for accessing and processing multiple environmental data resources (weather, landscape, climate)!
Center for Integrated Data Analytics: Nate Booth, Tom Kunicki, Dave Blodgett, Jordan Walker, Ivan Suftin, I-Lin Kuo.

Enabling Technologies….

____.data.gov – Big Win!
· Data access type is a first class citizen!
· Includes both human and machine metadata.
· Machine-interpretability is an expectation.
· Content management systems and catalogs are becoming data service providers!!!

Forward!

Principle #1: Data Object Patterns
· We must continue to identify and model the common patterns our data adhere to.
· Non-interpretive content / attributes should be provided by service ‘methods’.
· These patterns must transcend discipline or implementation.
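
One way to picture a data object pattern that transcends discipline is a shared interface that unrelated data types both satisfy. The sketch below is illustrative only: `DataObject`, `StreamGage`, `ClimateGrid`, `describe`, and the format names are all hypothetical, chosen here to show the shape of the idea rather than any existing standard.

```python
from typing import Protocol, Tuple


class DataObject(Protocol):
    """Hypothetical discipline-neutral pattern: non-interpretive facts exposed as methods."""

    def spatial_extent(self) -> Tuple[float, float, float, float]: ...
    def available_formats(self) -> Tuple[str, ...]: ...


class StreamGage:
    """A hydrology-flavored object (illustrative only)."""

    def __init__(self, lon: float, lat: float) -> None:
        self.lon, self.lat = lon, lat

    def spatial_extent(self):
        # A point has a degenerate bounding box.
        return (self.lon, self.lat, self.lon, self.lat)

    def available_formats(self):
        return ("csv", "xml")


class ClimateGrid:
    """A climate-flavored object (illustrative only)."""

    def __init__(self, bbox):
        self.bbox = bbox

    def spatial_extent(self):
        return self.bbox

    def available_formats(self):
        return ("netcdf", "csv")


def describe(obj: DataObject) -> str:
    """Client code that depends only on the shared pattern, not on the discipline."""
    west, south, east, north = obj.spatial_extent()
    formats = ", ".join(obj.available_formats())
    return f"bbox=({west}, {south}, {east}, {north}); formats={formats}"


print(describe(StreamGage(-89.4, 43.1)))
print(describe(ClimateGrid((-93.0, 42.0, -86.0, 47.0))))
```

The client function never asks which discipline produced the object; it relies only on the methods the pattern guarantees.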

Principle #2: Domain Semantics
· Semantic relationships are necessarily governed by a given scientific domain itself.
· This is foundational to all additional interdisciplinary concerns.

Principle #3: ____-Agnostic
· Standards, specifications, and best practices must be ____-agnostic.
· A standard can be implemented using any technology, in any discipline.
· e.g., WaterML 2 -> TimeSeriesML

Principle #4: Identity Management
· Uniqueness can’t be taken for granted and must be curated very deliberately.
· You are not your location. Neither is a place.
· Foundational to linking any and all information to an entity.
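
"You are not your location" can be made concrete with a small sketch in which identity is a curated identifier and location is just another attribute. The `Site` class, the `same_entity` helper, and the identifiers and coordinates below are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Site:
    """Hypothetical monitoring site: identity is a curated ID, never the coordinates."""
    site_id: str
    name: str
    location: Tuple[float, float]  # (lon, lat); an attribute, not the identity


def same_entity(a: Site, b: Site) -> bool:
    """Two records describe the same entity only when their curated IDs match."""
    return a.site_id == b.site_id


gage = Site("site-0001", "River gage", (-89.40, 43.07))
well = Site("site-0002", "Groundwater well", (-89.40, 43.07))  # same place, different thing
moved_gage = replace(gage, location=(-89.41, 43.08))           # same thing, new place

print(same_entity(gage, well))        # co-located, but distinct entities
print(same_entity(gage, moved_gage))  # relocated, but still the same entity
```

Matching on coordinates would get both cases wrong, which is why the identifier has to be curated deliberately rather than derived from location.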

A few thoughts to leave you with…
· Maps are metadata.
· Index-based data access is dead.
· A geospatial database should be coherent without its spatial table.
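
One possible reading of "index-based data access is dead" is that clients should ask for data by coordinate value rather than by array position. The slides do not spell this out, so the sketch below is an interpretation; the latitude axis, the values, and `subset_by_coordinate` are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Hypothetical 1-D coverage: cell-center latitudes (ascending) and one value per cell.
lats = [40.0, 40.5, 41.0, 41.5, 42.0, 42.5]
values = [1, 2, 3, 4, 5, 6]


def subset_by_coordinate(lo, hi):
    """Return values whose latitude lies in [lo, hi]; the caller never sees an index."""
    start = bisect_left(lats, lo)
    stop = bisect_right(lats, hi)
    return values[start:stop]


# Index-based request: only meaningful if the caller already knows the grid layout.
by_index = values[2:4]

# Coordinate-based request: expressed in terms of the data's own domain.
by_coordinate = subset_by_coordinate(41.0, 41.5)

print(by_index, by_coordinate)
```

Both requests return the same cells here, but only the coordinate-based one keeps working unchanged if the grid is re-gridded or extended.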

Summary
· A standard is an established generalization.
· Scientific disciplines govern their semantics.
· Open-access (the internet) must be a given.