Life On The Edge Global Geoscience Data Delivery

Life On The Edge: Global Geoscience Data Delivery
Ollie Raymond with Nick Ardlie, Dale Percival, Lesley Wyborn, and Aaron Sedgmen
Life On The Edge - Global Geoscience Data Delivery - DGAL 24 Oct 2007

Outline
• Living on the scientific fringe
• A short history of digital geological map data standards at BMR-AGSO-GA
• Customers, the web, and why we need digital data standards
• GeoSciML and O&M - what are they?
• Developing web services using data standards – Testbeds and living on the technological edge

Data modellers and standards developers have always been regarded by geologists as a quaint lunatic fringe who are a bit of an annoyance for their important scientific research…
…until those same geologists want to exchange their data with other geologists…
…and they spend the next week reformatting data from different sources.

Life on the fringe is exacerbated by data modelling technobabble…
“Ontology”
“The specification of one's conceptualisation of a knowledge domain”
Que?

“Ontology”
• a set of controlled vocabularies (ie, lists of agreed terms) which describe concepts in a field of interest
  eg, mineral names and lithology names describing rocks in Geology
• the relationships between concepts and between the agreed terms used to describe those concepts
  eg, “geological units” are composed of “rocks”; “granite” is a type of “felsic intrusive rock”
• a set of rules about how to specify the terms and relationships
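
The three ingredients listed on this slide (vocabularies, relationships, rules) can be sketched in a few lines of Python. This is only an illustration, not part of any standard: the term lists, the relation names `is_a` and `composed_of`, and the helper functions are all assumptions chosen to mirror the slide's examples.

```python
# Minimal sketch of an "ontology": vocabularies, relationships, and a rule.
# All names here are illustrative assumptions, not GeoSciML definitions.

# 1. Controlled vocabularies: lists of agreed terms.
VOCABULARIES = {
    "lithology": {"granite", "felsic intrusive rock", "sandstone"},
    "feature": {"geological unit", "rock"},
}

# 2. Relationships between concepts and between terms.
RELATIONS = [
    ("granite", "is_a", "felsic intrusive rock"),
    ("geological unit", "composed_of", "rock"),
]

ALLOWED_RELATIONS = {"is_a", "composed_of"}


def all_terms():
    """Return every agreed term across all vocabularies."""
    return set().union(*VOCABULARIES.values())


def validate(relations):
    """3. Rule: a relation may only link agreed terms with an allowed relation type."""
    terms = all_terms()
    errors = []
    for subject, relation, obj in relations:
        if relation not in ALLOWED_RELATIONS:
            errors.append(f"unknown relation: {relation}")
        for term in (subject, obj):
            if term not in terms:
                errors.append(f"term not in any vocabulary: {term}")
    return errors


def broader_terms(term):
    """Return the terms that `term` is a direct type of."""
    return [obj for subj, rel, obj in RELATIONS if subj == term and rel == "is_a"]
```

Calling `validate(RELATIONS)` returns an empty list here, while a relation that uses a term outside the vocabularies (say "basalt") is reported as an error. That check is the practical value of the "rules" part.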

A short history of GA’s digital geoscience map data standards…
Pre 1990’s
• Geological data and the standards that govern it have come a long way since the day of the old BMR cartographic symbols book
• Last printed in 1989, the symbols book described the appearance, but rarely the meaning, of every line and symbol on a printed geological map

A short history of GA’s digital geoscience map data standards…
The 1990’s – the decade of GIS
• BMR’s first proposed GIS data dictionary for geological map data was written in 1992 by a very young Robyn Gallagher when BMR realised that the new digital GIS products had no quality control
• Some basic map data themes
  – geological unit polygons and boundaries
  – structures including faults, veins and dykes, and folds
• It also described some point located datasets including
  – outcrop locations, structural measurements, geochemical analyses and mineral occurrences
• some cartographic frames and graticules
• less than 8 pages long

A short history of GA’s digital geoscience map data standards…
• The data dictionary was extended over the next 5 years until AGSO merged with AUSLIG and we geologists were exposed to the much more rigorous standards definitions used by AUSLIG
• The result was the GA Geoscience Data Dictionary for Spatial Data
  – 86 different spatial data themes
  – minerals, petroleum, regolith and marine geology and geophysics
  – mines, wells/drillholes, topography, urban, cultural and infrastructure themes
  – cartographic layers

The customers…
• They want the best and most current geoscience data
• they want it free
• and they want it NOW, 24-7
• And they want to take GA’s and other Federal Govt data, the States’ data, CSIRO’s data, international data, and combine it with their own data
• And they want to use all of this data in any number of 2D and 3D modelling and display software applications
• Our software-specific, agency-specific data standards don’t cut the mustard when customers are trying to integrate data across jurisdictions

Delivering Government Digital Geoscience Data
The problem
• access to Government geoscience information is fragmented and inefficient
Minerals Exploration Action Agenda …
• existing information is distributed across eight state and federal agencies, each with its own information management systems and data formats
• up to 80% of time acquiring pre-competitive data is taken up by reformatting disparate data from government sources
a disincentive to exploration

Government Geoscience Online
• 8 online geoscience delivery systems
• 8 data structures
• 2 proprietary (software-specific) data formats
• cannot access more than one agency’s data at a time

Rationalising data sources
[Figure: WA and NT map data compared by attribute fields: Label, Age, Description]

Rationalising data sources
[Figure: WA data in ESRI format; NT data in MapInfo format]

The problem
• How do you get 8 Australian jurisdictions to provide digital geoscience map data in the same format?
• You will never get them to agree to change their agency database structures to a single structure
• You will never get them to agree to use the same software for data maintenance and delivery
The solution
• You CAN get them to agree on a software-independent DATA TRANSFER STANDARD

What is a Digital Data Standard good for?
• A common data structure in which you deliver your data,
• is software-independent,
• but most of all, a digital data standard enables …
Interoperability
“My stuff works with your stuff” (Lesley Wyborn, 2005)

GeoSciML: GeoScience Markup Language
O&M: Observations and Measurements

Commission for the Management and Application of Geoscience Information (CGI)
Interoperability Working Group
Australia, USA, Canada, UK, France, Sweden, Japan, Italy

CGI Interoperability Working Group
geologists, geophysicists, information modellers, web programmers

What is GeoSciML? (Part 1)
Geological Data Model
• A logical data structure
• a complex model (hierarchical, relational)
• tells users what geological information goes where
• and what terminology is to be used (vocabularies)
• scientifically robust, developed by the scientific community
• internationally agreed
• data providers need only to “map” their own local data structures to the data transfer structure
• data providers don’t need to change their local database structures to use the transfer standard

The GeoSciML Data Model
• Constructed using UML (Unified Modelling Language) tools
• Presented as a series of class diagrams which show the attributes of and relationships between geological features and other data types
e.g. Geologic(al) units
  • unit types (eg, lithostratigraphic, chronostratigraphic)
  • age and geological history (events)
  • unit parts (child/parent relations)
  • composition (earth materials)
  • metamorphism
  • weathering character
  • physical properties
  • related structures

“Mapping” your database to GeoSciML
[Diagram: local database columns mapped to the GeoSciML model]
• GEODX.STRATNAMES.TOPMINAGENAME
• GEODX.STRATNAMES.BASEMAXAGENAME
• GEODX.STRATLITHS.LITHOLOGY
• SDE.CDI_VICSTRATS.FORMTYPE
• GEODX.RANKSYNONYMS.RANKNAME
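
The idea of mapping a local database to a transfer structure, rather than changing the database, can be sketched as a simple lookup. The column names on the left come from the slide; the slide's diagram is not preserved, so the field names on the right are hypothetical placeholders, not real GeoSciML element names, and the pairing is an assumption.

```python
# Sketch of "mapping" local database columns to a transfer structure.
# The left-hand column names come from the slide; the right-hand field
# names are hypothetical placeholders, NOT real GeoSciML element names.
COLUMN_MAP = {
    "GEODX.STRATNAMES.TOPMINAGENAME": "youngest_age",
    "GEODX.STRATNAMES.BASEMAXAGENAME": "oldest_age",
    "GEODX.STRATLITHS.LITHOLOGY": "lithology",
    "SDE.CDI_VICSTRATS.FORMTYPE": "unit_type",
    "GEODX.RANKSYNONYMS.RANKNAME": "rank",
}


def to_transfer_record(local_row):
    """Rename mapped columns; the local database itself is left unchanged."""
    return {
        target: local_row[source]
        for source, target in COLUMN_MAP.items()
        if source in local_row
    }


# Made-up example row, for illustration only.
example_row = {
    "GEODX.STRATLITHS.LITHOLOGY": "sandstone",
    "GEODX.RANKSYNONYMS.RANKNAME": "Formation",
    "SOME.OTHER.COLUMN": "ignored",
}
record = to_transfer_record(example_row)
```

Only the mapping table is specific to an agency. Each provider maintains its own table and keeps its own schema, which is the point made on the preceding slides.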

What is GeoSciML? (Part 2)
XML encoding
• the markup language used to deliver the model to the internet
• builds on established internet standards such as GML (Geography Markup Language)
• open source
• software independent
• machine readable

<Rock>
  <gml:description>Medium to fine-grained lithic sandstone to siltstone</gml:description>
  <gml:name codeSpace="http://www.ga.gov.au">Site 95846001 Rock #1</gml:name>
  <color>
    <CGI_TermValue>
      <value codeSpace="http://www.ga.gov.au/GeologicalVocabs">grey-green</value>
    </CGI_TermValue>
  </color>
  <compositionCategory>
    <CGI_TermValue>
      <value codeSpace="http://www.ga.gov.au/GeologicalVocabs">siliciclastic</value>
    </CGI_TermValue>
  </compositionCategory>
  <geneticCategory>
    <CGI_TermValue>
      <value codeSpace="http://www.ga.gov.au/GeologicalVocabs">clastic sedimentary</value>
    </CGI_TermValue>
  </geneticCategory>
  <particleGeometry>
    <ParticleGeometryDescription>
      <size>
        <CGI_Value xsi:type="CGI_TermValueType">
          <value codeSpace="http://www.ga.gov.au/GeologicalVocabs">medium (1-5 mm)</value>
        </CGI_Value>
      </size>
    </ParticleGeometryDescription>
  </particleGeometry>
  <fabric>
    <FabricDescription>
      <fabricType xlink:type="simple">
        <ControlledConcept gml:id="unique_ID_for_NormalGrading">
          <preferredName>normal grading</preferredName>
          <vocabulary xlink:type="simple" xlink:href="www.ga.gov.au/GA_fabric_vocabulary"/>
        </ControlledConcept>
      </fabricType>
    </FabricDescription>
    . .
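
To illustrate what "machine readable" means in practice, here is a minimal sketch that reads a shortened, well-formed version of the fragment above with Python's standard library. The slide does not show namespace declarations, so the `gml` namespace URI used here is an assumption, and the function name `term_values` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed version of the slide's fragment. The gml namespace
# URI is an assumption; the slide does not show any declarations.
FRAGMENT = """
<Rock xmlns:gml="http://www.opengis.net/gml">
  <gml:description>Medium to fine-grained lithic sandstone to siltstone</gml:description>
  <color>
    <CGI_TermValue>
      <value codeSpace="http://www.ga.gov.au/GeologicalVocabs">grey-green</value>
    </CGI_TermValue>
  </color>
  <geneticCategory>
    <CGI_TermValue>
      <value codeSpace="http://www.ga.gov.au/GeologicalVocabs">clastic sedimentary</value>
    </CGI_TermValue>
  </geneticCategory>
</Rock>
"""

NS = {"gml": "http://www.opengis.net/gml"}


def term_values(xml_text):
    """Collect the description and each (term, vocabulary) pair from a Rock element."""
    root = ET.fromstring(xml_text)
    result = {"description": root.findtext("gml:description", namespaces=NS)}
    for prop in root:
        # Properties that carry a controlled term wrap it in CGI_TermValue/value.
        value = prop.find("CGI_TermValue/value")
        if value is not None:
            result[prop.tag] = (value.text, value.get("codeSpace"))
    return result


rock = term_values(FRAGMENT)
```

Each term arrives with the `codeSpace` of the vocabulary it came from, so a client can tell which agency's term list a value such as "grey-green" belongs to.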

What is O&M?
Like GeoSciML, it is a data model and GML schema
• A data model for any type of scientific observation, measurement and sampling frame
• It is a more generic model, not just for geoscience - it is less prescriptive than GeoSciML
• It provides a platform on which individual science communities can build more domain-specific types of observation, measurement and sampling
• For example, the GeoSciML working group have adopted the “sampling point” and “sampling curve” models of the O&M standard for geological use in delivering outcrop sample locations and boreholes

What is O&M?
• The O&M standard is more mature than GeoSciML
• It is nearing full ratification by the Open Geospatial Consortium
• Two Australian members on the review panel - Simon Cox (CSIRO) senior editor, and Nick Ardlie (GA)
• The aim is to submit it to ISO this year
• Already used in GA in development of the Located Sample Data SPOT

International Testbeds
Testbed 1. 2005 - A borehole demonstrator between UK and France
Testbed 2. 2006 – A six nation demonstrator delivering geological map data from globally distributed sources using GeoSciML v1.1
• successfully demonstrated WMS/WFS delivery, display and download of distributed data sources and simple query functions
• but lacked true interoperability between data sources
• leading edge technology
• suffered a little from immature and shifting standards in WMS/WFS, GML and GeoSciML
• developing web services software
• and no-one had done this before with such a complex data model

GeoSciML Testbed 2
Accessing GeoSciML data using a web client in Canada
[Map: GeoSciML services at Vancouver, CA; Keyworth, UK; Ottawa, CA; Uppsala, SV; Orleans, FR; Reston, VA; Portland, OR; Canberra, AU]

GeoSciML Testbed 2 (Canadian client)
• Display data from distributed sources in a map and query a feature

International Testbeds
Testbed 3. 2007/8 (in progress) – An eight nation demonstrator using GeoSciML v2.0
Aims:
• to test true interoperability of WMS and WFS services using both the GeoSciML and O&M data standards
• to test WFS query functionality within a complex data model
• to test the ability of various software applications to consume data in GeoSciML format
• to test registry services to discover and deliver geoscience information from distributed sources

GeoSciML Testbed 3
Registry services
[Map: GeoSciML services at Vancouver, CA; Keyworth, UK; Ottawa, CA; Uppsala, SV; Orleans, FR; Reston, VA; Portland, OR; Canberra, AU; Japan; Italy, linked through a central REGISTRY]
REGISTRY
• Multilingual vocabularies
• Map legends (StyledLayerDescriptors)
• Lists of available WMS and WFS services from distributed sources

Achieving Interoperability with Map Data
Levels of Map Data Interoperability (after Brodaric, 2007)
[Diagram: Map Data Service 1 and Map Data Service 2 combine into Integrated Map Data across five levels]
• pragmatic: Data Context (Geologist)
• semantic: Data Content (Vocabularies)
• schematic: Data Structure (GeoSciML, O&M)
• syntax: Data Language (GML)
• systems: Data Services (WMS, WFS)

Lessons Learnt from Testbed 2
The 3 most important things to consider in constructing an interoperable web service testbed
1. compliance - to OGC web standards (WMS, WFS)
2. compliance - to the data model schema (GeoSciML)
3. compliance - to agreed vocabularies

Semantic Interoperability
• The GeoSciML data model contains much interpretive and text-based data, and comparatively little simple numerical data
• This means that semantic compliance (ie, compliance to many controlled vocabularies) is not a trivial exercise
• But compliance to vocabularies (eg, for Age) is crucial to be able to construct standardised WFS / WMS requests on distributed data
• This became evident very quickly in Testbed 2 in trying to execute the agreed use cases - eg, select geologic features where Age = “xxx”

Semantic Interoperability
Cainozoic? Late? Early? Archaean? Palaeozoic? Bolindian? Eastonian? Gisbornian?
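
One way to picture the vocabulary problem is a lookup that translates each provider's local age term into a single agreed term before any query is run. The sketch below is illustrative only: the target terms are assumptions, and a real service would use whatever vocabulary the participants had agreed on.

```python
# Illustrative lookup from local age terms to one agreed spelling/term.
# The target terms are assumptions for this sketch; a real service would
# use the vocabulary agreed by the testbed participants.
LOCAL_TO_AGREED = {
    "cainozoic": "Cenozoic",
    "cenozoic": "Cenozoic",
    "archaean": "Archean",
    "archean": "Archean",
    "palaeozoic": "Paleozoic",
    "paleozoic": "Paleozoic",
    # Australian regional stages, all within the Late Ordovician
    "gisbornian": "Late Ordovician",
    "eastonian": "Late Ordovician",
    "bolindian": "Late Ordovician",
}


def to_agreed_term(local_term):
    """Translate a provider's age term; fail loudly if it is not in the lookup."""
    key = local_term.strip().lower()
    if key not in LOCAL_TO_AGREED:
        raise KeyError(f"no agreed term for {local_term!r}")
    return LOCAL_TO_AGREED[key]
```

Bare qualifiers such as "Late" or "Early" are deliberately absent: on their own they have no agreed meaning, so the function raises an error rather than guessing. Mapping a regional stage to a broader global term also loses precision, which is one reason vocabulary compliance is not trivial.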

Schematic Interoperability
Flexibility in data representation
• a feature of the GeoSciML model allows representation of some data in different ways according to a user’s need
eg, geologic age
  - single numeric value (eg: 455 Ma)
  - single defined text value (eg: Ordovician)
  - lower and upper value range (eg: 420 to 460 Ma; Silurian to Ordovician)

Schematic Interoperability
This pattern is flexible and entirely representative of how geologists use Age information, BUT…
• It is an issue for interoperability - how do you process a query on Age if the data in different datasets is in different, but still schema compliant, formats?
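
One possible answer to the question above is to normalise every representation to a numeric interval before comparing. The sketch below is not how the testbed handled it; it is an assumption-laden illustration, and the period boundaries are rounded, illustrative values rather than an agreed time scale.

```python
# Approximate period boundaries in Ma as (younger, older). Illustrative values
# only; a real service would take them from an agreed time-scale vocabulary.
PERIODS_MA = {
    "Silurian": (419.0, 444.0),
    "Ordovician": (444.0, 485.0),
}


def to_interval(age):
    """Normalise an age to a (younger_ma, older_ma) tuple.

    Accepts a number (455), a term ("Ordovician"), or a pair of
    numbers or terms ((420, 460) or ("Silurian", "Ordovician")).
    """
    if isinstance(age, (int, float)):
        return (float(age), float(age))
    if isinstance(age, str):
        return PERIODS_MA[age]
    first, second = (to_interval(part) for part in age)
    return (min(first[0], second[0]), max(first[1], second[1]))


def overlaps(age, query):
    """True if the two ages share any part of the time scale."""
    a_young, a_old = to_interval(age)
    q_young, q_old = to_interval(query)
    return a_young <= q_old and q_young <= a_old
```

With this, `overlaps(455, "Ordovician")` and `overlaps((420, 460), "Ordovician")` are both true even though the stored values have different shapes. The cost is that every client must carry the same term-to-number table, which is itself a vocabulary agreement.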

Schematic Interoperability
• Testbed 2 example of a WFS query on “age”
• Client’s decision to query on “upper age” only
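
For readers unfamiliar with WFS, a query like this can be sent as a key-value GetFeature request carrying an OGC filter. The sketch below only builds the URL. The endpoint, the feature type name `gsml:GeologicUnit`, and the property name `upperAge` are hypothetical stand-ins, since the slide's actual query is not preserved.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import escape


def build_getfeature_url(endpoint, type_name, property_name, literal):
    """Build a WFS 1.1.0 GetFeature URL with a single equality filter."""
    filter_xml = (
        '<ogc:Filter xmlns:ogc="http://www.opengis.net/ogc">'
        "<ogc:PropertyIsEqualTo>"
        f"<ogc:PropertyName>{escape(property_name)}</ogc:PropertyName>"
        f"<ogc:Literal>{escape(literal)}</ogc:Literal>"
        "</ogc:PropertyIsEqualTo>"
        "</ogc:Filter>"
    )
    params = {
        "service": "WFS",
        "version": "1.1.0",
        "request": "GetFeature",
        "typeName": type_name,
        "filter": filter_xml,
    }
    return f"{endpoint}?{urlencode(params)}"


# Hypothetical endpoint, feature type and property name, for illustration only.
url = build_getfeature_url(
    "https://example.org/wfs", "gsml:GeologicUnit", "upperAge", "Ordovician"
)
```

The same URL pattern can be sent to every participating service, but it only returns comparable results if each service exposes the queried property and uses the same term for "Ordovician".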

Schematic Interoperability + pragmatism
• GeoSciML v2.0 now contains a preferredAge attribute - a single value attribute designed purely to allow simpler and more straightforward queries on Age

Software capabilities
• Existing proprietary vendor software and open source software aim to support the detail of OGC web service specifications (e.g. GML and complex features) …but they are still being developed
• Much collaborative work was done with software developers during Testbed 2 to be able to serve the complex feature model needed for geological information

Lessons learnt from “Life on the Edge” (a.k.a. GeoSciML Testbed 2)
• Highlighted both the capabilities and the limitations of Web Feature Service and OGC standards in a real-world, complex feature environment
• Highlighted technical challenges for software developers and vendors to be able to deliver and consume OGC-compliant, complex feature WFS services

Lessons learnt from “Life on the Edge” (a.k.a. GeoSciML Testbed 2)
• Highlighted the need to establish well-defined limits on use cases for any web data services. Unlimited interoperability of complex geoscience data is not realistic
• Highlighted the importance of rigorous documentation of the data model to guide participants in a distributed network
• Risk analysis at the pointy end of R&D testbed projects is crucial
• Success in Testbed 3 is vital to achieve wide up-take of web services in the production environment

Where to from here?
OneGeology
~1:1 million scale digital geology of the world
over 50 nations on all continents
Other Geoscience “ML’s” under development involving GA
• Landslides
• Mineral Occurrences
• Geochronology
• Geochemistry
• many more that GA could be involved in

Where to from here?
Within Australia…
An Australian Geoscience Portal?
• All government geoscience map data
• Data served from distributed state and federal sites to a single portal using the GeoSciML and O&M data transfer standards

Benefits of Web Services for Government Geoscience
• Open data standards
  – not dependent on proprietary software; internationally agreed
• Efficiencies for industry
  – data from government providers is up-to-date, easily discoverable, and standard; no reformatting required
• Efficiencies for government
  – no need to change local data structures; just map each database to GeoSciML
  – new data is immediately available to the internet as a web service
  – no need to maintain data in several different software formats
  – standard format for industry mandatory reporting
• Benefits for the wider geoscience community
  – same methodologies used to develop the GeoSciML standard can be used by other scientific communities

Questions