Australian Geoscience Data Cube: A Collaboration between Geoscience Australia, CSIRO and NCI
Simon Oliver – Geoscience Australia
Robert Woodcock – CSIRO
CEOS WGISS 40

Overview
• Brief Review – Australian Geoscience Data Cube and CEOS
• Update on progress:
  – Open Source collaboration and wiki
  – Version 1 API
  – Version 2 Roadmap
    • multidimensional storage units
    • ingest support for multiple sensors
    • PC, Cloud, & HPC deployment
    • Analysis and production pipeline Provenance
• The Future: Data Cubes and WGISS
  – Analysis Ready Data as an input to AGDC
  – Discrete Global Grid Systems - OGC SWG Current Status and implications for AGDC evolution
  – Discussion: Sentinel-2 / SPOT-5/6 prototype processing hub to ARD? ... link to CWIC system?

Data-Intensive Quantitative Science
The Australian Geoscience Data Cube (AGDC):
• Supports management and quantitative analysis of massive volumes of Earth observation (EO) and other geoscientific data.
• Brings users to the data
• Treats pixels as observations
The AGDC as a sensor-independent system for management, analysis and sharing of EO data

AGDC Overview
A series of data structures and tools to enable efficient analysis of large Earth observation archives in HPC environments
• Simple Data Structures
  – Spatially regular tiles
  – Managed by a relational database
• Calibrated and Standardised Unique Observations
  – Surface Reflectance Observations
• Quality Assured Observations
  – Flagged for cloud, cloud shadow, saturation and other quality indicators
• Open source software
• Analysis Ready Data
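The "spatially regular tiles managed by a relational database" idea can be sketched roughly as below. This is an illustration only, not the AGDC schema: the 1-degree tile size, the table and column names, and the file paths are all assumptions made for the example.

```python
import math
import sqlite3


def tile_index(lon, lat, tile_size_deg=1.0):
    """Map a lon/lat position to the index of the regular tile containing it.

    The 1-degree default is an assumption for this sketch.
    """
    return (math.floor(lon / tile_size_deg), math.floor(lat / tile_size_deg))


def build_catalogue(records):
    """Create an in-memory catalogue of tiles (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tile ("
        "x_index INTEGER, y_index INTEGER, "
        "acquired TEXT, sensor TEXT, path TEXT)"
    )
    conn.executemany("INSERT INTO tile VALUES (?, ?, ?, ?, ?)", records)
    return conn


def find_tiles(conn, lon, lat, start, end):
    """Return paths of tiles covering a point, ordered by acquisition date.

    Dates are ISO 'YYYY-MM-DD' strings, so string comparison orders them.
    """
    x, y = tile_index(lon, lat)
    rows = conn.execute(
        "SELECT path FROM tile "
        "WHERE x_index = ? AND y_index = ? AND acquired BETWEEN ? AND ? "
        "ORDER BY acquired",
        (x, y, start, end),
    )
    return [row[0] for row in rows]
```

Because every tile sits on the same regular grid, a time series for one location becomes a simple indexed lookup rather than a search through scene footprints.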

Landsat processing pipeline
[Figure: Landsat processing pipeline, with labels "Analysis Ready Data preparation" and "DataCube"]

Update on Progress
• Open Source collaboration and wiki
• Version 1 API
• Version 2 Roadmap

Current Status
• Partners: GA, CSIRO and the NCI
• International collaborators are increasingly involved, including the USGS, NASA and CEOS.
  – In the midst of moving to an updated version reflecting recent advances in technology
  – Supporting the establishment of other international data cubes, initially with Kenya and Colombia
• AGDC is currently supporting a range of remote sensing applications across the water, vegetation and mineral domains
• Providing valuable information for environmental monitoring and modelling across all Australian jurisdictions

WOfS Summary Product Example
• Sum the derived temporal water stack: number of water observations per pixel
• Sum the derived “real” observations for every pixel from the Pixel Quality
• Produce the ratio as a percentage for display
[Figure: WOfS WMS, Menindee Lakes as shown in WOfS, with associated legend]
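The three steps of the summary product can be sketched with NumPy as follows. This is a minimal illustration, not the WOfS production code. It assumes the water classification and the pixel-quality "clear" mask are already available as boolean arrays of shape (time, y, x); the function and argument names are hypothetical.

```python
import numpy as np


def water_summary(water, clear):
    """Summarise a temporal water stack per pixel.

    water: bool array (time, y, x), True where water was detected.
    clear: bool array (time, y, x), True where pixel quality marks a
           usable ("real") observation.
    Returns (wet_count, clear_count, percent); percent is NaN where a
    pixel has no clear observations.
    """
    water = np.asarray(water, dtype=bool)
    clear = np.asarray(clear, dtype=bool)

    # Step 1: number of water observations per pixel (clear ones only).
    wet_count = np.sum(water & clear, axis=0)
    # Step 2: number of clear observations per pixel.
    clear_count = np.sum(clear, axis=0)
    # Step 3: ratio as a percentage, avoiding division by zero.
    percent = np.full(clear_count.shape, np.nan)
    valid = clear_count > 0
    percent[valid] = 100.0 * wet_count[valid] / clear_count[valid]
    return wet_count, clear_count, percent
```

Counting water only where the observation is also clear keeps the numerator and denominator consistent, so the percentage cannot exceed 100.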

Using tidal models to map tidal extents
Tidal Range of >10 m
Tidal Zone Extent: Can be attributed with offsets of LAT to lowest observed tide and HAT to highest observed tide
Tidal Zone Morphology: Fraction of water observations over the time series. Can we attribute this with depths?

CSIRO Examples
Vicarious calibration sites
• Identify climatic zones, spatial and temporal variation, and seasonal suitability for calibration activities
Landsat MODIS blending
• Blend Landsat and MODIS scenes to produce Landsat-like data (25 m resolution) with MODIS repeat cycle (~ every 4 days)
Geoglam Rangelands
• Remote sensing derived information on rangeland pasture cover and plant available water content for the globe

Open Data and Code
AGDC Web: http://www.datacube.org.au
AGDC Wiki: http://www.datacube.org.au/wiki/Main_Page
Code repositories are available through GitHub: https://github.com/data-cube
Data is also available as individual files on the NCI THREDDS catalogue: http://dap.nci.org.au/thredds/catalog.html

AGDC v1
• HPC Deployment
• Continental workflow support
• Command line and Python interface

AGDC v1 - High Performance Computing
• Raijin @ National Computational Infrastructure
  – 57,472 cores (2.6 GHz) in 3592 compute nodes;
  – 160 TBytes (approx.) of main memory;
  – 10 PBytes (approx.) of usable fast file system (for short-term scratch space).
• And other CSIRO systems via managed replicas

AGDC v1 API Applications
• Bare Soil
• Landsat Clean Pixel
• Landsat Median Mosaic
• Wetness in the landscape – Tasseled Cap Wetness Index
• Big Data for Environmental Monitoring
• Command line for non-python users
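As an illustration of the median mosaic idea named above, the sketch below takes a per-pixel median over time using only observations flagged as clear. It uses plain NumPy rather than the AGDC v1 API, whose calls are not shown in these slides; the function name and array layout are assumptions.

```python
import warnings

import numpy as np


def median_mosaic(stack, clear):
    """Per-pixel temporal median of one band, ignoring unclear pixels.

    stack: array (time, y, x) of reflectance values for a single band.
    clear: bool array (time, y, x), True where the observation is usable.
    Pixels with no clear observation at all come back as NaN.
    """
    stack = np.asarray(stack, dtype=float)
    clear = np.asarray(clear, dtype=bool)

    # Replace unusable observations with NaN so nanmedian skips them.
    masked = np.where(clear, stack, np.nan)
    with warnings.catch_warnings():
        # nanmedian warns on all-NaN pixels; NaN is the intended result.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        return np.nanmedian(masked, axis=0)
```

The median is a common choice for compositing because a few undetected cloud or shadow pixels in the time series have little effect on it.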

[Diagram: Earth Observation Informatics. Labels: Quality, Platforms, Community, Platforms. Item groups: Calibration, Validation, Provenance, Versioning; Supply, Distribution; Coordination, National, International; Data storage, Data management, Analysis, Operations; PC, Cloud, HPC]
The growth issues will require fundamental changes to how EO research is done, and create massive opportunity

AGDC v2
• Multidimensional storage units
• Ingest support for multiple sensors
• PC, Cloud, & HPC deployment
• Towards an Observation and Measurement approach to metadata
• Analysis and production pipeline Provenance
• UI Reference Implementation
• Apache v2 License

The Future: Data Cubes and WGISS
• Analysis Ready Data as an input to Data Cubes
• Discrete Global Grid Systems - OGC DGGS SWG Current Status and implications for Data Cube
• Discussion: Sentinel-2 / SPOT-5/6 prototype processing hub to ARD? ... link to CWIC system?

Analysis Ready Data
• Analysis Ready Data (ARD) is satellite data that have been processed and organized so users are not required to invest time and resources in specialized skills to apply corrections for:
  – instrument calibration (gains, offsets);
  – geolocation (spatial alignment); and
  – radiometry (solar illumination, incidence angle, topography, atmospheric interference).
• In addition, ARD products are organized in a defined structure with associated metadata, quality flags and products.
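Two of the corrections listed above, instrument calibration (gain and offset) and solar illumination, can be sketched as below. This covers only a top-of-atmosphere style rescaling. The topographic and atmospheric corrections needed for surface reflectance are not shown. The default gain and offset are illustrative placeholders, and the function name is hypothetical.

```python
import numpy as np


def toa_reflectance(dn, sun_elevation_deg, gain=2.0e-5, offset=-0.1):
    """Convert digital numbers to sun-angle-corrected reflectance.

    dn: array of raw digital numbers for one band.
    sun_elevation_deg: solar elevation angle at acquisition, in degrees.
    gain, offset: linear calibration coefficients (placeholder values;
                  real ones come from the scene metadata).
    """
    dn = np.asarray(dn, dtype=float)
    # Instrument calibration: linear rescaling with gain and offset.
    scaled = gain * dn + offset
    # Solar illumination: normalise by the sine of the sun elevation
    # (equivalently, the cosine of the solar zenith angle).
    return scaled / np.sin(np.radians(sun_elevation_deg))
```

For example, a digital number of 10000 with the placeholder coefficients and a 30 degree sun elevation gives a reflectance of about 0.2.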

Analysis Ready Data
• Where is ARD processing best performed?
• What is ARD for different satellite/sensor types? e.g. SAR
• Limitations in ARD, making choices too early?
  – Grids, resampling, resolutions…?
  – Corrections?

On Grids - A bit about DGGS
• A DGGS is a spatial reference system that uses a hierarchical tessellation of cells to partition and address the globe.
• DGGS are characterized by the properties of their cell structure, geoencoding, quantization strategy and associated mathematical algorithm
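The notion of hierarchical cells that both partition and address the globe can be illustrated with a toy quadtree over longitude and latitude. This is only a sketch of the addressing idea. It is not an equal-area grid and is not a conformant DGGS, and the digit scheme is invented for the example.

```python
def cell_address(lon, lat, levels):
    """Return a hierarchical cell address for a lon/lat position.

    At each level the current cell is split into four children and one
    digit (0-3) is appended: +1 for the eastern half, +2 for the
    northern half. Toy scheme for illustration only.
    """
    west, east, south, north = -180.0, 180.0, -90.0, 90.0
    digits = []
    for _ in range(levels):
        mid_lon = (west + east) / 2.0
        mid_lat = (south + north) / 2.0
        digit = 0
        if lon >= mid_lon:
            digit += 1
            west = mid_lon
        else:
            east = mid_lon
        if lat >= mid_lat:
            digit += 2
            south = mid_lat
        else:
            north = mid_lat
        digits.append(str(digit))
    return "".join(digits)
```

The address of a parent cell is always a prefix of the addresses of its children, which is what makes aggregation across resolutions a matter of truncating the address.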

OGC Discrete Global Grid Systems (DGGS) SWG
• Sept/Oct 2015 – Candidate DGGS Core Standard Released for 30 day public comment period
• Nov/Dec 2015 – OGC DGGS v1.0 Core Standard adopted by OGC (assuming no major issues raised during public comment period)
• Beginning Jan 2016 – OGC DGGS SWG to begin elaboration of Extension Standards to the DGGS Core Standard
• Anticipated Extensions include:
  – Interoperability interface protocols for OGC Web Services (e.g. WCS, WCPS, WCTiles, etc…) to facilitate DGGS-to-DGGS communication and processing
  – 3D (and higher dimensional) DGGS Specifications
  – Best Practice Guide

Future: Possible WGISS involvement
AWS - S3 object storage study
• The Future: Data Cubes and WGISS
  – Analysis Ready Data as an input to AGDC
  – Discrete Global Grid Systems - OGC SWG Current Status and implications for AGDC evolution
  – Discussion: Sentinel-2 / SPOT-5/6 prototype processing hub to ARD? ... link to CWIC system?