USGS Landsat Migration to the Cloud 20 April

  • Slides: 13
USGS Landsat Migration to the Cloud
20 April 2021
Kristi Kline, USGS
U.S. Department of the Interior
U.S. Geological Survey

Landsat Cloud Project Scope
• Modernize Processing, Access, and Distribution of Landsat Data
  • Change from a primary business model of downloads to enabling access to the full archive
  • Enable users to interact with the data in an integrated environment
  • Ensure provenance and data stewardship
• Key Project Objectives:
  • Establish an enterprise cloud environment for Landsat
  • Enable access to Collection 1 Level-1 and Level-2 in the cloud
    • Replicate Collection 1 Level-1 and establish operational data management procedures
    • Demonstrate global scale production of Landsat data in the cloud through production of Level-2 products using a cloud framework
    • Process Landsat archive in 1-2 months rather than 9-12 months
  • Establish modern access and visualization tools to access data
  • Establish an Environment and System to Produce and Enable Landsat Collection 2 in the cloud
  • Demonstrate key science use cases exploiting Landsat data

Landsat Cloud Operational Concept View

Web Enabled to Cloud Enabled
• Next evolution of Landsat - transitioning to a Smart Cloud implementation
• Continuation of free and open data policy
• Enables opportunity for users to access data directly, allowing:
  • Execution of algorithms directly on only the data needed
  • Selective data usage (specify bands etc. for use)
  • Reduced need for IT infrastructure

Cloud Optimized GeoTIFF Format
• An enhanced GeoTIFF with tiling and overviews
  • Uses internal tiling instead of lines to speed access and support better remote reading
  • Downsampled overviews are generated when lower resolution data is acceptable
• No changes to the underlying pixels
• Stored in an unbundled format
• Data is internally compressed
• Enables HTTP GET range requests
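
A minimal sketch of how internal tiling and HTTP range requests fit together, using only the Python standard library. The helper names (`range_header`, `parse_tiff_header`, `tiles_for_window`) and the 512-pixel tile size are assumptions made for illustration, not part of any USGS tool or of the Landsat product specification. The idea is that a reader fetches a few header bytes first, then requests only the tiles that cover the window of interest.

```python
import struct


def range_header(offset: int, length: int) -> dict:
    """Build the HTTP header asking for `length` bytes starting at `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # The end position of an HTTP byte range is inclusive.
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


def parse_tiff_header(first_bytes: bytes) -> dict:
    """Decode byte order, TIFF variant, and first IFD offset from the file start."""
    order = {b"II": "<", b"MM": ">"}.get(first_bytes[:2])
    if order is None:
        raise ValueError("not a TIFF file")
    (magic,) = struct.unpack(order + "H", first_bytes[2:4])
    if magic == 42:  # classic TIFF: 4-byte offset at byte 4
        (first_ifd,) = struct.unpack(order + "I", first_bytes[4:8])
        return {"bigtiff": False, "first_ifd": first_ifd}
    if magic == 43:  # BigTIFF: 8-byte offset at byte 8
        (first_ifd,) = struct.unpack(order + "Q", first_bytes[8:16])
        return {"bigtiff": True, "first_ifd": first_ifd}
    raise ValueError("unknown TIFF magic number")


def tiles_for_window(col_off, row_off, width, height, tile_size=512):
    """List (tile_row, tile_col) indices that intersect a pixel window.

    tile_size=512 is an assumed value for this sketch.
    """
    first_col, last_col = col_off // tile_size, (col_off + width - 1) // tile_size
    first_row, last_row = row_off // tile_size, (row_off + height - 1) // tile_size
    return [(r, c)
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]


# A hand-built little-endian classic TIFF header, used here instead of a download.
header = parse_tiff_header(b"II*\x00\x08\x00\x00\x00")
needed = tiles_for_window(col_off=500, row_off=0, width=100, height=100)
```

In this sketch a 100 x 100 pixel window that straddles a tile boundary touches only two tiles, so only those byte ranges (looked up from the tile offset table in the file's metadata) would need to be requested rather than the whole file.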

SpatioTemporal Asset Catalog (STAC)
• New collaborative standard for managing access metadata
  • Open-source, headed by Planet Labs, freely available on GitHub, with Landsat extension
  • Flexibility to support many types of geospatial data (satellite, drone, radar, etc.)
  • Allows for interoperability between satellite metadata
• Exposes data in a common, machine-readable JSON format for both end users and internal processes
• Includes direct links to S3 objects or HTTP links
• Can be exploited through Jupyter Notebooks by end users to read data directly from the cloud without downloading
• Gaining wide adoption by the remote sensing community
  • i.e., Government, International, Commercial, Academic
• STAC 1.0.0 specification (including Landsat extension) release expected on 26 April 2021
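
To illustrate the machine-readable JSON and direct links mentioned above, here is a small sketch that reads a STAC Item and collects the links to its Cloud Optimized GeoTIFF assets. The item below is hand-written and trimmed for illustration (a complete Item also carries geometry, bbox, and links); the scene ID, asset keys, and `example.com` URLs are hypothetical and are not real Landsat records. The function name `cog_assets` is also an assumption of this sketch.

```python
import json

# Media type commonly used in STAC to mark Cloud Optimized GeoTIFF assets.
COG_MEDIA_TYPE = "image/tiff; application=geotiff; profile=cloud-optimized"

# Hypothetical, trimmed STAC Item; IDs and URLs are placeholders.
ITEM_JSON = """
{
  "type": "Feature",
  "stac_version": "1.0.0",
  "id": "EXAMPLE_SCENE_ID",
  "properties": {"datetime": "2021-04-20T00:00:00Z"},
  "assets": {
    "red": {
      "href": "https://example.com/EXAMPLE_SCENE_ID_B4.TIF",
      "type": "image/tiff; application=geotiff; profile=cloud-optimized"
    },
    "nir08": {
      "href": "https://example.com/EXAMPLE_SCENE_ID_B5.TIF",
      "type": "image/tiff; application=geotiff; profile=cloud-optimized"
    },
    "metadata": {
      "href": "https://example.com/EXAMPLE_SCENE_ID_MTL.json",
      "type": "application/json"
    }
  }
}
"""


def cog_assets(item: dict) -> dict:
    """Map asset key to href for every asset flagged as a Cloud Optimized GeoTIFF."""
    return {
        key: asset["href"]
        for key, asset in item.get("assets", {}).items()
        if asset.get("type") == COG_MEDIA_TYPE
    }


item = json.loads(ITEM_JSON)
bands = cog_assets(item)
for key, href in sorted(bands.items()):
    print(key, href)
```

Selecting assets by key in this way is one reading of the "specify bands" idea: a notebook user would pass only the chosen links to a raster reader instead of downloading a full scene bundle.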

Collection 2 Processing Architecture

Processing Metrics
• Collection 2 processing took one month to complete (8/19/2020 – 9/19/2020)
  • 8.8 million scenes to Level 1 and Level 2
  • Average per day: 293,000/day
  • Began with small batches and increased to over 400,000 scenes processed per day to Level 1 and Level 2
• Collection 1 processing at EROS in 2017 averaged 25,000-35,000 scenes processed per day for only Level 1 products
  • Previous Collection 1 processing took over 18 months (with some downtime between individual missions while code changes were completed)
• Docker images used to run the Image Processing containers
• AWS Spot EC2 types:
  • General purpose w/SSD (m5d) 4xlarge to 24xlarge
  • Memory Optimized w/SSD (r5d) 4xlarge to 24xlarge
• AWS Batch and Step Functions used for scheduling the jobs
  • Overall, worked OK, but we did run into limits in the environment
  • Spot terminations caused some issues in the processing runs
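
The throughput figures above can be cross-checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch using the numbers on the slide; the variable names are illustrative. The comparison with Collection 1 is not like-for-like, since the Collection 1 rate covers Level 1 products only.

```python
from datetime import date

# Figures taken from the slide.
scenes = 8_800_000
avg_per_day = 293_000
c1_low, c1_high = 25_000, 35_000  # Collection 1 scenes/day in 2017, Level 1 only

# Calendar span of the stated processing window.
calendar_days = (date(2020, 9, 19) - date(2020, 8, 19)).days

# Number of processing days implied by the stated daily average.
implied_days = scenes / avg_per_day

# Ratio of the Collection 2 average to the Collection 1 range.
speedup_low = avg_per_day / c1_high
speedup_high = avg_per_day / c1_low

print(f"calendar span: {calendar_days} days")
print(f"implied processing days: {implied_days:.1f}")
print(f"daily rate vs Collection 1: {speedup_low:.1f}x to {speedup_high:.1f}x")
```

The stated average corresponds to roughly 30 days of processing, which agrees with the "one month" on the slide, and the daily rate is roughly 8 to 12 times the 2017 Collection 1 rate.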

Collection 2 Access Architecture

Collection 2 Egress Limiter
• Implemented egress limiter to ensure control of costs
• Provide access to the data, but also implement controls on egress to defined limits
• Egress limiter will not allow total throughput to exceed the predefined limit
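
The slides do not describe how the egress limiter is built. As one possible illustration of capping throughput at a predefined limit, the sketch below uses a generic token-bucket technique; the class name `EgressLimiter`, its parameters, and the numbers in the usage lines are all hypothetical and should not be read as the USGS implementation. Time is passed in explicitly so the behavior is deterministic.

```python
class EgressLimiter:
    """Token bucket: tokens are bytes that may still be sent."""

    def __init__(self, rate_bytes_per_s: float, burst_bytes: float, now: float = 0.0):
        self.rate = rate_bytes_per_s   # long-run ceiling on throughput
        self.capacity = burst_bytes    # largest burst allowed at once
        self.tokens = burst_bytes      # start with a full bucket
        self.last = now

    def allow(self, nbytes: int, now: float) -> bool:
        """Return True and spend tokens if `nbytes` fits within the limit."""
        # Refill in proportion to elapsed time, never above capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = max(self.last, now)
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False  # caller would delay or reject the transfer


# Hypothetical limit: 100 bytes/s with bursts of up to 200 bytes.
limiter = EgressLimiter(rate_bytes_per_s=100, burst_bytes=200)
results = [
    limiter.allow(150, now=0.0),  # fits in the initial burst
    limiter.allow(100, now=0.0),  # only 50 tokens remain
    limiter.allow(100, now=0.5),  # half a second refills 50 tokens
    limiter.allow(1, now=0.5),    # bucket is empty again
]
```

With a token bucket, the bytes admitted over any interval cannot exceed the rate multiplied by the interval length plus the burst size, which is one way to keep total egress under a budgeted ceiling while still serving short spikes.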

Distribution Metrics
[Chart: Collection 2 Data Use, Dec 2020 - March 2021. Vertical axis: TeraBytes (0 to 500). Series: C2 Cloud Downloads (TB), C2 On-Premise Downloads (TB), C2 Cloud Direct Access (TB). Annotations: "Processing U.S. Analysis Ready Data in Cloud"; "Changes to Web Application Firewall to increase throughput".]

Collection 2 Schedule and Upcoming Activities
• ORR #1 – Data Processing Readiness – July 2020 (complete)
• ORR #2 – Public Data Availability – November 2020 (complete)
  • Collection 2 scenes released to the public on 1 December 2020
• ORR #3 – Enhanced Data Access – April 2021
  • STAC 1.0.0
  • SatAPI
  • Direct Access Metrics
  • EROS Managed AWS WAF
  • Public SNS Topic
  • Cloud Inventory Report

New LandsatLook Tool