The Earth System Grid (ESG): APAN eScience Workshop

The Earth System Grid (ESG)
APAN eScience Workshop, January 27, 2005
Don Middleton, NCAR Scientific Computing Division, Section Head, Visualization & Enabling Technologies
On behalf of many project collaborators and a lot of great work!

The ESG Collaboration
§ LBNL: Climate storage facility
§ ANL: Computational grids & grid-based applications
§ LLNL: Model diagnostics & inter-comparison
§ USC/ISI: Computational grids & grid-based applications
§ NCAR: Climate change prediction and scenarios
§ LANL: High-resolution ocean models & computing
§ ORNL: Climate storage & computational resources

A Global Coupled Climate Model

A Lot of Data: Simulation Dataset Sizes by Resolution
§ T42 CCSM (current, 280 km)
  § 7.5 GB/yr, 100 years -> 0.75 TB for one run
§ T85 CCSM (140 km)
  § 29 GB/yr, 100 years -> 2.9 TB for one run
§ T170 CCSM (70 km)
  § 110 GB/yr, 100 years -> 11 TB for one run
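The per-run totals on this slide follow from multiplying the yearly output rate by the run length. A minimal sketch of that arithmetic, assuming decimal units (1 TB = 1000 GB), which is what the slide's rounding implies; the dictionary keys and the function name are illustrative only:

```python
GB_PER_TB = 1000  # assumption: decimal units, consistent with the slide's figures

# Yearly output rate in GB for each CCSM resolution quoted on the slide.
gb_per_year = {
    "T42 (280 km)": 7.5,
    "T85 (140 km)": 29.0,
    "T170 (70 km)": 110.0,
}


def run_size_tb(rate_gb_per_year, years=100):
    """Return the total size in TB of one simulation run."""
    return rate_gb_per_year * years / GB_PER_TB


sizes_tb = {name: run_size_tb(rate) for name, rate in gb_per_year.items()}

for name, size in sizes_tb.items():
    print(f"{name}: {size:.2f} TB per 100-year run")
```

Each halving of the grid spacing roughly quadruples the yearly output, which is why the T170 run is more than ten times the size of the T42 run.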

CCM at T170 Resolution

Advances at the Earth Simulator
ESC Climate Model at T1279 (approx. 10 km)

We Will Examine Practically Every Aspect of the Earth System from Space in This Decade
§ Longer-term Missions - Observation of Key Earth System Interactions: Aqua, Terra, Landsat 7, QuikScat, Aura, ICEsat, Jason-1
§ Exploratory - Explore Specific Earth System Processes and Parameters and Demonstrate Technologies: Triana, GRACE, VCL, SRTM, PICASSO, Cloudsat, EO-1
Courtesy of Tim Killeen, NCAR

IPCC

The Earth System Grid
http://www.earthsystemgrid.org
§ U.S. DOE SciDAC funded R&D effort - a “Collaboratory Pilot Project”
§ Build an “Earth System Grid” that enables management, discovery, distributed access, processing, & analysis of distributed terascale climate research data
§ Build upon Globus Toolkit and DataGrid technologies and deploy
§ Potential broad application to other areas

ESG People
PIs:
§ Ian Foster (ANL)
§ Don Middleton (NCAR)
§ Dean Williams (LLNL)
Team:
§ Veronika Nefedova (ANL)
§ Luca Cinquini (NCAR)
§ Gary Strand (NCAR)
§ Peter Fox (NCAR)
§ Jose Garcia (NCAR)
§ Rob Markel (NCAR)
§ Bob Drach (LLNL)
§ David Bernholdt (ORNL)
§ Kasidit Chanchio (ORNL)
§ Line Pouchard (ORNL)
§ Carl Kesselman (ISI)
§ Ann Chervenak (ISI)
§ Arie Shoshani (LBNL)
§ Alex Sim (LBNL)

ESG: Challenges
§ Enabling the simulation and data management team
§ Enabling the core research community in analyzing and visualizing results
§ Enabling broad multidisciplinary communities to access simulation results
We need integrated scientific work environments that enable smooth WORKFLOW for knowledge development: computation, collaboration & collaboratories, data management, access, distribution, analysis, and visualization.

ESG: Strategies
§ Keep track of what we have, particularly what’s on deep storage
  § Metadata and Replica Catalogs
§ Move data a minimal amount, keep it close to computational point of origin when possible
  § Data access protocols, distributed analysis
§ When we must move data, do it fast and with a minimum amount of human intervention
  § Storage Resource Management, fast networks
§ Harness a federation of sites, web portals
  § Globus Toolkit -> The Earth System Grid -> The UltraDataGrid

ESG Home

PCM Metadata

PCM Files and MSS

ESG CCSM

CCSM Datasets

Subsetting List

Subsetting Interface

IPCC Data Click

ESG architecture

ESG topology

ESG Technologies: Security
§ Core security infrastructure provided by Globus GSI: digital certificates, public/private keys, proxies
§ ESG web-based digital registration system:
  § Hides from user complex details of digital certificate generation
  § Allows easy web access by common users to ESG data services

ESG Technologies: Metadata
§ Collection-level description metadata (“climate metadata”)
  § Describes logical objects involved in climate modeling
  § Stored in a set of relational tables in an OGSA-DAI MySQL database (RDBMS with Grid Service interface)
  § Input and output of database is XML
§ Location and replica metadata
  § Indicates the physical locations of the many copies of a single logical file
  § Stored in a system of distributed RLS (Replica Location Service) instances: cross-updating grid-enabled MySQL databases installed at each site
  § Any RLI (Replica Location Index) in the system can be used as a starting point for obtaining all replicas (at any site) of a given logical file name (lfn)
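The replica lookup described on this slide can be pictured as a two-level mapping: each site keeps a catalog from logical file names to physical locations, and an index records which site catalogs know about each name. The following is a toy in-memory sketch of that idea only. It is not the Globus RLS API; the class names, method names, host names, and file names are all hypothetical.

```python
from collections import defaultdict


class SiteCatalog:
    """Hypothetical per-site catalog: logical file name -> physical URLs."""

    def __init__(self, site):
        self.site = site
        self.replicas = defaultdict(set)

    def register(self, lfn, physical_url):
        self.replicas[lfn].add(physical_url)


class ReplicaIndex:
    """Hypothetical index: logical file name -> site catalogs holding it."""

    def __init__(self):
        self.catalogs_by_lfn = defaultdict(list)

    def update_from(self, catalog):
        # Record that this catalog holds at least one replica of each lfn.
        for lfn in catalog.replicas:
            if catalog not in self.catalogs_by_lfn[lfn]:
                self.catalogs_by_lfn[lfn].append(catalog)

    def find_replicas(self, lfn):
        # Ask every catalog known to hold the lfn for its physical copies.
        found = []
        for catalog in self.catalogs_by_lfn.get(lfn, []):
            found.extend(catalog.replicas[lfn])
        return sorted(found)


# Illustrative data only: placeholder hosts and a made-up file name.
ncar = SiteCatalog("NCAR")
ornl = SiteCatalog("ORNL")
ncar.register("pcm.run1.0001.nc", "gsiftp://ncar.example.org/mss/pcm.run1.0001.nc")
ornl.register("pcm.run1.0001.nc", "gsiftp://ornl.example.org/hpss/pcm.run1.0001.nc")

index = ReplicaIndex()
index.update_from(ncar)
index.update_from(ornl)

print(index.find_replicas("pcm.run1.0001.nc"))
```

Because every index is updated from every site catalog, a query can start at any index and still reach the replicas held at all sites.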

ESG Metadata Schema

ESG Technologies: Metadata
§ THREDDS metadata catalogs:
  § Generated from collection-level + location/replica metadata
§ NcML metadata:
  § NetCDF specific
  § Describes specific content of each file
  § Used to create virtual dataset aggregations
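A virtual dataset aggregation of the kind mentioned above is typically expressed as a small NcML document that lists the member NetCDF files and the dimension along which they are joined. The sketch below builds such a document with Python's standard library. It assumes the Unidata NcML 2.2 namespace and a `joinExisting` aggregation along `time`; the slides do not show ESG's actual NcML, and the file names and function name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: the Unidata NcML 2.2 namespace.
NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"


def build_join_existing(locations, dim_name="time"):
    """Return an NcML string that joins files along an existing dimension."""
    ET.register_namespace("", NCML_NS)
    root = ET.Element(f"{{{NCML_NS}}}netcdf")
    aggregation = ET.SubElement(
        root,
        f"{{{NCML_NS}}}aggregation",
        {"dimName": dim_name, "type": "joinExisting"},
    )
    # One child element per member file, in the order given.
    for location in locations:
        ET.SubElement(aggregation, f"{{{NCML_NS}}}netcdf", {"location": location})
    return ET.tostring(root, encoding="unicode")


# Hypothetical file names, one per simulated decade.
ncml = build_join_existing(["run1.1990-1999.nc", "run1.2000-2009.nc"])
print(ncml)
```

A client that understands NcML can then open the aggregation as if it were a single file spanning both decades.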

ESG Technologies: Data Transport
§ SRM (Storage Resource Manager)
  § Middleware that allows seamless access to data resources whether they are stored on rotating or deep storage
  § File transfer between any deep storage (NCAR MSS, ORNL HPSS, NERSC) and local cache
  § Reliable, high performance transfer between sites via GridFTP
  § Robust, efficient cache management capabilities
§ OPeNDAP-g
  § Integration of OPeNDAP API with Globus technologies (GSI authentication and GridFTP data transfer)
  § Extension for aggregation of NetCDF data
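The cache management role described for SRM can be illustrated with a small staging cache: a request is served from local disk if the file is already there, otherwise the file is staged from deep storage and older files are evicted to make room. This is a toy sketch under stated assumptions, not the SRM interface. The least-recently-used eviction policy, the class name, the capacity, and the file sizes are all chosen here for illustration.

```python
from collections import OrderedDict


class StagingCache:
    """Hypothetical disk cache in front of deep storage, with LRU eviction."""

    def __init__(self, capacity_gb, fetch_size_gb):
        self.capacity_gb = capacity_gb
        # Callable that "stages" a file and returns its size in GB.
        self.fetch_size_gb = fetch_size_gb
        self.files = OrderedDict()  # name -> size in GB, least recent first
        self.used_gb = 0.0

    def get(self, name):
        if name in self.files:
            self.files.move_to_end(name)  # mark as most recently used
            return "cache hit"
        size_gb = self.fetch_size_gb(name)
        # Evict least recently used files until the new file fits.
        while self.files and self.used_gb + size_gb > self.capacity_gb:
            _, evicted_gb = self.files.popitem(last=False)
            self.used_gb -= evicted_gb
        self.files[name] = size_gb
        self.used_gb += size_gb
        return "staged from deep storage"


# Made-up file sizes standing in for a mass storage system.
deep_storage = {"a.nc": 6.0, "b.nc": 3.0, "c.nc": 4.0}
cache = StagingCache(capacity_gb=10.0, fetch_size_gb=deep_storage.__getitem__)

for name in ["a.nc", "b.nc", "a.nc", "c.nc"]:
    print(name, "->", cache.get(name))
```

In this run the second request for `a.nc` is a hit, and staging `c.nc` evicts `b.nc` because it is the least recently used file.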

ESG Technologies: Web Portal
§ Main entry point into ESG system: provides simple, convenient web-based access to a wide range of data services for climate model data
§ Integrates and makes use of all other ESG technologies
§ Main ESG web portal at NCAR: gateway to distributed climate model datasets (PCM, CCSM data stored at NCAR, ORNL, NERSC, LLNL)
§ Same software under deployment by LLNL/PCMDI to serve locally stored IPCC data worldwide

ESG Technologies: Aggregation/subsetting

ESG Metrics (November 2004)
§ Community Climate System Model
  § 28.4 Terabytes, including 21 simulations, 141 datasets, and 289,374 files
§ Parallel Climate Model
  § 20.42 Terabytes, including 98 simulations, 434 datasets, and 44,000 files
§ Total
  § 48.8 Terabytes, 119 simulations, 575 datasets, in over 333,872 files
  § 167 registrations, 132 approved, 154.2 GB downloaded to date
§ Plus new IPCC Data
  § 150 user registrations, 1.1 TB of data downloaded, in 16,000 files

The Importance of Community: Collaborations & Relationships
§ GO-ESSP (multi-agency, intl.)
§ CCSM Data Management Group
§ IPCC
§ Globus Project
§ OPeNDAP/DODS (multi-agency)
§ NCAR’s Community Data Portal (CDP)
§ NSF National Science Digital Libraries Program (UCAR & Unidata THREDDS Project)
§ U.K. e-Science & British Atmospheric Data Center
§ NOAA NOMADS and CEOS-grid
§ VSTO (new NSF/NMI-funded project)
§ Other SciDAC Projects: Climate, Security & Policy for Group Collaboration, Scientific Data Management ISIC, & High-performance DataGrid Toolkit

‘ing Our Data

Data->Knowledge
§ Mass Storage System (2.0 PB)
§ Establish new paradigms for managing and accessing scientific data based on semantic organization
§ Petascale Knowledge Environment

www.earthsystemgrid.org