Integration of Grid and OGC Technologies for Earth Science Modeling and Applications

- Slides: 31
Integration of Grid and OGC Technologies for Earth Science Modeling and Applications
Liping Di (ldi@gmu.edu), Aijun Chen (achen6@gmu.edu)
Laboratory for Advanced Information Technology and Standards (LAITS), George Mason University
June 28, 2005, GGF 14, Chicago, IL

Contents
• Introduction
• Why Grid is useful to the Earth Observation community
• OGC technology for interoperability
• Objective
• Areas of work
• Current status
• Future work

Introduction
• Geospatial data is the major type of data that human beings have collected.
  – More than 80% of the data are geospatial data.
• Image/gridded data is the dominant form of geospatial data in terms of volume.
  – Most of those data are collected by the EO community.
• Geospatial data will grow to ~exabytes very soon.
  – NASA EOSDIS has more than one petabyte of data in archives; more than 2 terabytes of new data are added per day.
  – Application data centers: tens of terabytes of imagery.
  – Tens of thousands of datasets are online now.
• How to use geospatial data effectively, wisely, and easily is the key information technology issue that we have to solve.

The Grid Technology
• Grid technology was developed for securely sharing computational resources within a virtual organization:
  – Computer CPU cycles
  – Storage
  – Networks
  – Data, information, algorithms, software, services
• It was originally motivated and supported by sciences and engineering disciplines that require high-end computing, as a way to share geographically distributed high-end computing resources.
• The core of the technology is the open source middleware called the Globus Toolkit.
  – The latest version of Globus is version 4.0, which implements the Open Grid Services Architecture (OGSA) and has converged with Web services technology.

Why Is Grid Useful to the EO Community?
• The Earth observation community is one of the key communities for collecting, managing, processing, archiving, and distributing geospatial data and information.
• Because of the large volumes of EO data and the geographically scattered receiving and processing facilities, EO data and the associated computational resources are naturally distributed.
• The multi-disciplinary nature of global change research and applications requires the integrated analysis of huge volumes of multi-source data from multiple data centers. This requires sharing both data and computing power among data centers.
• Therefore, Grid is an ideal technology for the EO community.

Why Grid Needs Geospatial Extensions
• Geospatial data and information are significantly different from those in other disciplines.
  – Very complex and diverse:
    • Formats, projections, resolutions.
    • Hyper-dimensions: spatial, temporal, spectral, thematic.
    • Raster vs. vector.
  – Large data volume:
    • More than 80% of the data human beings have collected is spatial data.
• The geospatial community has developed a set of standards specifically for geospatial data and information that users are familiar with (e.g., OGC, ISO, FGDC).
• Grid technology was developed for general sharing of computational resources and is not aware of the special characteristics of geospatial data.
• To make Grid technology applicable to geospatial data, we have to develop geospatial domain-specific extensions.

The OGC Web Service Specifications
• The Web Coverage Service (WCS) specification defines the standard interfaces between web-based clients and servers for accessing coverage data.
  – All imagery-type remote sensing data is coverage data.
• The Web Feature Service (WFS) specification defines the standard interfaces between web-based clients and servers for accessing feature-based geospatial data.
  – Vector and point data are feature data.
• The Web Map Service (WMS) specification defines the standard interfaces for accessing and assembling maps from multiple servers.
  – Visualization of geospatial data.
• The Catalog Service for Web (CSW) specification defines the interfaces between web-based clients and servers for finding the required data or services from registries.
• WCS, WFS, CSW, and WMS form the foundation for the interoperable geospatial data access and service environment.

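To make the WCS interface concrete, here is a minimal sketch of how a client might assemble a GetCoverage request in key-value-pair form. The parameter names follow WCS 1.0.0; the server URL, the coverage name `landsat_2000`, and the `GeoTIFF` format value are assumptions for illustration only and do not come from the slides.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_get_coverage_url(base_url, coverage, bbox, width, height,
                           crs="EPSG:4326", fmt="GeoTIFF", version="1.0.0"):
    """Build a WCS GetCoverage URL using key-value-pair encoding."""
    minx, miny, maxx, maxy = bbox
    params = {
        "SERVICE": "WCS",
        "VERSION": version,
        "REQUEST": "GetCoverage",
        "COVERAGE": coverage,           # hypothetical coverage identifier
        "CRS": crs,                     # coordinate reference system of BBOX
        "BBOX": f"{minx},{miny},{maxx},{maxy}",
        "WIDTH": width,                 # output grid size in pixels
        "HEIGHT": height,
        "FORMAT": fmt,                  # assumed to be offered by the server
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint; a real one would come from a catalog (CSW) query.
url = build_get_coverage_url(
    "http://example.org/wcs", "landsat_2000",
    bbox=(-78.0, 38.0, -77.0, 39.0), width=512, height=512,
)

# Parse the query string back to confirm what the server would receive.
parsed = parse_qs(urlsplit(url).query)
print(parsed["REQUEST"][0], parsed["BBOX"][0])
```

Because every server exposes the same parameter set, the same client code can address any compliant WCS endpoint, which is the interoperability benefit the slide refers to.
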
Objectives of GMU Grid Project
• Make NASA EOSDIS data easily accessible to the Earth science modeling and applications communities by combining the advantages of both OGC and Grid technology.
  – Develop geospatial extensions of Grid technology to make it geospatially enabled (Geospatial Grid).
  – Enable OGC geospatial clients to access Grid-managed distributed geospatial resources.
  – Provide virtual/intelligent geospatial products in the Grid environment.
  – Test methods for automating the process from geospatial data to knowledge in the Grid environment.
• Demonstrate the geospatial Grid technology in a realistic NASA EOS data environment.
• Contribute technology, software, and the data pool application to the CEOS Grid testbed.

Areas of Extensions
• Internally, the Grid has to be spatially aware.
  – Extend the Globus Toolkit to manage geospatial data and information by spatial, spectral, temporal, and thematic attributes.
  – Develop enough Grid-enabled tools for geospatial data handling/serving.
• The Grid must provide data/information access and service interfaces that are standard in the geospatial community.
  – The Open GIS Consortium’s Web data access/service interfaces (e.g., OGC WCS, WMS, WFS, and CSW).

Virtual Geospatial Datasets
• A virtual dataset is a dataset that:
  – Does not exist in a data and information system.
  – The system knows how to create on demand.
  – Once created, can be kept to fulfill the same request from later users.
• The client/data user will not know the difference between a real dataset and a virtual dataset.
• A virtual dataset can be produced (materialized) by:
  – Running a program dedicated to the production of the virtual dataset (dedicated program approach).
  – Running a series of service modules, each of which takes care of a small step in the materialization of the virtual dataset (service approach).

The Service Approach to Virtual Datasets
• A service is defined as a self-contained, self-describing, modular application that can be published, located, and dynamically invoked across a network.
  – It performs functions, which can be anything from simple requests to complicated business processes.
  – Once a service is deployed, other applications (and other services) can discover and invoke the deployed service.
• A service can be implemented in the Web environment, where it is called a web service, or in the Grid environment, where it is called a Grid service.
• Standards on service discovery, declaration, binding, and invocation allow individual services to be chained together dynamically across a network to fulfill a complex task.
• In the service environment, a virtual dataset is basically a service chain that describes the steps to be taken to produce the virtual dataset.
• With enough elementary service modules, it is possible to provide an unlimited number of virtual datasets just by creating service chains.

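The idea of a virtual dataset as a service chain can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the project's software: `DataSystem`, `VirtualDataset`, and the two stand-in services are hypothetical names, and plain Python functions take the place of web or Grid services.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A "service" here is just a function from data to data (stand-in for a
# web/Grid service invocation).
Service = Callable[[List[float]], List[float]]


@dataclass
class VirtualDataset:
    source: str            # name of the input dataset
    chain: List[Service]   # ordered steps that materialize the product


class DataSystem:
    def __init__(self) -> None:
        self.stored: Dict[str, List[float]] = {}
        self.virtual: Dict[str, VirtualDataset] = {}
        self.materializations = 0

    def get(self, name: str) -> List[float]:
        # The caller cannot tell whether the dataset was stored or virtual.
        if name in self.stored:
            return self.stored[name]
        recipe = self.virtual[name]
        data = self.get(recipe.source)
        for service in recipe.chain:
            data = service(data)
        # Keep the materialized product for later identical requests.
        self.stored[name] = data
        self.materializations += 1
        return data


def subset_first_two(values: List[float]) -> List[float]:
    return values[:2]


def scale_by_ten(values: List[float]) -> List[float]:
    return [v * 10 for v in values]


system = DataSystem()
system.stored["raw_band"] = [1.0, 2.0, 3.0]
system.virtual["scaled_subset"] = VirtualDataset(
    source="raw_band", chain=[subset_first_two, scale_by_ten]
)

first = system.get("scaled_subset")    # materialized on demand
second = system.get("scaled_subset")   # served from the kept copy
print(first, system.materializations)
```

Registering a new virtual product only requires a new chain over existing services, which is why a modest set of elementary services can back a large number of products.
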
Geo-object, Geo-tree, Virtual Dataset, Geospatial Models
[Figure: geo-tree diagram. Recoverable labels: archived geo-object; intermediate geo-object; user geo-object (user requested / user obtained); geospatial web/Grid services; automated data transformation service (WCS/WFS); service levels: no service, data service, modeling and virtual data services.]

User Creation of Geospatial Models
• A user-requested product may not exist in either virtual or non-virtual form.
• If the user knows the thought process for creating the data product from lower-level inputs step by step (the logical geospatial model):
  – With the help of a good user interface and the availability of service modules and models/submodels, the user can construct a geospatial model/virtual data product interactively.
  – The system can then produce the virtual data product for the user.
  – The user-created model can be incorporated into the system as part of the virtual datasets the system can provide.
    • This allows the system to grow its capabilities over time.
• Advantages:
  – Allows users to obtain ready-to-use scientific information instead of raw data, significantly reducing the data traffic between the users and the geospatial Grid.
  – Allows users to explore the huge resources available at a data Grid and to conduct tasks that they were never able to conduct before.

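A user-built geospatial model of this kind can be pictured as a small tree: leaves are archived inputs and each internal node applies one service to the results of its children. The sketch below is a hypothetical illustration; the node classes, the three arithmetic services, and the normalized band-difference example are assumptions chosen for brevity and are not taken from the slides.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union


@dataclass(frozen=True)
class Archived:
    name: str  # key of an archived input dataset


@dataclass(frozen=True)
class Derived:
    service: Callable[..., List[float]]  # stand-in for a service module
    inputs: Tuple["Node", ...]           # child nodes feeding the service


Node = Union[Archived, Derived]


def materialize(node: Node, archive: Dict[str, List[float]]) -> List[float]:
    """Evaluate the tree bottom-up, from archived inputs to the product."""
    if isinstance(node, Archived):
        return archive[node.name]
    args = [materialize(child, archive) for child in node.inputs]
    return node.service(*args)


def difference(a: List[float], b: List[float]) -> List[float]:
    return [x - y for x, y in zip(a, b)]


def total(a: List[float], b: List[float]) -> List[float]:
    return [x + y for x, y in zip(a, b)]


def ratio(a: List[float], b: List[float]) -> List[float]:
    return [x / y for x, y in zip(a, b)]


# Hypothetical archive with two co-registered bands.
archive = {"nir": [0.75, 0.5], "red": [0.25, 0.5]}
nir, red = Archived("nir"), Archived("red")

# User-defined model: (nir - red) / (nir + red), built from three services.
model = Derived(ratio, (Derived(difference, (nir, red)),
                        Derived(total, (nir, red))))

product = materialize(model, archive)
print(product)
```

Only the final product crosses the network to the user; the intermediate results stay inside the system, which matches the traffic-reduction advantage listed above.
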
Current Status
• We have fulfilled all objectives of the project except for the user-defined customizable virtual geospatial products.
• A realistic testbed that simulates the NASA EOS data environment has been created.
• Grid-enabled OGC web services software has been developed.

Grid Security (GSI) and VO Setup
[Diagram: authentication among different VOs. Recoverable labels follow.]
• CA centers: LAITS CA center, IPG CA center, ESG CA center
• VOs: CEOS VO, GMU LAITS VO, NASA IPG VO, LLNL ESG VO
• Nodes:
  – GMU (Solaris), laits.gmu.edu: GT 3.2 with CEOS certs
  – NASA SGT (Linux), arao2.sgt-inc.com: Globus 3.2 with CEOS certs
  – NASA (Linux), former.intl-interfaces.net: Globus 3.0 with CEOS certs
  – GMU (Mac), geobrain.laits.gmu.edu: Globus 4.0 with LAITS certs
  – GMU (Linux), salmon.laits.gmu.edu: Globus 4.0 with LAITS certs
  – GMU (Linux), data.laits.gmu.edu: Globus 4.0 with LAITS certs
  – Ames ipg05 (Linux), ipg05.ipg.nasa.gov: Globus 3.2 with IPG certs
  – LLNL esg2 (Linux), esg2.llnl.gov: Globus 3.2 with ESG certs

The Testbed - Hardware
• The testbed has been created with 7 machines in 3 organizations.
• The flagship machine in the testbed is GMU’s Apple cluster server:
  – 6 Apple G5 server nodes: 3 with dual 2.5 GHz CPUs and 3 with dual 2.0 GHz CPUs, with a total of 12 GB RAM.
  – 22.6 TB RAID storage.
  – 1 GB network to Internet II and 100 MB to Internet I.
  – Hosted at the ESDIS network lab at GSFC.

The Testbed - Data
• Populated the G5 server with:
  – Landsat data covering the globe for the years 1975, 1990, and 2000. A total of 7 TB of data has been ingested so far.
  – Shuttle DEM data covering the globe for the year 2000. The size of the DEM is about 1 TB.
  – Other sample EOSDIS data (e.g., MODIS, ASTER, etc.).
• Converted part of the DOE LLNL netCDF modeling data to HDF-EOS and loaded it into the LLNL node of the testbed.
• Replicated some typical EOSDIS data at the NASA Ames node.
• The total size of data in the testbed is now around 9 TB.

The Testbed - Software
• Globus 3.2.1 was installed at all nodes.
• The geospatial Grid software developed by GMU has been installed at all nodes.
• Set up CAs and issued certificates:
  – Set up the LAITS CA and issued LAITS certificates to the Mac machine and the Linux machine of GMU LAITS.
  – Set up the IPG CA and issued IPG certificates to the Linux machine at NASA Ames.
  – Requested CEOS certificates from the CEOS CA for the Solaris machine at GMU LAITS.
  – Tested and debugged authentication between certificates from any two different CAs across all of the above machines.

Improvement of Catalog Information Model (IM)
[Diagram; recoverable labels in order of appearance: CSW Information Model; ISO 19115 Part one; ISO 19115 Part two (FGDC extension); HDF-EOS metadata; IM (getCapabilities and describeCoverage); NASA ECS; GCMD; Service Type IM; LAITS-defined Data Type IM; ISO 19119; ebRIM IM.]

Software Development - GCSW
1. LAITS has developed an OGC Catalog Service for Web (CSW) server.
2. A wrapper has been developed so that the catalog server can also work in the Grid environment as a service.
   – The result is the Grid-enabled CSW (GCSW).
3. Deployed GCSW at GeoBrain (Mac), LAITS (Solaris), and Data (Linux).

Software Development - GWCS/Portal
• The OGC Web Coverage Service (WCS) specification is the fundamental specification for geospatial coverage data access.
• WCS has been enhanced to process 4-D HDF-EOS data derived from the LLNL netCDF modeling data.
• Enhanced the Grid-enabled WCS (GWCS).
• Developed an OGC standard-compatible WCS portal, fully backed by Grid services, for accessing GWCS.
• Deployed and working on GeoBrain (Mac), LAITS (Solaris), and Data (Linux).

Software Development - GWMS/Portal
• The OGC Web Map Service (WMS) specification is the fundamental specification for geospatial map data access.
• GMU has developed a WMS server through another related project.
• A wrapper on top of the WMS server makes it a complete Grid service.
  – It is called the Grid-enabled Web Map Service (GWMS).
  – It works both as a web service and as a Grid service.
  – Both OGC and Grid clients can work with the service.
• Developed an OGC standard-compatible WMS portal, fully backed by Grid services, for accessing GWMS.
• Deployed WMS at GeoBrain (Mac), LAITS (Solaris), and Data (Linux).

Software Development - iGSM (iGSM: intelligent Grid Service Mediator)
• Supports the WCS portal and WMS portal in distributing their requests to the proper GWCS and GWMS.
[Diagram components: WMS Portal, WCS Portal, GWMS, iGSM, GWCS, GCSW, DTS, ROS, MDS.]

Functional Overview of iGSM
• Manages geodata access requests from the OGC WCS portal and WMS portal and transfers those requests to GWCS (Grid-enabled Web Coverage Service) or GWMS (Grid-enabled Web Map Service).
  – Accepts geodata requests from the default WCS portal and WMS portal.
  – Queries a ROS (Replica Optimization Service) for an optimized PFNInfo (Physical File Name Information) object.
    • Each PFNInfo contains a physical file name, a GridWCS service ID, and the host where the data file is located.
  – When the received PFNInfo contains a valid service ID:
    • Requests from a GridCSW (Grid-enabled Catalog Service for Web) the GridWCS/WMS URL corresponding to the service ID.
  – When the received PFNInfo contains a null service ID:
    • Requests from a GridCSW the available GridWCS(s)/WMS(s) among the Grid resources.
    • Requests that a ROS (Replica Optimization Service) select the best GridWCS/WMS among the resources returned by the GridCSW.
    • Requests that a DTS (Data Transfer Service) transfer the data to the selected system.
  – Queries the GridCSW deployed in the selected system for the geodata URI.

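The branching logic described on this slide can be sketched as follows. This is a simplified illustration with in-memory stand-ins: the `StubROS`, `StubCSW`, and `StubDTS` classes, their method names, the load-based selection rule, and all identifiers and URLs are hypothetical and do not reflect the actual iGSM interfaces. The final geodata-URI query is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class PFNInfo:
    physical_file_name: str
    service_id: Optional[str]  # None when no service is co-located with data
    host: str


class StubROS:
    """Stand-in for the Replica Optimization Service."""

    def __init__(self, replicas: Dict[str, PFNInfo], load: Dict[str, float]):
        self.replicas = replicas
        self.load = load

    def optimized_pfn(self, logical_name: str) -> PFNInfo:
        return self.replicas[logical_name]

    def best_service(self, candidates: List[str]) -> str:
        # Assumed selection rule: pick the least-loaded candidate.
        return min(candidates, key=lambda sid: self.load[sid])


class StubCSW:
    """Stand-in for the Grid-enabled catalog (service ID -> service URL)."""

    def __init__(self, services: Dict[str, str]):
        self.services = services

    def url_for(self, service_id: str) -> str:
        return self.services[service_id]

    def available_services(self) -> List[str]:
        return list(self.services)


class StubDTS:
    """Stand-in for the Data Transfer Service; only records transfers."""

    def __init__(self) -> None:
        self.transfers: List[Tuple[str, str]] = []

    def transfer(self, pfn: PFNInfo, target_service_id: str) -> None:
        self.transfers.append((pfn.physical_file_name, target_service_id))


def mediate(logical_name: str, ros: StubROS, csw: StubCSW, dts: StubDTS) -> str:
    """Return the URL of the service that should handle the request."""
    pfn = ros.optimized_pfn(logical_name)
    if pfn.service_id is not None:
        # A service already sits next to the data: just resolve its URL.
        return csw.url_for(pfn.service_id)
    # Otherwise choose a service, move the data to it, then resolve its URL.
    chosen = ros.best_service(csw.available_services())
    dts.transfer(pfn, chosen)
    return csw.url_for(chosen)


ros = StubROS(
    replicas={
        "modis_a": PFNInfo("/data/modis_a.hdf", "wcs-laits", "laits"),
        "modis_b": PFNInfo("/data/modis_b.hdf", None, "ames"),
    },
    load={"wcs-laits": 0.7, "wcs-llnl": 0.2},
)
csw = StubCSW({"wcs-laits": "http://laits.example/gwcs",
               "wcs-llnl": "http://llnl.example/gwcs"})
dts = StubDTS()

url_a = mediate("modis_a", ros, csw, dts)
url_b = mediate("modis_b", ros, csw, dts)
print(url_a, url_b, dts.transfers)
```

Keeping the mediator free of catalog, replica, and transfer details means each of those services can be replaced or redeployed without changing the portals.
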
Software Integration - ROS (ROS: Replica Optimization Service)
• Globus RLS as Grid service
• Globus Index Service
• Globus MDS scripts modification
[Diagram components: ROS; MDS; Index Service (MDS); RLI (Laits); LRC (Laits); RLI (Laits-data); LRC (Ames/LLNL).]

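The RLI and LRC labels refer to the two tiers of the Globus Replica Location Service: a Local Replica Catalog (LRC) maps logical file names to physical locations at one site, and a Replica Location Index (RLI) records which LRCs know about a logical name. The toy model below only illustrates that two-step lookup; the class and method names, file names, and `gsiftp://` hosts are hypothetical, and it does not use the real RLS client API.

```python
from typing import Dict, List, Tuple


class LocalReplicaCatalog:
    """Per-site mapping from logical file names to physical file names."""

    def __init__(self, site: str) -> None:
        self.site = site
        self.mappings: Dict[str, List[str]] = {}

    def add(self, logical_name: str, physical_name: str) -> None:
        self.mappings.setdefault(logical_name, []).append(physical_name)

    def lookup(self, logical_name: str) -> List[str]:
        return self.mappings.get(logical_name, [])


class ReplicaLocationIndex:
    """Index from logical file names to the catalogs that hold them."""

    def __init__(self) -> None:
        self.index: Dict[str, List[LocalReplicaCatalog]] = {}

    def register(self, lrc: LocalReplicaCatalog) -> None:
        # Record every logical name the catalog currently knows about.
        for logical_name in lrc.mappings:
            self.index.setdefault(logical_name, []).append(lrc)

    def lookup(self, logical_name: str) -> List[LocalReplicaCatalog]:
        return self.index.get(logical_name, [])


def find_replicas(rli: ReplicaLocationIndex,
                  logical_name: str) -> List[Tuple[str, str]]:
    """Two-step lookup: ask the index for catalogs, then each catalog."""
    return [(lrc.site, pfn)
            for lrc in rli.lookup(logical_name)
            for pfn in lrc.lookup(logical_name)]


laits = LocalReplicaCatalog("laits")
laits.add("landsat_2000.hdf", "gsiftp://laits.example/data/landsat_2000.hdf")

ames = LocalReplicaCatalog("ames")
ames.add("landsat_2000.hdf", "gsiftp://ames.example/pool/landsat_2000.hdf")
ames.add("modis_sample.hdf", "gsiftp://ames.example/pool/modis_sample.hdf")

rli = ReplicaLocationIndex()
rli.register(laits)
rli.register(ames)

replicas = find_replicas(rli, "landsat_2000.hdf")
print(replicas)
```

An optimization service can then rank the returned replicas, for example with host information from an index service, before a transfer is requested.
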
Software Integration - DTS (DTS: Data Transfer Service)
• GridFTP as Grid service
[Diagram: Machines A, B, and C, each behind Globus Security, exchanging secure requests and data.]

Geospatial Grid with GCSW/GWCS/GWMS/iGSM/ROS/DTS
[Architecture diagram; recoverable labels: User/Client Interface (Web Download & MPGC); CSW Portal; WCS Portal; WMS Portal; iGSM; GCSW; GWCS; GWMS; ROS; DTS; MDS; Geospatial Catalog DB; Replica DB; HDF-EOS Data; sites: Laits, LLNL, Ames. The numbered request steps are not recoverable from the text.]

A Data Request Scenario based on the Integration
[Sequence diagram with steps numbered 1 to 9; recoverable labels: Other WCS Client; default WCS portal; IP Retrieval Manager; LAITS WCS/WMS Portal; CSW Portal; ESG Catalog; iGSM; RLS; MDS; ROS; DTS; LAITS GridCSW; LAITS GridWCS; LLNL GridWCS; Ames GridWCS; HDF-EOS Data; Other Data; message labels: logical data name, physical data/service ID, best server ID. The step order is not recoverable from the text.]

Major Work Items for the Next Year
• The next stage of this project will concentrate on on-demand virtual product generation based on geospatial processing models.
  – Determine the requirements for the metadata structures of a materialized product catalogue; develop the metadata for a few examples of ES data products and use these to adapt or implement a prototype materialized product catalogue.
  – Directly access data pools through ECHO, gateway to ESG; design a representation of the transformation prescription and use it to develop a prototype virtual data product catalogue.
  – Investigate the emerging workflow language BPEL4WS for representing the execution of the transformation prescriptions. Build the planner that converts the prescriptions into a workflow language.
  – Make the LAITS BPELPower workflow engine Grid-enabled and integrate it into NWGISS.
  – Enable OGC CSW to search for, and OGC WCS to retrieve, virtual data in the same way as non-virtual data; demonstrate an overall virtual data scenario.

Acknowledgement
• The project team includes Prof. Liping Di (PI, GMU), Dr. Piyush Mehrotra (Co-I, NASA Ames), Dr. Dean Williams (Co-I, DOE LLNL), Dr. Chaumin Hu (NASA Ames), Dr. Aijun Chen (implementation leader, GMU), Dr. Yuqi Bai (GMU), Mr. Yang Liu (GMU), Yaxing Wei (GMU).
• The project is funded by the NASA Advanced Information System Technology program (AIST).

Thank you for your attention! Any questions?