The EGI DataHub: a new Data as a Service (DaaS)
Giuseppe LA ROCCA (EGI Foundation), Technical Outreach Expert
giuseppe.larocca@egi.eu
https://documents.egi.eu/document/3177
www.egi.eu
EGI-Engage is co-funded by the Horizon 2020 Framework Programme of the European Union under grant number 654142
Outline
• Introductory concepts and driving considerations
• EGI Open Data Platform overview
• The EGI DataHub in a nutshell
• Hands-on exercises
• Feedback form
10/17/2021, 3rd ENVRI week, November 14 – 18, Czech Republic
Open Science Commons Vision
“Researchers from all disciplines have easy, integrated and open access to the advanced digital services, scientific instruments, data, knowledge and expertise they need to collaborate to achieve excellence in science, research and innovation.”
Open Science Commons paper
Open Data challenges
• Availability: distributed, reliable storage; standard and easy protocols for accessing data; replicas
• Interoperability: data should be available in standard, interoperable, open formats
• Discovery: data should be enriched with metadata that discovery services and users can understand and that can be indexed
• Identification: data sets and items must have globally unique identifiers that allow unambiguous referencing
• Provenance: information on how the data was obtained or generated; simulation data should be reproducible
• Preservation: data stored in a long-term retention archive should remain usable decades after creation
Before we start
• EGI Open Data Platform (ODP)
  – Support the EC Open Data Cloud vision
  – Integrate different data repositories available in a distributed environment
  – Offer the functionalities to make data open and link them to Open Data Catalogues
• Onedata
  – Software stack for a distributed data management platform, developed externally to EGI
  – www.onedata.org
Open Data Platform – Users’ perspective
• Single user interface for personal, research and open data management
• User and community data is organized into spaces (virtual folders)
• Security that is friendly to non-Grid users: no VO certificate is necessary for open data (EGI AAI)
• Open-data-specific functionality, including DOI registration, publication policies and long-term preservation
• Web interface for data management, including ACL and sharing
• Data can be accessed from the local filesystem or through Grid and Cloud protocols
Open Data Platform – Interfaces
• GUI
  – Web based
  – Easy data management and sharing, access control
  – Publication of data items and collections
• REST
  – Advanced data and collection management
  – API for integration with community tools and portals
• CDMI
  – Standard data management operations
  – Advanced metadata queries
  – Integration with future data management applications
• POSIX
  – Enables direct mounting of spaces in the local filesystem without full data transfer
• OAI-PMH
  – OAI Data Provider interface
  – Dublin Core metadata by default
  – More complex metadata can be registered in ODP manually
• HTTP
  – Direct download of open data from URLs
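As a rough illustration of the OAI-PMH interface listed above, the sketch below assembles harvesting request URLs from the protocol's standard `verb` and `metadataPrefix` parameters (`oai_dc` is the standard Dublin Core prefix). The base URL and the `oai_url` helper are hypothetical placeholders, not the platform's documented endpoint.

```shell
#!/bin/sh
# Hypothetical OAI-PMH base URL; replace it with the endpoint of your deployment.
OAI_BASE="https://datahub.example.org/oai_pmh"

# oai_url VERB [METADATA_PREFIX]
# Prints an OAI-PMH request URL for the given verb.
oai_url() {
    if [ -n "$2" ]; then
        printf '%s?verb=%s&metadataPrefix=%s\n' "$OAI_BASE" "$1" "$2"
    else
        printf '%s?verb=%s\n' "$OAI_BASE" "$1"
    fi
}

IDENTIFY_URL=$(oai_url Identify)
LIST_URL=$(oai_url ListRecords oai_dc)
echo "$IDENTIFY_URL"
echo "$LIST_URL"
# To send a request against a real endpoint: curl -s "$LIST_URL"
```

The script only builds and prints the URLs; the commented `curl` line shows where an actual request would go once a real endpoint is known.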
The EGI DataHub in a nutshell
• EGI DataHub is the central point of access for the Open Data Platform.
  – Makes existing large-scale open data collections discoverable and available in an easy way for both EGI users and the general public
  – Supports fine-grained access policies
OneData: some basic concepts
Spaces – distributed virtual volumes where users can organize their data
  – Each space has to be supported by at least one provider, which means that this provider reserves a certain storage quota for this particular space.
  – Spaces can be shared with other users and even exposed to the public.
OneData: some basic concepts
Providers – entities that support spaces with storage resources
  – Any centre can become a provider by installing the OneProvider service, attaching some resources and registering it in the OneZone service
OneData: some basic concepts
Zones – federations of providers
  – Any organization, community or user group can deploy its own Onezone service
  – Onezone is responsible for authentication and authorization of users
  – It allows providers from different zones to interact with each other and share data
OneData: user interfaces
• User web interface
• User command line interface: OneData also provides the oneclient CLI
Open Data Platform – The big picture
[Figure: architecture diagram. Actors: EGI User 1 (VO x), EGI User 2 (Onedata space), Community Portal, Anonymous Users 1 and 2. Interfaces: Web GUI, REST, POSIX, HTTP, CDMI, OAI-PMH. Open Data Platform components: Space Manager, Open Data Manager, Metadata Registry, OAI-PMH Data Provider, Authentication and Authorization, Long Term Retention (generates AIP packages). External elements: DOI Registrar (e.g. DataCite), EGI Sites 1 to 3, Cloud storage, EUDAT.]
Accessing the EGI Training Infrastructure
Before you start
• Accessing the EGI Federated Cloud Clients:
  – UI Server: 90.147.16.130, port 4422
  – Username: userX, where X = 1, ..., 39
  – Password: FedCloudUserX, where X = 1, ..., 39
~$ ssh userX@90.147.16.130 -p 4422
• Access the Central EGI DataHub:
  – https://datahub.egi.eu
• Access OneProviders:
  – 5 pre-defined providers already deployed in the EGI Training Infrastructure (147.228.242.126-130)
  – 5 groups to be formed
  – 1 provider assigned to each group
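The login details above can be wrapped in a small helper. This is a minimal sketch: the host, port and account range come from the slide, while the `ui_login_cmd` function name is made up for illustration. It only prints the command, so it can be checked before running it.

```shell
#!/bin/sh
UI_HOST="90.147.16.130"   # UI server from the slide
UI_PORT=4422              # SSH port from the slide

# ui_login_cmd X: print the ssh command for training account userX (X = 1..39).
ui_login_cmd() {
    case "$1" in
        ''|*[!0-9]*) echo "account number must be an integer" >&2; return 1 ;;
    esac
    if [ "$1" -lt 1 ] || [ "$1" -gt 39 ]; then
        echo "account number must be between 1 and 39" >&2
        return 1
    fi
    printf 'ssh -p %s user%s@%s\n' "$UI_PORT" "$1" "$UI_HOST"
}

CMD=$(ui_login_cmd 7)
echo "$CMD"
```

Note that the port option is a plain ASCII hyphen followed by `p`; a typographic dash pasted from a slide would be rejected by ssh.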
The Infrastructure for this tutorial
https://www.egi.eu/services/training-infrastructure/
[Figure: sites of the training infrastructure: CYFRONET, CESNET (OpenNebula), INFN (OpenStack), and the UI]
Exercise 1: Access Copernicus data stored in the EGI DataHub
In this exercise you will:
• Log in to the EGI DataHub with EGI SSO or your IdP
• Access the PlGrid provider
• Download some Copernicus files
Log in to the EGI DataHub
• Access the EGI DataHub at https://datahub.egi.eu
• Select EGI
Log in to the EGI DataHub
• Choose EGI SSO or your IdP
Access the PlGrid provider
1. Select GO TO YOUR FILES
2. Select the PlGrid provider
3. Select Go to your files
Download Copernicus data
• Click on a file to download it
Exercise 2: Create a new space in the EGI DataHub
In this exercise you will:
• Create a new space in the EGI DataHub
• Support the new space with your provider
  1. Get a token to support the new space
  2. Log in to the web admin interface of your provider
  3. Support the new space with some storage
• Upload some data to the new space through the web interface
Create a new space
• Select Create new space
• Choose a name
Support the new space with your provider
• Select the created space
• Click on Get support and get the token
Support the new space with your provider
• Access the OneProvider administration interface via web:
  – https://<OneProvider IP>:9443
  – 147.228.242.126-130
• Credentials
  – Username: admin
  – Password: password
Support the new space with your provider
• Select Support space
• Copy your token here
• Select the amount of storage
• NB: a test provider can offer no more than 20 GB!
• Your provider is now supporting the new space
Upload data in the new space
• Go back to the EGI DataHub web interface
  – The new space is now supported by your provider.
1. Select your provider name under the space
2. Select Go to your files in the map
Upload data in the new space
• You will be redirected to the OneProvider user interface
  – Select the new space
Upload data in the new space
• Upload some files
Exercise 3: Access data with oneclient
In this exercise you will:
1. Get an access token from the EGI DataHub web interface
2. Access the EGI FedCloud UI via SSH
3. Mount the virtual file-system with OneClient
4. Create a new file in the new space
5. Unmount the virtual file-system
Get an access token
1. Select Access Tokens
2. Copy the token
Log into the OneProvider
• Access the EGI FedCloud Client UI
~$ ssh userX@90.147.16.130 -p 4422
• Configure the OneClient settings:
~$ export PROVIDER_HOSTNAME=<OneProvider_IP>
~$ export ONECLIENT_AUTHORIZATION_TOKEN='<Access Token>'
• Create a folder to mount the OneData virtual filesystem and export the mount point
~$ mkdir onedata
~$ export MOUNT_POINT=/home/userX/onedata
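Before mounting, it can help to confirm that the three variables from this slide are set and that the mount directory exists. The following is a minimal sketch; the `check_oneclient_env` function and the placeholder values are illustrative assumptions, not part of OneClient.

```shell
#!/bin/sh
# check_oneclient_env: verify that the variables from the slide are set and
# that the mount point exists. Prints what is missing; returns non-zero on
# any problem.
check_oneclient_env() {
    rc=0
    [ -n "${PROVIDER_HOSTNAME:-}" ] || { echo "PROVIDER_HOSTNAME is not set"; rc=1; }
    [ -n "${ONECLIENT_AUTHORIZATION_TOKEN:-}" ] || { echo "ONECLIENT_AUTHORIZATION_TOKEN is not set"; rc=1; }
    [ -n "${MOUNT_POINT:-}" ] || { echo "MOUNT_POINT is not set"; rc=1; }
    if [ -n "${MOUNT_POINT:-}" ] && [ ! -d "$MOUNT_POINT" ]; then
        echo "MOUNT_POINT directory does not exist: $MOUNT_POINT"; rc=1
    fi
    return "$rc"
}

# Example with placeholder values (a temporary directory stands in for ~/onedata).
export PROVIDER_HOSTNAME="192.0.2.10"                 # placeholder IP
export ONECLIENT_AUTHORIZATION_TOKEN="example-token"  # placeholder token
export MOUNT_POINT="$(mktemp -d)"
check_oneclient_env && echo "environment looks ready"
```

In the exercise, the real provider IP, access token and /home/userX/onedata would replace the placeholders.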
Mount the virtual file-system
• Mount the virtual file-system
~$ oneclient --authentication token $MOUNT_POINT --nocheck-certificate
Connecting to provider '<Your_ProviderID>'...
Getting configuration...
oneclient has been successfully mounted in /home/userX/onedata
• Check the virtual file-system content
~$ cd onedata/
ubuntu@stoor168:~/onedata$ ls -la
total 4
[Listing columns garbled in extraction; the entries shown are ., .., MySpaceX and Sentinel2]
Work with the virtual file-system
• Create a new file under your space
~/onedata$ cd MySpaceX/
userX@stoor168:~/onedata/MySpaceX$ cat TestOneClient.txt
my first test with OneClient.
• Go back to the web interface and download the new file
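The slide shows the file being read back with `cat` but not how it was created. One plausible way, assumed here rather than taken from the slides, is `echo` with output redirection. The sketch uses a temporary directory as a stand-in for ~/onedata/MySpaceX so that it runs without a mounted space.

```shell
#!/bin/sh
# SPACE_DIR stands in for ~/onedata/MySpaceX; a temporary directory is used
# here so the sketch runs without a mounted space.
SPACE_DIR="$(mktemp -d)"
cd "$SPACE_DIR" || exit 1

# Create the file, then read it back.
echo "my first test with OneClient." > TestOneClient.txt
CONTENT=$(cat TestOneClient.txt)
echo "$CONTENT"
```

Inside a mounted space the same two commands apply unchanged, and the new file should then appear in the web interface.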
Unmount the virtual file-system
• Release the virtual file-system
~$ cd $HOME
~$ fusermount -u /home/userX/onedata
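Calling `fusermount -u` on a directory that is not mounted produces an error, so a guarded variant can be convenient. This is a sketch under the assumption that the util-linux `mountpoint` command is available on the UI; the `release_mount` function name is hypothetical.

```shell
#!/bin/sh
# release_mount DIR: unmount DIR with fusermount only if it is a mount point.
# Leave the directory first (cd $HOME on the slide), or the unmount may fail
# because the file-system is busy.
release_mount() {
    if mountpoint -q "$1" 2>/dev/null; then
        fusermount -u "$1" && echo "unmounted $1"
    else
        echo "$1 is not mounted, nothing to do"
    fi
}

DEMO_DIR="$(mktemp -d)"   # placeholder; use /home/userX/onedata in the exercise
RESULT=$(release_mount "$DEMO_DIR")
echo "$RESULT"
```

With the placeholder directory nothing is mounted, so the function only reports that there is nothing to do.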
Exercise 4: Share your data
In this exercise you will:
1. Generate a space token from the EGI DataHub web interface
2. Invite other users to access the data
Share your data!
• Go to the OneProvider interface and generate a space join token
Share your data!
• Give the token to another group, who can then join your space
Thank you for your attention.
Questions?
PLEASE FILL IN THE FEEDBACK FORM!
https://www.surveymonkey.com/r/R6KMCDG
www.egi.eu
This work by Parties of the EGI-Engage Consortium is licensed under a Creative Commons Attribution 4.0 International License.