www.eu-egee.org

Grid Data Management

Gabor Hermann, based on the lecture by Simone Campana, LCG Experiment Integration and Support, CERN IT

EGEE is a project funded by the European Union under contract IST-2003-508833

Overview
• Introduction to Data Management (DM)
  § General concepts
  § Some details on transport protocols
  § Data management operations
  § Files & replicas: name conventions
• File catalogs
  § Cataloging requirements and catalogs in egee/LCG
  § RLS file catalog
  § LCG file catalog
• DM tools: overview
• Data Management CLI
  § lcg_utils
• Data Management API
  § lcg_utils
  § GFAL
• Advanced concepts
  § Advanced utilities: CLI & APIs
  § OutputData JDL attribute
• Conclusions
GRID'05 EGEE Summer School, Budapest

Data Management: general concepts
• What does “Data Management” mean?
  § Users and applications produce and require data
  § Data may be stored in Grid files
  § Granularity is at the “file” level (no data “structures”)
  § Users and applications need to handle files on the Grid
• Files are stored in appropriate permanent resources called “Storage Elements” (SE)
  § Present at almost every site, together with computing resources
  § We will treat a Storage Element as a “black box” where we can store data
    • Appropriate data management utilities/services hide the internal structure of the SE
    • Appropriate data management utilities/services hide the details of the transfer protocols

Data Management: general concepts (II)
• A Grid file is READ-ONLY (at least in egee/LCG)
  § It cannot be modified
  § It can be deleted (so it can be replaced)
  § Files are heterogeneous (ASCII, binary, …)
• High-level Data Management tools (lcg_utils, see later) hide
  § transport layer details (protocols, …)
  § the storage location
• To use lower-level tools (edg-gridftp, see later) you need
  § some knowledge of the transport layer
  § some knowledge of the Storage Element implementation
(EDG, the prefix of the older tools, is the historic name of the European DataGrid project.)

Some details on protocols
• Data channel protocol: mostly GridFTP (gsiftp)
  § secure and efficient data movement
  § extends the standard FTP protocol
  § public-key-based Grid Security Infrastructure (GSI) support
  § third-party control of data transfer
  § parallel data transfer
• Other protocols are available, especially for file I/O
  § rfio protocol
    • for CASTOR SE (and classic SE)
    • not yet GSI-enabled
  § gsidcap protocol
    • for secure access to dCache SE
  § file protocol
    • for local file access
• Other control channel protocols (SRM, discussed elsewhere)
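The idea behind GridFTP parallel data transfer is that one file travels over several TCP streams at once (the lcg_util example later in this lecture asks for nbstreams = 8). The sketch below only illustrates how a file could be divided among streams; split_ranges and byte_range are hypothetical names, and real GridFTP sends data in extended block mode, where every block carries its own offset, so it does not need fixed ranges like these.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of GridFTP or lcg_utils). */
typedef struct {
    long long offset; /* first byte of the range */
    long long length; /* number of bytes in the range */
} byte_range;

/* Split a file of `size` bytes into `nstreams` contiguous byte ranges,
 * one per stream. The first (size % nstreams) ranges get one extra byte,
 * so together the ranges cover the file exactly once.
 * `out` must hold `nstreams` entries; returns the number of non-empty ranges. */
int split_ranges(long long size, int nstreams, byte_range *out)
{
    assert(size >= 0 && nstreams > 0);
    long long base = size / nstreams;
    long long extra = size % nstreams;
    long long offset = 0;
    int nonempty = 0;

    for (int i = 0; i < nstreams; i++) {
        long long len = base + (i < extra ? 1 : 0);
        out[i].offset = offset;
        out[i].length = len;
        offset += len;
        if (len > 0)
            nonempty++;
    }
    return nonempty;
}
```

For a 100-byte file and 8 streams, the first four ranges get 13 bytes and the other four get 12, covering bytes 0 to 99 exactly once.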

Data Management operations: upload a file to the Grid
• A user needs to store data in an SE (from a UI)
• An application needs to store data in an SE (from a CE)
• A user needs to store the application (to be retrieved and run on a CE)
  § For small files the InputSandbox can be used (see the WMS lecture)
(Figure: several Grid components involved: UI, CE, SE)

Data Management operations: download files from the Grid
• A user needs to retrieve (onto the UI) data stored in an SE
  § For small files produced on the WN, the OutputSandbox can be used (see the WMS lecture)
• Applications need to copy data locally (onto the CE) and use them
• The application itself must be downloaded onto the CE and run
(Figure: several Grid components involved: UI, CE, SE)

Data Management operations: replicate a file across different SEs
• Load balancing of computing resources
  § Often a job needs to run at a site where a copy of its input data is present (see the InputData JDL attribute in the WMS lecture)
• Performance improvement in data access
  § Several applications might need to access the same file concurrently
• Redundancy of key files (backup)
(Figure: several Grid components involved: UI, CE, SE)

One of the basic ideas of LCG: let us bring the little programs close to the big files
Asymmetry in JDL:
• The user is responsible for copying the Grid files listed in InputData to the CE
• The JDL supports creating Grid files from local files via OutputData

Data management operations
• Data Management means movement and replication of files across/on Grid elements
• Grid DM tools/applications/services can be used for all kinds of files
HOWEVER
• Data Management focuses on “large” files
  § large means greater than ~20 MB
  § typically on the order of a few hundred MB
• Tools/applications/services are optimized to deal with large files
• In many cases, small files can be handled more efficiently by other means
  § Examples:
    • A user can ship data to be used by the application on the WN (and possibly the application itself) using the InputSandbox (see the WMS lecture)
    • A user can retrieve (on the UI) data generated by a job (on the WN) using the OutputSandbox (see the WMS lecture)
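The ~20 MB rule of thumb can be expressed as a tiny decision helper. This is a hypothetical sketch: choose_route, choose_route_for_path and the exact cut-off (20 MB taken here as 20 × 2^20 bytes) are assumptions for illustration, not part of any EGEE tool.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/stat.h>

/* Assumed cut-off from the slide's "~20 MB"; tune it for your VO. */
#define DM_THRESHOLD_BYTES (20LL * 1024 * 1024)

typedef enum { VIA_SANDBOX, VIA_STORAGE_ELEMENT } transfer_route;

/* Hypothetical helper: small files go through the Input/OutputSandbox,
 * large ones through a Storage Element with the DM tools. */
transfer_route choose_route(long long size)
{
    return size > DM_THRESHOLD_BYTES ? VIA_STORAGE_ELEMENT : VIA_SANDBOX;
}

/* Same decision for a local file; returns -1 if stat() fails, 0 otherwise. */
int choose_route_for_path(const char *path, transfer_route *route)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    *route = choose_route((long long) st.st_size);
    return 0;
}
```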

Files & replicas: name conventions
• Logical File Name (LFN)
  § An alias created by a user to refer to some item of data, e.g. “lfn:cms/20030203/run2/track1”
• Globally Unique Identifier (GUID)
  § A non-human-readable unique identifier for an item of data, e.g. “guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6”
• Site URL (SURL) (also called Physical File Name (PFN) or Site FN)
  § The location of an actual piece of data on a storage system, e.g.
    “srm://pcrd24.cern.ch/flatfiles/cms/output10_1” (SRM)
    “sfn://lxshare0209.cern.ch/data/alice/ntuples.dat” (Classic SE)
• Transport URL (TURL)
  § Temporary locator of a replica + access protocol, understood by an SE, e.g.
    “rfio://lxshare0209.cern.ch//data/alice/ntuples.dat”
(Figure: Logical File Names 1…n and Physical Files SURL 1…n all refer to one GUID; each SURL has a corresponding TURL 1…n.)
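Because each kind of name carries a recognisable prefix, a tool can tell them apart before deciding which catalog or storage call to make. The helper below is a hypothetical sketch: its prefix table is built only from the examples on this slide and the protocols listed earlier (gsiftp, rfio, gsidcap, file), and it is not how lcg_utils or GFAL parse names internally.

```c
#include <assert.h>
#include <string.h>

typedef enum {
    NAME_UNKNOWN, NAME_LFN, NAME_GUID, NAME_SURL, NAME_TURL, NAME_LOCAL
} name_kind;

/* Hypothetical helper: classify a file name by its prefix. */
name_kind classify_name(const char *name)
{
    static const struct { const char *prefix; name_kind kind; } table[] = {
        { "lfn:",       NAME_LFN   },
        { "guid:",      NAME_GUID  },
        { "srm://",     NAME_SURL  },   /* SURL on an SRM SE */
        { "sfn://",     NAME_SURL  },   /* SURL on a Classic SE */
        { "gsiftp://",  NAME_TURL  },
        { "rfio://",    NAME_TURL  },
        { "gsidcap://", NAME_TURL  },
        { "file:",      NAME_LOCAL },   /* local file, e.g. the source of lcg-cr */
    };
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strncmp(name, table[i].prefix, strlen(table[i].prefix)) == 0)
            return table[i].kind;
    }
    return NAME_UNKNOWN;
}
```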

File catalogs
At this point you should ask:
1) How do I keep track of all my files on the Grid?
2) Even if I remember all the LFNs of my files, what about someone else's files?
3) Anyway, how does the Grid keep track of the LFN/GUID/SURL associations?
Well… we need a FILE CATALOGUE

Cataloging requirements
• Need to keep track of the location of copies (replicas) of Grid files
• Replicas might be described by attributes
  § Support for METADATA
  § Could be “system” metadata or “user” metadata
• Potentially, millions of files need to be registered and located
  § Requirement for performance
• A distributed architecture might be desirable
  § scalability
  § prevent a single point of failure
  § site managers need to change file locations autonomously

File catalogs in egee/LCG
• Who has access to the file catalog?
  § The command-line tools, APIs and the WMS interact with the catalog
    • They hide catalogue implementation details
  § Lower-level tools also allow direct catalogue access
• EDG's Replica Location Service (RLS)
  § Catalogs in use in LCG-2
  § Replica Metadata Catalog (RMC) + Local Replica Catalog (LRC)
  § Some performance problems detected during the LCG Data Challenges
• New LCG File Catalog (LFC)
  § Already being certified; deployment in January 2005
  § Coexistence with RLS; migration tools provided
  § Better performance and scalability
  § Provides new features: security, hierarchical namespace, transactions…

Overview of file catalogues (figure)

File catalogs: the RLS
• RMC:
  § Stores LFN-GUID mappings
  § Accessible by the edg-rmc CLI + API
• LRC:
  § Stores GUID-SURL mappings
  § Accessible by the edg-lrc CLI + API
(Figure: the DM tools talk to both catalogs. The RMC maps Logical File Names 1…n to a GUID; the LRC maps that GUID to Physical File SURLs 1…n.)
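The RLS split means that listing the replicas of an LFN takes two lookups in two different catalogs: LFN to GUID in the RMC, then GUID to SURLs in the LRC. The toy in-memory model below illustrates only that two-step resolution; the table contents, hostnames and function names are hypothetical and unrelated to the real edg-rmc/edg-lrc APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct { const char *key; const char *value; } mapping;

/* Toy RMC: LFN -> GUID. Two aliases share one GUID. */
static const mapping rmc[] = {
    { "lfn:run2/track1", "guid:aaaa" },
    { "lfn:best-track",  "guid:aaaa" },
};
/* Toy LRC: GUID -> SURL. Two replicas share one GUID. */
static const mapping lrc[] = {
    { "guid:aaaa", "srm://se1.example.org/data/track1" },
    { "guid:aaaa", "sfn://se2.example.org/data/track1" },
};

/* RMC step: return the GUID for an LFN, or NULL if unknown. */
const char *rmc_guid_for_lfn(const char *lfn)
{
    for (size_t i = 0; i < sizeof rmc / sizeof rmc[0]; i++)
        if (strcmp(rmc[i].key, lfn) == 0)
            return rmc[i].value;
    return NULL;
}

/* LRC step: store up to `max` SURLs for `guid` in `out`; return the count. */
size_t lrc_surls_for_guid(const char *guid, const char **out, size_t max)
{
    size_t n = 0;
    for (size_t i = 0; i < sizeof lrc / sizeof lrc[0] && n < max; i++)
        if (strcmp(lrc[i].key, guid) == 0)
            out[n++] = lrc[i].value;
    return n;
}

/* Full resolution LFN -> GUID -> SURLs; 0 if the LFN is unknown. */
size_t list_replicas(const char *lfn, const char **out, size_t max)
{
    const char *guid = rmc_guid_for_lfn(lfn);
    return guid ? lrc_surls_for_guid(guid, out, max) : 0;
}
```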

File catalogs: the LFC
• One single catalog
• The LFN acts as the main key in the database. It has:
  § symbolic links to it (additional LFNs)
  § a unique identifier (GUID)
  § system metadata
  § user metadata (one field of user-defined metadata)
  § information on replicas
(Figure, example entry:
  LFN: /grid/dteam/dir1/dir2/file1.root
  Symlink: /grid/dteam/mydir/mylink
  GUID: Xxxxxx-xxx-
  System metadata: “size” => 10234, “cksum_type” => “MD5”, “cksum” => “yy-yy-yy”
  User-defined metadata: one field
  Replica: srm://host.example.com/foo/bar (on host.example.com))
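LFC symlinks are additional LFNs pointing at the main LFN, which is the entry holding the GUID and system metadata. A minimal sketch, assuming a flat in-memory table instead of the real LFC database (lfc_entry, find_entry and resolve_lfn are hypothetical names), shows how a lookup by symlink ends at that main entry; the hop limit guards against link cycles.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified, hypothetical model of an LFC namespace entry. */
typedef struct {
    const char *lfn;          /* e.g. /grid/dteam/dir1/dir2/file1.root */
    const char *link_target;  /* non-NULL if this entry is a symlink */
    const char *guid;         /* set only for real files */
    long long size;           /* system metadata */
    const char *cksum_type;   /* system metadata, e.g. "MD5" */
} lfc_entry;

static const lfc_entry *find_entry(const lfc_entry *tab, size_t n, const char *lfn)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(tab[i].lfn, lfn) == 0)
            return &tab[i];
    return NULL;
}

/* Follow symlinks until a real file is reached. Returns NULL if the name
 * is unknown, a link is dangling, or more than `max_hops` links are followed. */
const lfc_entry *resolve_lfn(const lfc_entry *tab, size_t n,
                             const char *lfn, int max_hops)
{
    const lfc_entry *e = find_entry(tab, n, lfn);
    while (e != NULL && e->link_target != NULL) {
        if (max_hops-- <= 0)
            return NULL;
        e = find_entry(tab, n, e->link_target);
    }
    return e;
}
```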

File catalogs: the LFC (II)
• Fixes performance and scalability problems seen in the EDG catalogs
  § Cursors for large queries
  § Timeouts and retries from the client
• Provides more features than the EDG catalogs
  § User-exposed transaction API (+ automatic rollback on failure of a mutating method call)
  § Hierarchical namespace and namespace operations (for LFNs): /grid/<VO>/…
  § Integrated GSI authentication + authorization
  § Access Control Lists (Unix permissions and POSIX ACLs)
  § Checksums
• Interaction with other components
  § Supports Oracle and MySQL database backends
  § Integration with the GFAL and lcg_util APIs complete
  § New specific API provided

LFC commands
Summary of the LFC catalog commands:
lfc-chmod        Change the access mode of an LFC file/directory
lfc-chown        Change the owner and group of an LFC file/directory
lfc-delcomment   Delete the comment associated with a file/directory
lfc-getacl       Get file/directory access control lists
lfc-ln           Make a symbolic link to a file/directory
lfc-ls           List file/directory entries in a directory
lfc-mkdir        Create a directory
lfc-rename       Rename a file/directory
lfc-rm           Remove a file/directory
lfc-setacl       Set file/directory access control lists
lfc-setcomment   Add/replace a comment

LFC C API
Low-level methods (many POSIX-like):
lfc_aborttrans, lfc_access, lfc_addreplica, lfc_apiinit, lfc_chclass, lfc_chdir, lfc_chmod, lfc_chown, lfc_closedir, lfc_creat,
lfc_delcomment, lfc_delete, lfc_deleteclass, lfc_delreplica, lfc_endtrans, lfc_enterclass, lfc_errmsg, lfc_getacl, lfc_getcomment, lfc_getcwd,
lfc_getpath, lfc_lchown, lfc_listclass, lfc_listlinks, lfc_listreplica, lfc_lstat, lfc_mkdir, lfc_modifyclass, lfc_opendir, lfc_queryclass,
lfc_readdir, lfc_readlink, lfc_rename, lfc_rewind, lfc_rmdir, lfc_selectsrvr, lfc_setacl, lfc_setatime, lfc_setcomment, lfc_seterrbuf,
lfc_setfsize, lfc_starttrans, lfc_stat, lfc_symlink, lfc_umask, lfc_undelete, lfc_unlink, lfc_utime, send2lfc

• Important environment variables:
  • export LCG_GFAL_INFOSYS=grid152.kfki.hu:2170   (must be set for every catalogue type)
  • export LCG_CATALOG_TYPE=lfc   (must be set only for the LFC)
  • export LFC_HOST=grid155.kfki.hu   (must be set only for the LFC)
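Forgetting one of these variables, especially on a WN, is a common source of confusing errors, so a program can check them before it calls any data management function. The pre-flight check below is a hypothetical sketch, not part of lcg_utils; it encodes only the rules on this slide (LCG_GFAL_INFOSYS always, LFC_HOST only when LCG_CATALOG_TYPE is "lfc").

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns NULL if the setup looks complete, otherwise the name of the
 * first missing variable. */
const char *missing_dm_variable(void)
{
    const char *infosys = getenv("LCG_GFAL_INFOSYS");
    if (infosys == NULL || infosys[0] == '\0')
        return "LCG_GFAL_INFOSYS";   /* needed for every catalogue type */

    /* Per the slide, LCG_CATALOG_TYPE and LFC_HOST matter only for the LFC. */
    const char *type = getenv("LCG_CATALOG_TYPE");
    if (type != NULL && strcmp(type, "lfc") == 0) {
        const char *host = getenv("LFC_HOST");
        if (host == NULL || host[0] == '\0')
            return "LFC_HOST";
    }
    return NULL;
}
```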

DM CLIs & APIs overview
(Figure: layered view of the Data Management CLIs and APIs)
• User tools / Data Management (replication, indexing, querying): lcg_utils (CLI + C API); edg-rm (CLI + API)
• Cataloging: GFAL C API over the EDG catalogs (edg-rmc, edg-lrc: CLI + API) and the LFC
• Storage: GFAL C API over SRM (SRM API) and the Classic SE
• File I/O: GFAL C API over RFIO (rfio API) and DCAP (dcap API)
• Data transfer: (GFAL C API) over GridFTP (edg-gridftp, Globus API) and bbFTP

SRM storage management (figure)

Data management tools
• High-level tools: replica manager: lcg-* commands + lcg_* API
  § Provide (all) the functionality needed by the egee/LCG user
  § Combine file transfer and cataloging as an atomic transaction
  § Ensure consistent operations on catalogues and storage systems
  § Offer a high-level layer over technology-specific implementations
  § Based on the Grid File Access Library (GFAL) API
• Low-level tools: edg-gridftp tools (CLI)
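“Combine file transfer and cataloging as an atomic transaction” means that a copy which cannot be registered should not stay behind on the SE. The sketch below illustrates that pattern with a hypothetical backend of function pointers and a toy backend for trying it out; it is not lcg_utils code, and how lcg-cr actually recovers from a failed registration is not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical storage/catalog operations; each returns 0 on success. */
typedef struct {
    int (*copy_to_se)(void *ctx, const char *src, const char *surl);
    int (*delete_from_se)(void *ctx, const char *surl);
    int (*register_in_catalog)(void *ctx, const char *lfn, const char *surl);
    void *ctx;
} dm_backend;

/* Copy, then register; if registration fails, remove the fresh copy so no
 * unregistered replica is left on the SE.
 * Returns 0 on success, -1 if the copy failed, -2 if registration failed. */
int copy_and_register(const dm_backend *b, const char *src,
                      const char *surl, const char *lfn)
{
    if (b->copy_to_se(b->ctx, src, surl) != 0)
        return -1;
    if (b->register_in_catalog(b->ctx, lfn, surl) != 0) {
        b->delete_from_se(b->ctx, surl);   /* best-effort rollback */
        return -2;
    }
    return 0;
}

/* Toy backend: counts replicas on the SE; registration can be made to fail. */
typedef struct { int replicas; int fail_register; } toy_state;

static int toy_copy(void *ctx, const char *src, const char *surl)
{ (void) src; (void) surl; ((toy_state *) ctx)->replicas++; return 0; }

static int toy_delete(void *ctx, const char *surl)
{ (void) surl; ((toy_state *) ctx)->replicas--; return 0; }

static int toy_register(void *ctx, const char *lfn, const char *surl)
{ (void) lfn; (void) surl; return ((toy_state *) ctx)->fail_register ? -1 : 0; }
```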

DM CLIs & APIs: old EDG tools
• Old versions of the EDG CLIs and APIs are still available
• File & replica management
  § edg-rm
    • Implemented (mostly) in Java
• Catalog interaction (only for EDG catalogs)
  § edg-lrc
  § edg-rmc
  § Java and C++ APIs
• Use discouraged
  § Worse performance (slower)
  § New features added only to lcg_utils
  § Less general than GFAL and lcg_utils

lcg_utils commands
Replica management
lcg-cp    Copies a Grid file to a local destination
lcg-cr    Copies a file to an SE and registers the file in the catalog
lcg-del   Deletes one file
lcg-rep   Replicates a file between SEs and registers the replica
lcg-gt    Gets the TURL for a given SURL and transfer protocol
lcg-sd    Sets the file status to “Done” for a given SURL in an SRM request
File catalog interaction
lcg-aa    Adds an alias in the LFC for a given GUID
lcg-ra    Removes an alias in the LFC for a given GUID
lcg-rf    Registers in the LFC a file placed in an SE
lcg-uf    Unregisters in the LFC a file placed in an SE
lcg-la    Lists the aliases for a given SURL, GUID or LFN
lcg-lg    Gets the GUID for a given LFN or SURL
lcg-lr    Lists the replicas for a given GUID, SURL or LFN

Gathering information: lcg-infosites

    [scampana@grid019:~]$ lcg-infosites --vo gilda se
    *******************************
    These are the related data for gilda: (in terms of SE)
    *******************************
    Avail Space(Kb)   Used Space(Kb)   SEs
    ------------------------------------------------------
    1570665704        576686868        grid3.na.astro.it
    225661244         1906716          grid009.ct.infn.it
    523094840         457000           grid003.cecalc.ula.ve
    1570665704        576686868        testbed005.cnaf.infn.it
    15853516          1879992          gilda-se01.pd.infn.it

lcg_utils CLI: usage example
The session below uploads a local file from our UI (in Catania) to Naples (Italy), replicates it to Merida, checks that both replicas are there, then deletes all the replicas in the storage elements; afterwards the file is effectively gone.

    [scampana@grid019:~]$ ls -l important-file.txt
    -rw-r--r--  1 scampana users 19 Oct 31 17:09 important-file.txt
    [scampana@grid019:~]$ lcg-cr --vo gilda -d grid3.na.astro.it file://`pwd`/important-file.txt -l lfn:simone-important
    guid:08d02e56-bdf6-4833-a4da-e0247c188242
    [scampana@grid019:~]$ lcg-rep --vo gilda -d grid003.cecalc.ula.ve lfn:simone-important
    [scampana@grid019:~]$ lcg-lr --vo gilda lfn:simone-important
    sfn://grid003.cecalc.ula.ve/flatfiles/SE00/gilda/generated/2004-10-31/file39568d15-e873-4f17-9371-b8862ae77c36
    sfn://grid3.na.astro.it/flatfiles/SE00/gilda/generated/2004-10-31/file4c7c2ad6-4d93-4cd2-be24-bf4239f58208
    [scampana@grid019:~]$ lcg-del --vo gilda -a lfn:simone-important
    [scampana@grid019:~]$ lcg-lr --vo gilda lfn:simone-important
    lcg_lr: No such file or directory

IMPORTANT: the lcg_utils (both the CLI and the API described later) need to access the Information System (BDII). The name of the BDII host used by lcg_utils is specified in the environment variable LCG_GFAL_INFOSYS. REMEMBER THAT, ESPECIALLY WHEN PERFORMING DATA MANAGEMENT OPERATIONS FROM THE WN.

lcg_utils API
• lcg_utils API:
  § High-level data management C API
  § Same functionality as the lcg_util command-line tools
• Single shared library
  § liblcg_util.so
• Single header file
  § lcg_util.h (+ linking against libglobus_gass_copy_gcc32.so)

lcg_utils: replica management

    int lcg_cp (char *src_file, char *dest_file, char *vo, int nbstreams, char *conf_file, int insecure);
    int lcg_cr (char *src_file, char *dest_file, char *guid, char *lfn, char *vo, char *relative_path,
                int nbstreams, char *conf_file, int insecure, int verbose, char *actual_guid);
    int lcg_del (char *file, int aflag, char *se, char *vo, char *conf_file, int insecure, int verbose);
    int lcg_rep (char *src_file, char *dest_file, char *vo, char *relative_path, int nbstreams,
                 char *conf_file, int insecure, int verbose);
    int lcg_sd (char *surl, int regid, int fileid, char *token, int oflag);

lcg_utils: catalog interaction

    int lcg_aa (char *lfn, char *guid, char *vo, char *insecure, int verbose);
    int lcg_gt (char *surl, char *protocol, char **turl, int *regid, int *fileid, char **token);
    int lcg_la (char *file, char *vo, char *conf_file, int insecure, char ***lfns);
    int lcg_lg (char *lfn_or_surl, char *vo, char *conf_file, int insecure, char *guid);
    int lcg_lr (char *file, char *vo, char *conf_file, int insecure, char ***pfns);
    int lcg_ra (char *lfn, char *guid, char *vo, char *conf_file, int insecure);
    int lcg_rf (char *surl, char *guid, char *lfn, char *vo, char *conf_file, int insecure, int verbose, char *actual_guid);
    int lcg_uf (char *surl, char *guid, char *vo, char *conf_file, int insecure);

Available APIs: lcg_util C API used from C++ (copy and register, list replicas, delete replica)

#include <iostream>
#include <string>
#include <cstdio>

// lcg_util is a C library. Since we write C++ code here, we need to
// wrap its header in extern "C".
extern "C" {
#include <lcg_util.h>
}

using namespace std;

/**********************************************************************/
/* The following example code shows how you can use the lcg_util API */
/* for replica management. We expect you to modify parts of this code */
/* to make it work in your environment. This is indicated by ACTION,  */
/* i.e. your action is required.                                      */
/**********************************************************************/

int main()
{
    cout << "Data Management API Example" << endl;
    char vo[] = "cms";   // ACTION: fill in your correct VO here: gilda!
    cout << "--------------------------" << endl;

    // Copy and register: copy a local file to the Storage Element and
    // register it in the catalog.
    char localFile[] = "file:/tmp/test-file";   // ACTION: create a test file
    char destSE[] = "lxb0707.cern.ch";          // ACTION: fill in a specific SE
    char actualGuid[50] = "";   // receives the GUID assigned by lcg_cr
    int verbose = 2;            // we use verbosity level 2
    int nbstreams = 8;          // we use 8 parallel streams to transfer a file

    // Arguments: no user-supplied GUID, no LFN, no relative path,
    // no config file, secure mode (insecure = 0).
    int rc = lcg_cr(localFile, destSE, NULL, NULL, vo, NULL, nbstreams,
                    NULL, 0, verbose, actualGuid);
    if (rc != 0) {
        perror("Error in copyAndRegister");
        return -1;
    }
    cout << "We registered the file with GUID: " << actualGuid << endl;
    cout << "--------------------------" << endl;

    // List replicas: call lcg_lr and print the first returned URL.
    // actualGuid does not contain the prefix "guid:", so we add it here and
    // use the new GUID as a parameter to lcg_lr.
    string guid = string("guid:") + actualGuid;
    char **pfns = NULL;   // allocated by lcg_lr (see its man page for how to release it)
    rc = lcg_lr(const_cast<char *>(guid.c_str()), vo, NULL, 0, &pfns);
    if (rc != 0) {
        perror("Error in listReplicas");
        return -1;
    }
    if (pfns != NULL && pfns[0] != NULL)
        cout << "PFN = " << pfns[0] << endl;
    cout << "--------------------------" << endl;

    // Delete replica: aflag = 1 removes all replicas of the file.
    rc = lcg_del(const_cast<char *>(guid.c_str()), 1, destSE, vo, NULL, 0, verbose);
    if (rc != 0) {
        perror("Error in delete");
        return -1;
    }
    cout << "Delete OK" << endl;
    return 0;
}

Available APIs: the Makefile used

CC = g++
GLOBUS_FLAVOR = gcc32

.PHONY: all clean
all: data-management

# Object file first, then the libraries it needs (lcg_util relies on GFAL and Globus).
data-management: data-management.o
	$(CC) -o data-management data-management.o \
	    -L${LCG_LOCATION}/lib -llcg_util -lgfal \
	    -L${GLOBUS_LOCATION}/lib -lglobus_gass_copy_${GLOBUS_FLAVOR}

data-management.o: data-management.cpp
	$(CC) -I${LCG_LOCATION}/include -c data-management.cpp

clean:
	rm -f data-management data-management.o

Grid File Access Library
• GFAL is a library that provides access to Grid files
  § File I/O, catalog interaction, storage interaction
• Abstraction from specific implementations
• Transparent interaction with the information service, the file catalogs…
• Single shared library in threaded and unthreaded versions
  § libgfal.so, libgfal_pthr.so
• Single header file
  § gfal_api.h
(Figure: below the Data Management layer (replication, indexing, querying), GFAL covers Cataloging (EDG, LFC), Storage (SRM, Classic SE), File I/O (rfio, dcap) and Data transfer (gridftp, RDT).)

GFAL: catalog API

    int create_alias (const char *guid, const char *lfn, long size)
    int guid_exists (const char *guid)
    char *guidforpfn (const char *surl)
    char *guidfromlfn (const char *lfn)
    char **lfnsforguid (const char *guid)
    int register_alias (const char *guid, const char *lfn)
    int register_pfn (const char *guid, const char *surl)
    int setfilesize (const char *surl, long size)
    char *surlfromguid (const char *guid)
    char **surlsfromguid (const char *guid)
    int unregister_alias (const char *guid, const char *lfn)
    int unregister_pfn (const char *guid, const char *surl)

GFAL: storage API

    int deletesurl (const char *surl)
    int getfilemd (const char *surl, struct stat64 *statbuf)
    int set_xfer_done (const char *surl, int reqid, int fileid, char *token, int oflag)
    int set_xfer_running (const char *surl, int reqid, int fileid, char *token)
    char *turlfromsurl (const char *surl, char **protocols, int oflag, int *reqid, int *fileid, char **token)
    int srm_get (int nbfiles, char **surls, int nbprotocols, char **protocols, int *reqid, char **token,
                 struct srm_filestatus **filestatuses)
    int srm_getstatus (int nbfiles, char **surls, int reqid, char *token, struct srm_filestatus **filestatuses)
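turlfromsurl() and srm_get() take a list of protocols because the client and the SE must agree on one before a TURL can be built. The helper below sketches one plausible selection rule, the first protocol in the client's preference order that the SE also supports. pick_protocol is hypothetical and is not the SRM or GFAL negotiation code; it only reuses the NULL-terminated char ** style of the protocol lists.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Both lists are NULL-terminated arrays of protocol names.
 * Returns the chosen protocol, or NULL if there is none in common
 * (in which case no TURL could be produced). */
const char *pick_protocol(char **client_prefs, char **se_supported)
{
    for (size_t i = 0; client_prefs[i] != NULL; i++)
        for (size_t j = 0; se_supported[j] != NULL; j++)
            if (strcmp(client_prefs[i], se_supported[j]) == 0)
                return client_prefs[i];
    return NULL;
}
```

For example, a client preferring gsidcap, then rfio, then gsiftp, talking to an SE that offers only gsiftp and rfio, would get rfio.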

GFAL: file I/O API (I)

    int gfal_access (const char *path, int amode);
    int gfal_chmod (const char *path, mode_t mode);
    int gfal_close (int fd);
    int gfal_creat (const char *filename, mode_t mode);
    off_t gfal_lseek (int fd, off_t offset, int whence);
    int gfal_open (const char *filename, int flags, mode_t mode);
    ssize_t gfal_read (int fd, void *buf, size_t size);
    int gfal_rename (const char *old_name, const char *new_name);
    ssize_t gfal_setfilchg (int, const void *, size_t);
    int gfal_stat (const char *filename, struct stat *statbuf);
    int gfal_unlink (const char *filename);
    ssize_t gfal_write (int fd, const void *buf, size_t size);
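Since gfal_open, gfal_read and gfal_close have the same shape as their POSIX counterparts, code can be written once against a small table of function pointers. In the sketch below the table (file_io, a hypothetical name) is filled with the POSIX calls so it runs anywhere; the assumption is that, with GFAL installed, gfal_open, gfal_read and gfal_close could be plugged in instead, as their signatures above match the pointer types.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical I/O table matching the gfal_open/gfal_read/gfal_close shapes. */
typedef struct {
    int (*open_fn)(const char *, int, mode_t);
    ssize_t (*read_fn)(int, void *, size_t);
    int (*close_fn)(int);
} file_io;

/* Read the whole file in chunks; return its size in bytes, or -1 on error. */
long long count_bytes(const file_io *io, const char *path)
{
    char buf[4096];
    long long total = 0;
    int fd = io->open_fn(path, O_RDONLY, 0);
    if (fd < 0)
        return -1;
    for (;;) {
        ssize_t n = io->read_fn(fd, buf, sizeof buf);
        if (n < 0) {
            io->close_fn(fd);
            return -1;
        }
        if (n == 0)          /* end of file */
            break;
        total += n;
    }
    io->close_fn(fd);
    return total;
}

/* POSIX open() is variadic, so wrap it to match the table's type. */
static int posix_open(const char *path, int flags, mode_t mode)
{
    return open(path, flags, mode);
}

const file_io posix_io = { posix_open, read, close };
```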

GFAL protocol of file open (figure)

GFAL: file I/O API (II)

    int gfal_closedir (DIR *dirp);
    int gfal_mkdir (const char *dirname, mode_t mode);
    DIR *gfal_opendir (const char *dirname);
    struct dirent *gfal_readdir (DIR *dirp);
    int gfal_rmdir (const char *dirname);

Advanced utilities: edg-gridftp
Used for low-level management of files/directories in SEs:
edg-gridftp-exists TURL              Checks if a file/dir exists on an SE
edg-gridftp-ls TURL                  Lists a directory on an SE
globus-url-copy srcTURL dstTURL      Copies files between SEs
edg-gridftp-mkdir TURL               Creates a directory on an SE
edg-gridftp-rename srcTURL dstTURL   Renames a file on an SE
edg-gridftp-rm TURL                  Removes a file from an SE
edg-gridftp-rmdir TURL               Removes a directory on an SE

edg-gridftp example: create and delete a directory in a GILDA Storage Element (screenshot)

Other advanced CLI & API
• globus-url-copy srcTURL destTURL
  § low-level file transfer
• Interaction with RLS components
  § edg-lrc command (actions on the LRC)
  § edg-rmc command (actions on the RMC)
  § C++ and Java APIs for all catalog operations
    • http://edg-wp2.web.cern.ch/edg-wp2/replication/docu/r2.1/edg-lrc-devguide.pdf
    • http://edg-wp2.web.cern.ch/edg-wp2/replication/docu/r2.1/edg-rmc-devguide.pdf
• Using the low-level CLI and API is STRONGLY discouraged
  § Risk: losing consistency between SEs and catalogues
  § REMEMBER: a file is in the Grid only if it is BOTH:
    • stored in a Storage Element
    • registered in the file catalog
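The “BOTH stored AND registered” rule suggests a simple consistency check: compare what an SE holds with what the catalog says. The sketch below is hypothetical (not an EGEE tool) and uses its own labels: “dark” files are stored but not registered, so Grid users cannot find them; “lost” entries are registered but not stored, so the catalog points nowhere. It uses a plain O(n × m) scan, which is enough for an illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static int contains(char **list, size_t n, const char *s)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(list[i], s) == 0)
            return 1;
    return 0;
}

/* Count SURLs present on the SE but missing from the catalog (*dark) and
 * SURLs present in the catalog but missing from the SE (*lost). */
void check_consistency(char **stored, size_t n_stored,
                       char **registered, size_t n_registered,
                       size_t *dark, size_t *lost)
{
    *dark = 0;
    *lost = 0;
    for (size_t i = 0; i < n_stored; i++)
        if (!contains(registered, n_registered, stored[i]))
            (*dark)++;
    for (size_t i = 0; i < n_registered; i++)
        if (!contains(stored, n_stored, registered[i]))
            (*lost)++;
}
```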

Summary
• We described the egee/LCG Data Management middleware components and tools
• We described how to use the available CLIs
• Use-case scenarios of data movement on the Grid
• We presented the available APIs
• An example usage of the lcg_util library was shown
