Data Management
The European DataGrid Project Team
http://www.eu-datagrid.org

Overview
• Data Management Issues
• Main Components
  - EDG Replica Catalog
  - EDG Replica Manager
  - GDMP
EDG DataManagement Tutorial

Data Management Issues
[Slides 3 and 4: figures only; their content is not preserved in this transcript]

Data Management Tools
• Tools for
  - Locating data
  - Copying data
  - Managing and replicating data
  - Metadata management
• On the EDG Testbed you have
  - EDG Replica Catalog
  - globus-url-copy (GridFTP)
  - EDG Replica Manager
  - Grid Data Mirroring Package (GDMP)
  - Spitfire

EDG Replica Catalog
• Based upon the Globus LDAP Replica Catalog
• Stores LFN/PFN mappings and additional information (e.g. file size):
  - Physical File Name (PFN): host + full path and file name
  - Logical File Name (LFN): logical name that may be resolved to PFNs
  - LFN : PFN = 1 : n
• Only files on storage elements (SEs) may be registered
• Each VO has a specific storage dir on an SE
• Example PFN: lxshare0222.cern.ch/flatfiles/SE1/iteam/file1.dat
  - host: lxshare0222.cern.ch
  - storage dir: /flatfiles/SE1/iteam
• The LFN must be the full path of the file starting from the storage dir
  - LFN of the above PFN: file1.dat
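
The 1 : n mapping and the rule for deriving an LFN from a PFN can be illustrated with a small in-memory model. This is only a sketch: the class, the function names and the second host (se2.example.org) are hypothetical and are not part of the EDG Replica Catalog API.

```python
from collections import defaultdict


def pfn_to_lfn(pfn: str, host: str, storage_dir: str) -> str:
    """Derive the LFN as the path of the file relative to the storage dir."""
    prefix = host + storage_dir.rstrip("/") + "/"
    if not pfn.startswith(prefix):
        raise ValueError(f"{pfn} is not under {prefix}")
    return pfn[len(prefix):]


class ReplicaCatalog:
    """Toy in-memory catalog: one LFN maps to any number of PFNs (1 : n)."""

    def __init__(self) -> None:
        self._replicas = defaultdict(set)

    def add_physical_file_name(self, lfn: str, pfn: str) -> None:
        self._replicas[lfn].add(pfn)

    def get_physical_file_names(self, lfn: str) -> list:
        return sorted(self._replicas.get(lfn, ()))


catalog = ReplicaCatalog()
pfn = "lxshare0222.cern.ch/flatfiles/SE1/iteam/file1.dat"
lfn = pfn_to_lfn(pfn, host="lxshare0222.cern.ch", storage_dir="/flatfiles/SE1/iteam")
catalog.add_physical_file_name(lfn, pfn)
# A second replica of the same logical file on a hypothetical second SE.
catalog.add_physical_file_name(lfn, "se2.example.org/flatfiles/SE2/iteam/file1.dat")
print(lfn, catalog.get_physical_file_names(lfn))
```

Resolving the LFN returns every registered replica, which is what lets a client choose among copies of the same logical file.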

EDG Replica Catalog
• API and command line tools
  - addLogicalFileName
  - getLogicalFileName
  - deleteLogicalFileName
  - getPhysicalFileName
  - addPhysicalFileName
  - deletePhysicalFileName
  - addLogicalFileAttribute
  - getLogicalFileAttribute
  - deleteLogicalFileAttribute
http://cmsdoc.cern.ch/cms/grid/userguide/gdmp-3-0/node85.html

globus-url-copy
• Low level tool for secure copying
    globus-url-copy <protocol>://<source file> <protocol>://<destination file>
• Main protocols:
  - gsiftp: for secure transfer, only available on SE and CE
  - file: for accessing files stored on the local file system, e.g. on UI, WN
• Example:
    globus-url-copy file://`pwd`/file1.dat gsiftp://lxshare0222.cern.ch/flatfiles/SE1/EDGTutorial/file1.dat
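
When the copy is scripted rather than typed, the two URLs can be assembled programmatically. The sketch below only builds the argument list in the form shown on the slide (source URL, then destination URL) and does not run the tool; the helper names are hypothetical, a POSIX file system is assumed, and no command-line options beyond the two URLs are used.

```python
import os
import shlex


def file_url(local_path: str) -> str:
    """file:// URL for a path on the local file system (e.g. on a UI or WN)."""
    return "file://" + os.path.abspath(local_path)


def gsiftp_url(host: str, remote_path: str) -> str:
    """gsiftp:// URL for a file on an SE or CE."""
    return "gsiftp://" + host + "/" + remote_path.lstrip("/")


def copy_command(source_url: str, destination_url: str) -> list:
    """Argument list in the slide's form: globus-url-copy <source> <destination>."""
    return ["globus-url-copy", source_url, destination_url]


cmd = copy_command(
    file_url("file1.dat"),
    gsiftp_url("lxshare0222.cern.ch", "/flatfiles/SE1/EDGTutorial/file1.dat"),
)
# Print the command; subprocess.run(cmd, check=True) would execute it
# on a machine where globus-url-copy and a valid proxy are available.
print(shlex.join(cmd))
```

Using os.path.abspath plays the role of the `pwd` substitution in the shell example.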

The EDG Replica Manager
• Extends the Globus replica manager
• Client-side tool only
• Allows replication (copy) and registering of files in the Replica Catalog (RC)
• Keeps the RC consistent with stored data

The Replica Manager APIs
• (un)registerEntry(LogicalFileName lfn, FileName source)
  - Replica Catalogue operations only, no file transfer
• copyFile(FileName source, FileName destination, String protocol)
  - allows for third-party transfer
  - transfer between:
    * two StorageElements, or
    * ComputingElement and Storage Element
  - Space management policies under development
• all tools support parallel streams for file transfers

The Replica Manager APIs
• copyAndRegisterFile(LogicalFileName lfn, FileName source, FileName destination, String protocol)
  - third-party transfer, but files can only be registered in the Replica Catalogue if the destination PFN contains a valid SE (i.e. one that is registered in the RC)
• replicateFile(LogicalFileName lfn, FileName source, FileName destination, String protocol)
• deleteFile(LogicalFileName lfn, FileName source)
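
The difference between a catalogue-only operation and a combined copy-and-register can be sketched with a toy model. Everything here is hypothetical (class, method names, example hosts) and is not the EDG Replica Manager API; in particular, checking the destination SE before copying is an assumption made for the sketch, chosen so that a rejected destination leaves no stray copy behind.

```python
class ToyReplicaManager:
    """Hypothetical model of the copy-then-register rule; not the EDG API."""

    def __init__(self, registered_ses):
        self.registered_ses = set(registered_ses)  # SE hosts known to the RC
        self.catalog = {}                          # lfn -> set of PFNs
        self.storage = set()                       # PFNs that "exist"

    def copy_file(self, source, destination):
        # Stand-in for the real transfer; it only records the new copy.
        if source not in self.storage:
            raise FileNotFoundError(source)
        self.storage.add(destination)

    def register_entry(self, lfn, pfn):
        # Catalogue operation only, no file transfer.
        host = pfn.split("/", 1)[0]
        if host not in self.registered_ses:
            raise ValueError(f"{host} is not an SE registered in the RC")
        self.catalog.setdefault(lfn, set()).add(pfn)

    def copy_and_register_file(self, lfn, source, destination):
        # Assumption: validate first, so a bad destination is never copied to.
        host = destination.split("/", 1)[0]
        if host not in self.registered_ses:
            raise ValueError(f"{host} is not an SE registered in the RC")
        self.copy_file(source, destination)
        self.register_entry(lfn, destination)


rm = ToyReplicaManager(registered_ses={"se1.example.org"})
rm.storage.add("ce1.example.org/tmp/file1.dat")
rm.copy_and_register_file(
    "file1.dat",
    "ce1.example.org/tmp/file1.dat",
    "se1.example.org/flatfiles/iteam/file1.dat",
)
print(rm.catalog)
```

A plain copy_file call to a non-SE host would still succeed in this model; only registration requires a known SE, which mirrors the restriction stated on the slide.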

Grid Data Mirroring Package (GDMP)
• based on CMS requirements for replicating Objectivity files for High Level Trigger studies
• production prototype project for evaluating Grid technologies (especially Globus)
• experience will directly be used in DataGrid
  - input also for PPDG and GriPhyN
http://cern.ch/GDMP

Overview of Components
[Diagram: Globus Replica Catalogue, GDMP client, Site 1, Site 2, Site 3]

Subscription Model
• All the sites that subscribe to a particular site get notified whenever there is an update in its catalog.
[Diagram: Site 1, Site 2, Site 3; a subscribe arrow and a subscriber list]
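
The subscription model is essentially a publish/subscribe (observer) arrangement. A minimal sketch follows, with a hypothetical Site class and in-process method calls standing in for the real notification between GDMP servers.

```python
class Site:
    """Toy site: keeps a subscriber list and notifies it on catalog updates."""

    def __init__(self, name):
        self.name = name
        self.subscribers = []     # sites that subscribed to this site
        self.catalog = []         # files this site has announced
        self.notifications = []   # (producer name, file name) pairs received

    def subscribe_to(self, producer):
        # The producer records the subscriber, not the other way round.
        if self not in producer.subscribers:
            producer.subscribers.append(self)

    def update_catalog(self, file_name):
        self.catalog.append(file_name)
        for subscriber in self.subscribers:
            subscriber.notifications.append((self.name, file_name))


site1, site2, site3 = Site("site1"), Site("site2"), Site("site3")
site2.subscribe_to(site1)
site3.subscribe_to(site1)
site1.update_catalog("/data/files/file1")
print(site2.notifications, site3.notifications)
```

Only information about the update travels to the subscribers here; moving the file itself is a separate step.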

Export / Import Catalogue
• Export Catalog
  - information about the new files produced is published
• Import Catalog
  - information about the files which have been published by other sites but not yet transferred locally
  - As soon as the file is transferred locally, it is removed from the import catalogue.
• Possible to pull the information about new files into your import catalogue.
[Diagram: Site 1, Site 2 and Site 3 with an export catalog and an import catalog; steps: 1) register, publish new files; 1) get info about new files; 2) transfer files; 3) delete files]
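
The bookkeeping behind the two catalogues can be sketched as follows. The class and method names are hypothetical, the transfer is simulated by copying a name between sets, and emptying the export catalogue after publishing is an assumption of this sketch rather than documented GDMP behaviour.

```python
class GdmpSite:
    """Toy model of the export/import catalogue bookkeeping (not GDMP itself)."""

    def __init__(self, name):
        self.name = name
        self.local_files = set()
        self.export_catalog = []   # new local files not yet published
        self.import_catalog = []   # (source site, file): published, not yet local
        self.subscribers = []

    def register(self, file_name):
        self.local_files.add(file_name)
        self.export_catalog.append(file_name)

    def publish(self):
        # Push information only; no data is transferred here.
        for site in self.subscribers:
            for file_name in self.export_catalog:
                site.import_catalog.append((self, file_name))
        self.export_catalog.clear()  # assumption: published entries are dropped

    def replicate_get(self):
        # Fetch each listed file, removing its entry from the import catalogue.
        while self.import_catalog:
            source, file_name = self.import_catalog.pop(0)
            if file_name in source.local_files:
                self.local_files.add(file_name)


site1, site2 = GdmpSite("site1"), GdmpSite("site2")
site1.subscribers.append(site2)
site1.register("file1")
site1.publish()
pending = list(site2.import_catalog)   # snapshot before the transfer
site2.replicate_get()
print(len(pending), sorted(site2.local_files), site2.import_catalog)
```

After replicate_get the import catalogue is empty again, matching the rule that an entry is removed as soon as the file is available locally.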

Usage
• gdmp_ping
  - Ping a GDMP server and get its status
• gdmp_host_subscribe
  - first thing to be done by a site
• gdmp_register_local_file
  - Registers a file in the local file catalogue but NOT in the Replica Catalogue (RC)
• gdmp_publish_catalogue
  - send information of newly created files to subscribed hosts (no real data transfer); updates RC
• gdmp_replicate_get, gdmp_replicate_put
  - get/put all the files from the import catalogue; updates RC
• gdmp_remove_local_file
  - Delete a local file and update RC
• gdmp_get_catalogue
  - Get remote catalogue contents; for error recovery

Using GDMP
• Register all files in a directory at site 1
    gdmp_register_local_file -d /data/files
• Data produced at site 1 to be replicated to other sites
[Diagram: Site 1 holding /data/files/file1, /data/files/file2, …, together with Site 2, Site 3, Site 4 and Site 5]

Using GDMP 2
• Start with subscription
    gdmp_host_subscribe -r <HOST> -p <PORT>
[Diagram: Sites 1 to 5; two gdmp_host_subscribe arrows and a subscriber list]

Using GDMP 3
• Publish new files; can combine with filtering
    gdmp_publish_catalogue (might use filter option)
[Diagram: Sites 1 to 5; export catalog, subscriber list and import catalogs; gdmp_publish_catalogue arrows]

Using GDMP 4
• Poll for change in catalog (pull model); can combine with filtering; also used for error recovery.
    gdmp_get_catalogue –host <HOST>
[Diagram: Sites 1 to 5; export catalog, subscriber list and import catalogs; gdmp_get_catalogue arrow]

Using GDMP 5
• Transfer files; can use the progress meter
  - gdmp_replicate_get
  - get_progress_meter produces a progress.log
  - replica.log has all files already transferred
[Diagram: Sites 1 to 5; export catalog, subscriber list and import catalogs; gdmp_replicate_get arrows]

GDMP vs. EDG Replica Manager
• GDMP
  - Replicates sets of files
  - Replication between SEs
  - Mass storage interface
  - File size as logical attribute
  - Subscription model
  - Event notification
  - CRC file size check
  - Support for Objectivity
• Replica Manager
  - Replicates single files
  - Replication between SEs, CEs to SE

File Management Summary
• Replica Manager: ‘atomic’ replication operation, single client interface, orchestrator
• Replica Catalog: map logical to site files
• Replica Selection: get ‘best’ file
• Pre-/Post-processing: prepare files for transfer, validate files after transfer
• Replication Automation: data source subscription
• Load balancing: replicate based on usage
• Metadata: LFN metadata, transaction information, access patterns
[Diagram: file transfer between Storage Element A at Site A (File A, File X, File B, File Y) and Storage Element B at Site B (File A, File C, File B, File D)]
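
The post-processing step, validating files after transfer, could be as simple as comparing a checksum and the file size recorded at the source with those of the local copy. The sketch below uses CRC-32 from Python's zlib module; the function names and the temporary test files are hypothetical, and it is not claimed that the EDG tools compute their check this way.

```python
import os
import tempfile
import zlib


def crc32_and_size(path, chunk_size=1 << 20):
    """Return (CRC-32, size in bytes) of a file, reading it in chunks."""
    crc, size = 0, 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            crc = zlib.crc32(chunk, crc)
            size += len(chunk)
    return crc & 0xFFFFFFFF, size


def validate_transfer(expected, local_path):
    """True if the local copy matches the (crc, size) recorded at the source."""
    return crc32_and_size(local_path) == expected


with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "source.dat")
    replica = os.path.join(tmp, "replica.dat")
    for path in (source, replica):
        with open(path, "wb") as handle:
            handle.write(b"event data" * 1000)
    expected = crc32_and_size(source)
    ok = validate_transfer(expected, replica)
    with open(replica, "ab") as handle:
        handle.write(b"x")  # simulate a corrupted copy
    corrupted_ok = validate_transfer(expected, replica)

print(ok, corrupted_ok)
```

Checking the size as well as the CRC catches truncated or padded copies cheaply, before the checksum even needs to be trusted.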