Part Four: The LSC DataGrid

  • Slides: 24

Part Four: LSC DataGrid
• A: Data Replication
• B: What is the LSC DataGrid?
• C: The LSCDataFind tool

A: Data Replication

General Principle
Not all pipes are created equal. Neither are all storage locations.

Data Requirements
• Catalog 10^8 files and their locations
• What files are where (possibly at more than one place)
• Across multiple sites within a Grid
• No single point of failure
• No central catalog/server

Data Replication Services: Concepts
• Abstract the logical file name (LFN) from the physical file name (PFN)
• Maintain a local replica catalog (LRC) mapping LFNs to PFNs, only for local files
• Maintain a replica location index (RLI) mapping LFNs to other sites’ LRCs, for files that aren’t local

Replica Location Service

Site A: rls://serverA:39281
  LRC (file1, file2)
    file1 → gsiftp://serverA/file1
    file2 → gsiftp://serverA/file2
  RLI
    file3 → rls://serverB/file3
    file4 → rls://serverB/file4

Site B: rls://serverB:39281
  LRC (file3, file4)
    file3 → gsiftp://serverB/file3
    file4 → gsiftp://serverB/file4
  RLI
    file1 → rls://serverA/file1
    file2 → rls://serverA/file2
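The two-site diagram can be written down as plain data. The following is a minimal sketch, not the Globus RLS API: the class name `RlsServer` and its field names are hypothetical, chosen only to mirror the LRC and RLI boxes on the slide. It assumes Python 3.9 or later.

```python
from dataclasses import dataclass, field


@dataclass
class RlsServer:
    """One site's RLS server (hypothetical model, not the Globus API)."""
    url: str
    # LRC: LFN -> PFNs for files stored at this site
    lrc: dict[str, list[str]] = field(default_factory=dict)
    # RLI: LFN -> pointers to remote RLS servers that know the file
    rli: dict[str, list[str]] = field(default_factory=dict)


site_a = RlsServer(
    url="rls://serverA:39281",
    lrc={
        "file1": ["gsiftp://serverA/file1"],
        "file2": ["gsiftp://serverA/file2"],
    },
    rli={
        "file3": ["rls://serverB/file3"],
        "file4": ["rls://serverB/file4"],
    },
)

site_b = RlsServer(
    url="rls://serverB:39281",
    lrc={
        "file3": ["gsiftp://serverB/file3"],
        "file4": ["gsiftp://serverB/file4"],
    },
    rli={
        "file1": ["rls://serverA/file1"],
        "file2": ["rls://serverA/file2"],
    },
)
```

Note the symmetry: every LFN in one site's RLI is a key in the other site's LRC, so no single server holds the whole catalog.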

RLS: Replica Location Service
• Globus RLS
• Each RLS server usually runs two catalogs:
  • LRC: Local Replica Catalog
    • Catalog of the files you have (LFNs) and their mappings to URL(s) or PFNs
  • RLI: Replica Location Index
    • Catalog of the files (LFNs) that other LRCs in your data grid know about

A Site’s LRC
• Each site has an LRC with mappings of LFNs to PFNs
  • usually contains the “local” mappings
  • where files are located at the site
• Example: UWM might have this mapping in its LRC:
  H-R-792845521-16.gwf → gsiftp://dataserver.phys.uwm.edu/LIGO/H-R-792845521-16.gwf

LRCs Inform Each Other
The LRC at each site tells remote RLIs which LFNs it has mappings for.
• Example: UWM tells Caltech it has a mapping for H-R-792845521-16.gwf
• So the Caltech RLI has the mapping H-R-792845521-16.gwf → LRC at Milwaukee
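The update step can be sketched in a few lines. This is an illustration under stated assumptions, not how Globus RLS implements its updates: the function name `publish` and the hostname `rls://uwm.example:39281` are hypothetical, and the catalogs are modeled as plain dictionaries.

```python
def publish(lrc, site_url, remote_rlis):
    """Tell each remote RLI which LFNs this site's LRC has mappings for.

    Only the LFNs are sent, not the PFNs: a remote RLI learns where to
    ask, not where the file physically lives.
    """
    for rli in remote_rlis:
        for lfn in lrc:
            rli.setdefault(lfn, set()).add(site_url)


# UWM's LRC holds the local mapping from the example above.
uwm_lrc = {
    "H-R-792845521-16.gwf": [
        "gsiftp://dataserver.phys.uwm.edu/LIGO/H-R-792845521-16.gwf"
    ],
}

# Caltech's RLI starts empty and is filled in by UWM's update.
caltech_rli = {}
publish(uwm_lrc, "rls://uwm.example:39281", [caltech_rli])
```

After the call, `caltech_rli` maps the LFN to the UWM server, which is the "LRC at Milwaukee" pointer described above.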

How it Works (Under the Hood)
Ask your local LRC: “Do you know about file X?”
• If yes, you can ask your local LRC for the corresponding URL (PFN).
• If no:
  • Ask your local RLI: “Who do I ask about X?”
  • It will answer, “The RLS server at Site Y.”
  • Ask the LRC at Site Y, “Do you know about file X?”
  • It will return the PFN.
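The lookup steps above can be sketched as a small function. This is a toy model, not the Globus RLS client API: the `sites` dictionary, its keys, and the function name `find_pfns` are hypothetical, and the RLI here stores site names rather than `rls://` URLs to keep the example short.

```python
# Hypothetical two-site grid: each site has an LRC (LFN -> PFNs)
# and an RLI (LFN -> names of sites whose LRC knows the file).
sites = {
    "A": {
        "lrc": {"file1": ["gsiftp://serverA/file1"]},
        "rli": {"file3": ["B"]},
    },
    "B": {
        "lrc": {"file3": ["gsiftp://serverB/file3"]},
        "rli": {"file1": ["A"]},
    },
}


def find_pfns(lfn, local_site, sites):
    """Resolve an LFN to PFNs: local LRC first, then RLI, then remote LRC."""
    local = sites[local_site]

    # Step 1: ask the local LRC.
    if lfn in local["lrc"]:
        return list(local["lrc"][lfn])

    # Step 2: ask the local RLI which sites to ask.
    pfns = []
    for remote_site in local["rli"].get(lfn, []):
        # Step 3: ask the LRC at each site the RLI pointed to.
        pfns.extend(sites[remote_site]["lrc"].get(lfn, []))
    return pfns
```

For example, `find_pfns("file3", "A", sites)` misses in site A's LRC, follows the RLI to site B, and returns site B's PFN. An LFN that no catalog knows about yields an empty list.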

SRB: Storage Resource Broker
• http://www.sdsc.edu/srb/
• Distributed data management solution
• Supports management, collaborative (and controlled) sharing, publication, and preservation of distributed data collections
• Provides a rich set of APIs available to higher-level applications
• Provides a management layer on top of a wide variety of storage systems

SRB
• SRB can be thought of as a:
  • Distributed file system
  • Data grid management system
  • Digital library system
  • Semantic Web

SRB as Data Grid Management
• Transparent replication
• Archiving, caching, synchs, and backups
• Heterogeneous storage
• Container and aggregated data movement
• Bulk data ingestion
• Third-party copy & move

LDR: Lightweight Data Replicator
• http://www.lsc-group.phys.uwm.edu/LDR
• Replicates datasets within a data grid
• High-speed data transfers with Globus GridFTP
• Globus RLS catalogs stored in a MySQL backend
• Metadata stored in a MySQL backend
• Uses GSI for security

LDR
• Collections of files to be replicated are defined by the LDR administrator as a SQL query
• Priority queue for scheduling replication
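To make "a collection is a SQL query" concrete, here is a minimal sketch. It is not LDR's code or schema: the `metadata` table, its columns, and the sample rows are hypothetical, and an in-memory SQLite database stands in for the MySQL backend so the example runs on its own.

```python
import heapq
import sqlite3

# Hypothetical metadata table; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metadata ("
    "lfn TEXT PRIMARY KEY, observatory TEXT, frame_type TEXT, gps_start INTEGER)"
)
conn.executemany(
    "INSERT INTO metadata VALUES (?, ?, ?, ?)",
    [
        ("H-R-792845521-16.gwf", "H", "R", 792845521),
        ("H-R-792845537-16.gwf", "H", "R", 792845537),
        ("L-R-792845521-16.gwf", "L", "R", 792845521),
    ],
)

# Each collection is a (priority, SQL query) pair written by the
# administrator. A lower number means "replicate sooner".
collections = [
    (2, "SELECT lfn FROM metadata WHERE observatory = 'L'"),
    (1, "SELECT lfn FROM metadata WHERE observatory = 'H' AND frame_type = 'R'"),
]

# Expand every collection into files and schedule them by priority.
queue = []
for priority, query in collections:
    for (lfn,) in conn.execute(query):
        heapq.heappush(queue, (priority, lfn))

# Pop in priority order; ties are broken by LFN.
transfer_order = [heapq.heappop(queue)[1] for _ in range(len(queue))]
```

Defining collections as queries means new files that match are picked up the next time the query runs, with no list of filenames to maintain by hand.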

B: What is the LSC DataGrid?

What is the LSC DataGrid?
• A collection of LSC computational and storage resources…
• … linked through Grid middleware…
• … into a uniform LSC data analysis environment.

LSC DataGrid Sites
• Tier 1: Caltech
• Tier 2: UWM and PSU
• Tier 3: UT-Brownsville and Salish Kootenai College (SKC)
• Linux clusters at GEO sites Birmingham, Cardiff and the Albert Einstein Institute (AEI)
• LDAS instances at Caltech, MIT, PSU, and UWM

Monitoring the LSC DataGrid
http://watchtower.phys.uwm.edu/ganglia-webfrontend/

Lab 4: LSCDataFind

Lab 4: LSCDataFind
• In this lab, you’ll:
  • Verify your DataFind configuration
  • Find observatories
  • Find data types
  • Find actual data (wow!)
  • Refine a search
  • Retrieve data you’ve found

Credits
• NSF disclaimer
• Portions of this presentation were adapted from the following sources:
  • GriPhyN Grid Summer Workshop
  • NEESgrid Sysadmin Workshop