SRB in CMS Use of SRB in CMS 9/9/2020 T. Wildish / Princeton 1
This talk…
• Why SRB in CMS?
• What is SRB?
• Current status
(material blatantly plagiarised from Ian Fisk and Simon Metson)
Why SRB?
• SRB chosen by the CMS Production team for the PCP
  – ‘Pre-Challenge Production’, producing data for DC04
• Why not grid tools?
  – Nothing available CMS-wide on the timescale we need it
    • Now until the end of this year
  – Support issues, stability
    • Need a stable and supported product for 6 months continuously
  – Simplicity. We don’t need a full-blown RLS for the PCP
    • Simple file-catalogue will suffice
    • File-movement is a one-time transfer, not replication on demand
  – PCP is not a grid exercise!
    • Must succeed or there is no DC.
• SRB is not a de-facto choice for the DC itself
  – Expect LCG to provide a solution
What is SRB?
• SRB is the ‘Storage Resource Broker’ from SDSC (http://www.sdsc.edu/DICE/SRB/index.html)
  – First released in 1996
• Claims to be…
  – A distributed file system
    • Hierarchical folders, file manipulation, ownership and access control
  – A data grid management system
    • Replication, containers, archiving, synchronisation, third-party transfers…
  – A digital library
    • Supports arbitrary & complex metadata associated with files
  – A semantic web
    • A what…?
How SRB Works
[Diagram. Labels: SRB Clients (C API, Unix command line, Web portal, Java portal); MCAT Database; three SRB Servers, attached to Storage Elements, Tape Silos and Grid FTP]
Database Interactions
• SRB access is centered around interactions with the MCAT database
  – Controls file locations, replicas, synchronization, and file access
• Lots of different kinds of file access
  – Files can be grouped together to form collections; collections can be replicated, synchronized and deleted like regular files
  – Database has been implemented in both ORACLE and PostGRES
  – Very good MetaData handling capabilities
    • Possible to select data files based on metadata queries
• Gives a truly global file catalog. Maintains consistency between replicas
• Also provides a single point of failure
  – When the database is down, it isn’t possible to access even local files through SRB
    • A version of the code with multiple synchronizing databases is expected soon; not sure if we will bother to deploy it
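
The catalogue behaviour described on this slide can be illustrated with a toy model. The sketch below is not SRB or MCAT code: the class `ToyCatalogue`, its methods, and the metadata key used in the example are all hypothetical. It only shows the idea of one central table that maps a logical file name to its replicas and metadata, and why every lookup fails when that table is unavailable.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Entry:
    replicas: List[str] = field(default_factory=list)   # physical locations
    metadata: Dict[str, str] = field(default_factory=dict)


class ToyCatalogue:
    """Hypothetical stand-in for a central catalogue; not the MCAT schema."""

    def __init__(self) -> None:
        self.entries: Dict[str, Entry] = {}
        self.online = True   # set to False to mimic a database outage

    def _check(self) -> None:
        if not self.online:
            raise RuntimeError("catalogue unavailable: no file can be located")

    def register(self, logical_name: str, location: str, **metadata: str) -> None:
        """Record one more replica (and optional metadata) for a logical file."""
        self._check()
        entry = self.entries.setdefault(logical_name, Entry())
        entry.replicas.append(location)
        entry.metadata.update(metadata)

    def locate(self, logical_name: str) -> List[str]:
        """Return all known replica locations of a logical file."""
        self._check()
        return list(self.entries[logical_name].replicas)

    def select(self, **wanted: str) -> List[str]:
        """Return logical names whose metadata match every requested key."""
        self._check()
        return sorted(
            name for name, entry in self.entries.items()
            if all(entry.metadata.get(k) == v for k, v in wanted.items())
        )


# Hypothetical names and sites, for illustration only.
catalogue = ToyCatalogue()
catalogue.register("pcp/file1", "siteA", kind="ntuple")
catalogue.register("pcp/file1", "siteB")
catalogue.register("pcp/file2", "siteA", kind="log")
print(catalogue.locate("pcp/file1"))
print(catalogue.select(kind="ntuple"))
```

Setting `catalogue.online = False` makes every call raise, including lookups of files that sit on local disk, which is the single-point-of-failure behaviour the slide describes.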
SRB Server Interactions
• SRB Server is the server software connected to storage
  – Storage can be a tape silo, a RAID array, a disk, or a grid-ftp interface to another storage device
• Files are generally written to an SRBVault
  – Access to files in the vault is controlled only through the SRB server and the MCAT database
  – This prevents files from being modified outside the control of SRB
• Also possible to register and control files in SRB that physically reside in the local file system
  – It is possible to skew files and replicas
• SRB Servers communicate with each other over parallel streams
  – TCP window size is settable, as in grid-ftp
SRB Client Interactions
• Files in SRB can be accessed in a variety of ways
• Files managed in SRB can be accessed by a C API compiled into an application
  – Files in the SRB Vault can be accessed through the server directly
• UNIX command line interface is similar to standard commands with an “S” in front
  – Sput, Sget, Sls -l, Srm -f, Sreplicate
• Web portal that allows files to be controlled, replicated, uploaded to SRB, etc.
• Java GUI that allows files to be pulled and pushed to and from SRB
• Windows client software
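
Because the command-line tools mirror their UNIX counterparts, they are easy to drive from a script. The following is a minimal sketch that only builds and prints command lines; it does not run them. It uses only command names and flags quoted in these slides (`Sput`, `Sget`, `Sls -l`, and the `-r` option). The argument order (source first, destination second), the use of `-r` with `Sput` specifically, and all paths are assumptions made for illustration.

```python
import shlex
from typing import List

# Only Scommand names and flags quoted in the slides are used.
# Argument order and the example paths below are assumptions.


def sput(local_path: str, collection: str, recursive: bool = False) -> List[str]:
    """Argument list for copying a local file (or a directory, with -r) into SRB."""
    return ["Sput"] + (["-r"] if recursive else []) + [local_path, collection]


def sget(srb_path: str, local_path: str) -> List[str]:
    """Argument list for copying a file out of SRB to local disk."""
    return ["Sget", srb_path, local_path]


def sls(collection: str) -> List[str]:
    """Argument list for a long listing of an SRB collection."""
    return ["Sls", "-l", collection]


def render(commands: List[List[str]]) -> List[str]:
    """Render argument lists as shell-quoted strings, for logging or a dry run."""
    return [" ".join(shlex.quote(arg) for arg in cmd) for cmd in commands]


# Hypothetical paths, for illustration only.
plan = [
    sput("run001", "/home/CMS/PCP/example", recursive=True),
    sls("/home/CMS/PCP/example"),
    sget("/home/CMS/PCP/example/run001/file.ntpl", "file.ntpl"),
]
for line in render(plan):
    print(line)
```

Keeping each command as an argument list means the same structure could be handed to `subprocess.run` on a machine with the SRB client installed, without any shell quoting concerns.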
Results of Transfer Tests
• Size and Performance
• How long does it take to register and replicate 500 GB between two widely separated locations?
  – Tested 200 GB, which took approximately 6 hours, almost completely network limited
• How long does it take to register and replicate 50k 10 kB files?
  – SRB has a bulk file registration mode which they have clocked at 400 files per second. I registered and replicated 1000 files in a few seconds
• What is the maximum sustainable transfer rate out of a single server, and what is the maximum rate at which a server can accept data from three servers?
  – About 80-90% of network speed for 5 streams x number of servers
• How many files can be registered in a day?
  – No inherent limits
• Does the file organization matter?
  – No, files can be registered and replicated with the -r option like a UNIX file system
• How many parallel streams can a server accept?
  – Unknown, very small load on CPU with 10 streams
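
The figures quoted above can be turned into rough rates with a little arithmetic. The sketch below assumes 1 GB = 10^9 bytes and that the measured rate would hold for a larger transfer; neither assumption comes from the slide, and the extrapolated numbers are estimates, not measurements.

```python
# Back-of-envelope figures derived from the numbers quoted on the slide.
# Assumptions (not from the slide): 1 GB = 10**9 bytes, and the measured
# rate stays constant for larger transfers.


def mean_rate_mb_per_s(gigabytes: float, hours: float) -> float:
    """Average rate in MB/s for a transfer of the given size and duration."""
    return gigabytes * 1000.0 / (hours * 3600.0)


def hours_for(gigabytes: float, rate_mb_per_s: float) -> float:
    """Hours needed to move `gigabytes` at a constant rate given in MB/s."""
    return gigabytes * 1000.0 / rate_mb_per_s / 3600.0


def seconds_to_register(n_files: int, files_per_second: float) -> float:
    """Seconds needed to register `n_files` at a constant registration rate."""
    return n_files / files_per_second


rate = mean_rate_mb_per_s(200, 6)   # measured test: 200 GB in about 6 hours
print(f"mean rate: {rate:.1f} MB/s ({rate * 8:.0f} Mbit/s)")
print(f"500 GB at that rate: {hours_for(500, rate):.0f} hours")
print(f"50k files at 400 files/s: {seconds_to_register(50_000, 400):.0f} s")
```

On these assumptions the test corresponds to roughly 9 MB/s, so 500 GB would need about 15 hours, and registering 50k files at the quoted bulk rate would take about two minutes.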
Functionality
• Can groups of files be given a container name and replicated together?
• Can replicas of single files or containers be synchronized if it is necessary to change the master?
• Can I declare replicas of a given file to be read-only? The master too?
• Can a local (non-managed) replica be created?
• Can the progress of transfers be monitored and completion time predicted?
  – Yes, for all the above
• Are wild cards and searching supported?
• Can files be queried by logical file names and selected for transfer?
  – In Metadata, yes to both the above
• What is the interface to a MSS?
  – HPSS interface already existed, Grid-FTP interface soon (already?), Castor & Enstore support added by M. Ernst, RAL tape-store interface exists too
• If I delete the master of a file from the replica catalogue, what happens to its replicas? (What happens to the tape copies?)
  – Replicas stay, but one can ask that replicas be deleted also
Usability
• How long does it take to establish a new remote site?
  – Less than a few hours
• Can the replication process be scripted?
  – From within a private network (batch farm)?
    • Yes, with the command line
  – What local software do I need to use it?
    • SRB Client Installation (RPM exists)
• How good are the error messages?
  – Reasonable, there is an Serror function that decodes some cryptic numerical errors
• How intuitive is the interface?
  – Unix interface, Java and Web portal very intuitive
  – We actually use the command-line exclusively, scripting the process entirely
• Can I cancel a transfer or set of transfers, either active or scheduled?
  – Yes
Outlook
• A few issues to investigate concerning how far a single database server scales; the system is also seen as somewhat monolithic
  – Distributed MCat available soon (already?)
  – ‘Single-point-of-failure’ can also be a single point of strength
    • RAL T1 hosting our MCat for this year, lots of expertise etc.
    • Committed manpower & hardware for this role
• SRB has a good core of developers, good support, a reasonable user base and it works as advertised.
  – The development team have been helpful and supportive
File organisation
• Simple file organisation:
  – /home/CMS/PCP/$RC/$AssignmentID/data
  – /home/CMS/PCP/$RC/$AssignmentID/logs
• ‘$RC’ helps reduce human errors
• ‘$AssignmentID’ not dataset-name because all access will be via scripts anyway.
  – Simple tools written already, very easy to script
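
Since all access goes through scripts, the layout above reduces to a small path-building helper. This is a minimal sketch, not one of the production tools mentioned on the slide: the function name `pcp_collection` and the example values are hypothetical, and only the `/home/CMS/PCP/$RC/$AssignmentID/{data,logs}` pattern comes from the slide.

```python
import posixpath

PCP_ROOT = "/home/CMS/PCP"   # collection root shown on the slide


def pcp_collection(rc: str, assignment_id: str, kind: str) -> str:
    """Build the collection path for one assignment; kind is 'data' or 'logs'."""
    if kind not in ("data", "logs"):
        raise ValueError(f"kind must be 'data' or 'logs', got {kind!r}")
    for part in (rc, assignment_id):
        # Reject empty or nested components so a typo cannot escape the layout.
        if not part or "/" in part:
            raise ValueError(f"invalid path component: {part!r}")
    return posixpath.join(PCP_ROOT, rc, assignment_id, kind)


# Hypothetical values, for illustration only.
print(pcp_collection("SiteX", "1234", "data"))
print(pcp_collection("SiteX", "1234", "logs"))
```

Validating the components in one place is one way to get the "reduce human errors" benefit the slide attributes to the fixed `$RC` level of the hierarchy.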
Current status
• ~24 servers so far, ~14 sites
  – CERN, France, Germany, Italy, Pakistan, Russia, Spain, UK, USA
• ~25,000 files registered so far
  – Mostly small CMKIN ntuples
• No performance figures to show
  – Not yet transferring bulk data, only small transfers to date
• More info:
  – Site-status link
    • http://www.cern.ch/bristol-escience/srb_sites.html
  – Installation-guide link
    • http://project-bristol-cms-grid.web.cern.ch/project-bristol-cms-grid/srb.html
Summary
• SRB installed CMS-wide for the PCP
  – Easy to install and configure
  – Easy to interface to MSS systems
  – Easy to script and use in our production tools
• Support is satisfactory
  – Dedicated MCat support from RAL
  – SRB expertise in CMS from Simon Metson, Ian Fisk, and Michael Ernst
  – SRB team responsive and helpful
• We believe SRB satisfies our data-management needs for the PCP