SC3 experiences
Ron Trompert, SARA

SC3 Infrastructure
Starting point: DMF-based HSM
- DMF has no SRM implementation
- DMF does not support functionality promised by the SRM standard, like file pinning

SC3 Infrastructure
- dCache provides an SRM interface
- dCache provides flexibility with respect to HSM backends, in case we need to switch to another HSM setup for some reason

SC3 Infrastructure: throughput phase

SC3 Throughput phase
- Disk to disk: 100-110 MB/s
  - Problems with stability of the nodes: solved by limiting the number of I/O movers
- Disk to tape: 50 MB/s
  - Not enough bandwidth, SAN not dedicated

SC3 Infrastructure: service phase

SC3 service phase statistics
Percentage of computational resources used (October-December)

          LHCb   ATLAS
SARA        28       0
NIKHEF      21      39

SC3 service phase statistics

             LHCb   ATLAS
GBs in       7638     881
GBs out         5       0
GB stored    3334     900

SC3 service phase statistics
Setting up the infrastructure took longer than we had hoped, so unfortunately we missed ALICE.

Sizes and number of files transferred to the SRM SE

                                           LHCb    ATLAS
Average file size                        188 MB   211 MB
# inbound transfers                       41508     4277
# inbound transfers, file size < 100 MB    5013     3526
# inbound transfers, file size < 1 MB      4922     3261
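
As a rough cross-check, the average file sizes can be reproduced from the "GBs in" figures and the inbound transfer counts. This is only a sketch: it assumes 1 GB = 1024 MB and that every inbound transfer corresponds to exactly one file counted in "GBs in", neither of which is stated on the slides.

```python
# Figures taken from the service phase statistics slides.
GBS_IN = {"LHCb": 7638, "ATLAS": 881}
INBOUND_TRANSFERS = {"LHCb": 41508, "ATLAS": 4277}

# Assumption: binary units (1 GB = 1024 MB); the slides do not say.
MB_PER_GB = 1024


def average_file_size_mb(gbs_in, transfers, mb_per_gb=MB_PER_GB):
    """Return the mean size in MB of one inbound transfer."""
    return gbs_in * mb_per_gb / transfers


if __name__ == "__main__":
    for vo in GBS_IN:
        avg = average_file_size_mb(GBS_IN[vo], INBOUND_TRANSFERS[vo])
        print(f"{vo}: {avg:.0f} MB")
```

Under these assumptions the result rounds to 188 MB for LHCb and 211 MB for ATLAS, matching the reported averages.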

SC3 service phase observations
- Networking problems
  - Hardware problems
  - The 10 GE link to CERN was dedicated but the 10G switch was not
  - Switching back and forth between dedicated 10 GE and Geant
  - Routing problems
- Considerably less data stored for ATLAS than expected. In the plans on the Wiki: 20 TB

SC3 service phase observations
- Communication problems
  - Network changes not reported
    - We were not informed of changes in subnets
  - Problems are not always reported
    - Failed transfers are not always reported
    - Network outage CERN-SARA between Xmas and New Year, nobody informed us
  - Monitoring: experiment monitoring websites are listed in the Wiki, but we also found other monitoring website URLs in emails
  - Not clear what the experiments' exact plans are
    - When there are no transfers and no problems are reported, it is not clear whether there is something wrong or things are going just as planned

SC3 service phase observations
- Failed transfers caused by attempts to overwrite files
  - Not allowed by PNFS
  - At dCache sites running a gridftp door on their SRM node, files can be removed immediately using edg-gridftp-rm or glite-gridftp-rm
  - At dCache sites that don’t run a gridftp door on the SRM node, an advisory delete can be done, but then files are not immediately deleted
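
The "remove first, then retransfer" workaround could be scripted roughly as below. This is a hedged sketch, not a tested recipe: it assumes the removal tools take a single gsiftp:// URL as their argument (check the tool's own help before relying on this), and the host name and file path in the usage note are hypothetical placeholders.

```python
import subprocess

# Tool names as given on the slide.
ALLOWED_TOOLS = ("edg-gridftp-rm", "glite-gridftp-rm")


def gridftp_rm_command(host, path, tool="edg-gridftp-rm"):
    """Build the removal command for one file behind a gridftp door.

    Assumption: the tool accepts one gsiftp:// URL as its only argument.
    """
    if tool not in ALLOWED_TOOLS:
        raise ValueError(f"unknown tool: {tool}")
    if not path.startswith("/"):
        raise ValueError("path must be absolute")
    return [tool, f"gsiftp://{host}{path}"]


def remove_before_overwrite(host, path, tool="edg-gridftp-rm", dry_run=True):
    """Remove an existing file so a retransfer does not try to overwrite it.

    With dry_run=True (the default) the command is only returned, not run.
    """
    cmd = gridftp_rm_command(host, path, tool)
    if not dry_run:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd
```

For example, `remove_before_overwrite("se.example.org", "/pnfs/grid.sara.nl/data/lhcb/example.dat")` returns the command list without executing anything; both the host and the file name here are made up for illustration.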

SC3 service phase observations
- dCache security: (gsi)dcap
  - Using dccp, anyone can get anything in /pnfs/grid.sara.nl/data/<vo>
  - Unix permissions on directories are not honoured
    - Files in a directory with -rwxr-x--- are world readable
  - File permissions are honoured, but when data is copied into /pnfs it gets -rw-r--r--
  - Using gsidcap you are authenticated, but the behaviour above stays the same
  - Write permissions are OK
  - Maybe this is OK for HEP VOs, but for some VOs this is too liberal
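
For comparison, the sketch below shows what ordinary POSIX semantics would give for the two permission patterns mentioned above. It only inspects mode bits with Python's standard `stat` module and does not talk to dCache; the helper name `posix_world_readable` is made up for this illustration.

```python
import stat

# Mode patterns from the slide: a directory with rwxr-x--- and a file
# that ends up as rw-r--r-- after being copied into /pnfs.
DIR_MODE = stat.S_IFDIR | 0o750
FILE_MODE = stat.S_IFREG | 0o644


def posix_world_readable(dir_mode, file_mode):
    """Return True if 'others' could read the file under POSIX rules.

    Others need search (x) permission on the directory and read (r)
    permission on the file itself.
    """
    can_traverse = bool(dir_mode & stat.S_IXOTH)
    can_read = bool(file_mode & stat.S_IROTH)
    return can_traverse and can_read


if __name__ == "__main__":
    print(stat.filemode(DIR_MODE))   # drwxr-x---
    print(stat.filemode(FILE_MODE))  # -rw-r--r--
    print(posix_world_readable(DIR_MODE, FILE_MODE))  # False
```

On a regular Unix filesystem the directory mode would block access for others even though the file itself is world readable. The behaviour reported on the slide is that the directory check is effectively skipped, so the file mode alone decides.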

SC3 service phase observations
- Oracle database
  - Every now and then it just hangs and needs to be restarted
  - Backups didn’t work but FTS and LFC did

SC3 service phase observations
- A user wanted to run a job using ROOT I/O, which is rfio/dcap based
- Rfio/dcap are unauthenticated protocols to access data
- Rfio comes automatically when installing a classic SE with yaim
- We don’t really like it, but what do the other T1s think about this?

SC4 Outlook
Current plans (being updated)
- Set up T2 tests
- Separate T1 tape storage from general storage
- Replace old SE by SRM SE
- Set up DB node for FTS/LFC