Cross site support

Cross site support ?


Why cross site support 10/7/2020 Talk Title


SFT failures per site from 01/Jan to 30/June
• Data from the CIC portal: https://egee.in2p3.fr/CIC/index.php?id=cic&subid=cic_roc_metrics&search=1

SFT failures
• sft-rgma: checks that the rgma client code works
• sft-caver: checks the CA version
• sft-job: checks that the job went through with a done state and that the job output can be retrieved
• sft-lcgrm: the well-known replication test

Per site view as a function of months

IC-LESC

RHUL

UCL-HEP

UCL-CENTRAL

QMUL

Conclusion
• Most of the failures are replication failures and job failures.
  – Job failures
    • What causes them (BDII/MDS broken, disk space)?
    • How do you solve them?
  – Replication failures
    • What causes them?
    • How do you solve them?
  – My experience is that we frequently
    • Look in the log files
    • Clean directories that are full
    • Restart the services
  – The key here is to understand what is happening and report it to the developers. That’s a long-term target.
  – In the short term we can help each other
  – Give your input
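The "clean directories that are full" step can be caught earlier with a small disk-usage check. The following is only a minimal sketch and not part of the original talk: the function names `usage_fraction` and `nearly_full` and the 90% threshold are assumptions chosen for illustration.

```python
import shutil


def usage_fraction(path):
    """Return the used fraction (0.0 to 1.0) of the filesystem holding path."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some pseudo filesystems report a total size of zero.
        return 0.0
    return usage.used / usage.total


def nearly_full(paths, threshold=0.9):
    """Return the paths whose filesystem usage is at or above threshold.

    The default of 0.9 is an arbitrary illustrative value, not a site policy.
    """
    return [p for p in paths if usage_fraction(p) >= threshold]
```

Calling `nearly_full(["/var/log", "/tmp"])` on a node would list the directories worth looking at first; the paths here are examples only, and each must exist on the machine being checked.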

Proposal: give ltwosgm sudo privileges to allow starting/stopping services
• Give the ltwosgm account sudo privileges to start/stop daemons and look in the /var/log dir
• Example below

    User_Alias GRIDADMIN = ltwosgm
    Cmnd_Alias GENERALCMD = /usr/bin/tail
    Cmnd_Alias CECMD = /etc/init.d/bdii, /etc/init.d/globus-gatekeeper, /etc/init.d/globus-mds
    Cmnd_Alias MONCMD = /etc/init.d/globus-mds, /bin/ps
    Host_Alias CE = gw39.hep.ph.ic.ac.uk
    Host_Alias MON = gw35.hep.ph.ic.ac.uk
    GRIDADMIN CE = NOPASSWD: GENERALCMD, CECMD
    GRIDADMIN MON = NOPASSWD: GENERALCMD, MONCMD
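To make the effect of the rules above concrete, here is a deliberately simplified model of them: which command paths the ltwosgm account could run through sudo on each host. It is a sketch for discussion, not a reimplementation of sudo; real sudoers matching (arguments, wildcards, defaults) is richer. The names `ALLOWED` and `is_allowed` are hypothetical, and the host names are written without the stray spaces of the extracted slide text.

```python
# Command aliases copied from the sudoers example.
GENERALCMD = {"/usr/bin/tail"}
CECMD = {
    "/etc/init.d/bdii",
    "/etc/init.d/globus-gatekeeper",
    "/etc/init.d/globus-mds",
}
MONCMD = {"/etc/init.d/globus-mds", "/bin/ps"}

# Host aliases mapped to the command sets granted on them.
ALLOWED = {
    "gw39.hep.ph.ic.ac.uk": GENERALCMD | CECMD,   # Host_Alias CE
    "gw35.hep.ph.ic.ac.uk": GENERALCMD | MONCMD,  # Host_Alias MON
}


def is_allowed(host, command_line):
    """Return True if the command path (first word) is granted on host.

    Simplification: arguments are ignored, which mirrors the fact that
    the example rules list command paths without argument restrictions.
    """
    words = command_line.split()
    if not words:
        return False
    return words[0] in ALLOWED.get(host, set())
```

Under this model, restarting the BDII is possible on the CE but not on the MON box, and nothing is granted on hosts that are not listed.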

Avoiding clashes
• When a site admin is available, he or she marks it on a web page
• If the local admin is not available, the next person on the available list can take action
• That person takes the token; the others see that the token is taken, hence we avoid clashes
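The token idea above can be sketched in a few lines. This is an in-memory illustration under stated assumptions, not the proposed web page: the class `SiteToken`, its method names, and the ordering rule "local admin first, then the rest of the list" are all hypothetical, and a real shared page would also need to handle two people clicking at the same moment.

```python
class SiteToken:
    """One token per site; only the holder acts on that site."""

    def __init__(self, site, responders):
        self.site = site
        self.responders = list(responders)  # ordered, local admin first
        self.available = set()              # people currently marked available
        self.holder = None                  # who holds the token, if anyone

    def set_available(self, person, is_available):
        """Record whether person has marked themselves available."""
        if is_available:
            self.available.add(person)
        else:
            self.available.discard(person)

    def next_responder(self):
        """Return the first available person in list order, or None."""
        for person in self.responders:
            if person in self.available:
                return person
        return None

    def take(self, person):
        """Hand the token to person if it is free and it is their turn."""
        if self.holder is not None:
            return False  # token already taken: everyone else stays away
        if person != self.next_responder():
            return False  # someone earlier on the list is available
        self.holder = person
        return True

    def release(self):
        """Give the token back once the intervention is finished."""
        self.holder = None
```

A failed `take` is the signal to leave the site alone, which is exactly the clash the token is meant to prevent.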

What are the site policies ?
• UCL-HEP – ?
• UCL-CENTRAL – ?
• IC-HEP – ?
• IC-LESC – ?
• QMUL – ?
• BRUNEL (Answer from Duncan) – Paul and I discussed the issue of external access to our site and agreed that it was not something we were very keen on. We are happy for you to have access but not sure about others, we feel that the more people who have access the higher the possibility of mistakes being made.
• RHUL