T2 FR-cloud report, Frédéric Derue, LPNHE Paris

T2 FR-cloud report
Frédéric Derue, LPNHE Paris
Calcul ATLAS France (CAF) meeting, CC-IN2P3 Lyon, 17 June 2019

CPU by processing cloud [link]
All T2s (chart values): now 28.2%, 20.3%, 16.2%, 15.3%, 5.4%, 3.9%, 3.8%, 3.3%, 2.3%, 1.0%, 0.4%; previously 26.0%, 22.5%, 15.2%, 5.4%, 4.8%, 4.0%, 2.9%, 2.2%, 1.1%, 0.4%.

CPU by activity [link]
All T2s (chart values): now 23.8%, 17.5%, 16.9%, 14.3%, 13.6%, 9.6%, 3.0%, 1.2%; previously 21.9%, 23.8%, 11.8%, 16.1%, 5.9%, 12.4%, 4.1%, 3.8%.
FR cloud (chart values): now 36.3%, 14.2%, 13.3%, 12.2%, 11.2%, 7.9%, 2.4%, 2.3%; previously 32.8%, 4.7%, 18.1%, 8.1%, 9.2%, 4.7%, 4.3%.
Comments: more Event Gen and MC Sim Fast; more MC Sim Fast than in the previous period; less Full Sim than in «All T2s».

CPU by number of cores [link]
All T2s: now 60.8%, 32.1%; previously 64.0%, 29.7%.
FR cloud: now 73.7%, 23.4%; previously 74.7%, 22.9%.
Usage of CPU cores in this period is similar to the previous one.
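The statement that core usage is similar between the two periods can be quantified as percentage-point differences. A minimal sketch using the values quoted on this slide; since the chart's category labels are not reproduced here, the slices are only identified by their position in each list, and the names `pp_changes` and `max_abs_change` are illustrative.

```python
# Shares (%) quoted on the "CPU by number of cores" slide.
# Slices are identified by position only: the chart's category
# labels are not reproduced in this report.
SHARES = {
    "All T2s": {"now": [60.8, 32.1], "previously": [64.0, 29.7]},
    "FR cloud": {"now": [73.7, 23.4], "previously": [74.7, 22.9]},
}


def pp_changes(now, previously):
    """Percentage-point change of each slice, rounded to 0.1 pp."""
    return [round(n - p, 1) for n, p in zip(now, previously)]


def max_abs_change(shares):
    """Largest absolute percentage-point change over all panels and slices."""
    return max(
        abs(delta)
        for panel in shares.values()
        for delta in pp_changes(panel["now"], panel["previously"])
    )


if __name__ == "__main__":
    for name, panel in SHARES.items():
        print(name, pp_changes(panel["now"], panel["previously"]))
    print("largest shift:", max_abs_change(SHARES), "pp")
```

With these numbers the largest shift is 3.2 percentage points (All T2s), and at most 1.0 point for the FR cloud.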

CPU in FR cloud [link]
By country: now 74.0%, 16.0%, 7.5%, 2.5%; previously 58.1%, 4.4%, 7.0%, 30.6%.
In French sites: now 25.5%, 18.6%, 14.8%, 13.2%, 12.4%, 8.5%, 7.1%; previously 15.5%, 18.0%, 14.0%, 17.3%, 13.3%, 12.0%, 9.9%.
In Romanian sites: now 45.9%, 43.3%, 7.7%, 3.2%; previously 44.1%, 48.6%, 4.5%, 2.8%.

Storage in FR cloud
Plots: DATA+SCRATCHDISK [link]; LOCALGROUPDISK [link]

Site Availability (SAM) tests [link]

ATLAS Site Availability and Performance (ASAP) [link]

Transfer matrix efficiency [link]
Plot: source vs destination.

CentOS 7 migration / Singularity
● CentOS 7 migration
→ deadline from ATLAS: June 1st for all sites
→ migrated since the last CAF: all Romanian sites (×4), LPNHE, LPSC, LPC
● Singularity
→ see the last discussion at LCG-FR-Tech on 17th May [indico]
→ the deadline is different from the one for the CentOS 7 migration
→ most sites have issues
→ can be tested with simple jobs, e.g.
    prun --containerImage docker://alpine --exec "echo 'Hello World!'" --outDS user.derue.test.$(date +%Y%m%d%H%M%S) --noBuild --site ANALY_IN2P3-CC_CL7
⇒ but look at stderr.txt
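The Singularity test command gives each submission a unique output dataset name built from the current time. As a sketch, assuming a working panda-client setup with a valid grid proxy, the same invocation can be assembled in Python as an argument list, which avoids nesting shell quotes around `echo 'Hello World!'`. The helper `prun_container_test` is hypothetical; the prun options are exactly those used on the slide.

```python
import shlex
from datetime import datetime


def prun_container_test(site, user="derue", now=None):
    """Build the argument list for a minimal container test job with prun.

    Mirrors the slide's command: run `echo 'Hello World!'` inside the
    docker://alpine image at the given PanDA queue, with an output dataset
    name timestamped like `date +%Y%m%d%H%M%S`.
    """
    now = now or datetime.now()
    out_ds = f"user.{user}.test.{now:%Y%m%d%H%M%S}"
    return [
        "prun",
        "--containerImage", "docker://alpine",
        "--exec", "echo 'Hello World!'",
        "--outDS", out_ds,
        "--noBuild",
        "--site", site,
    ]


if __name__ == "__main__":
    cmd = prun_container_test("ANALY_IN2P3-CC_CL7")
    # Print a shell-quoted command line; pass `cmd` to subprocess.run() to submit.
    print(shlex.join(cmd))
```

Keeping the command as a list means each argument reaches prun unchanged, whatever quotes it contains.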

IPv6 deployment
● IPv6 deployment [wiki]
→ deadline for T2s was 31/12/2018
→ FR, Beijing, Tokyo and RO-02 sites are done; RO-07 since 23rd May
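A first check that a site service is dual-stack is whether its host name resolves to IPv6 addresses. A minimal sketch using only the Python standard library; the function name is illustrative, and a non-empty result only shows the name resolves to IPv6 (e.g. via AAAA records), not that the service actually answers over IPv6.

```python
import socket


def ipv6_addresses(host, port=443):
    """Return the sorted IPv6 addresses that `host` resolves to (empty if none).

    getaddrinfo is restricted to AF_INET6, so only IPv6 results are returned;
    reachability of the service over IPv6 is not tested.
    """
    try:
        infos = socket.getaddrinfo(host, port, socket.AF_INET6, socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); for IPv6 the
    # sockaddr is (address, port, flowinfo, scope_id).
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # "localhost" is a placeholder: use a site's SE or CE host name instead.
    for host in ["localhost"]:
        addrs = ipv6_addresses(host)
        print(host, "IPv6:", ", ".join(addrs) if addrs else "none")
```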

DOME migration
The new core of DPM, the Disk Operations Management Engine (DOME), enables Storage Resource Reporting (SRR) publishing, both for storage description and for accounting information, and runs much more smoothly for HTTP, xrootd and GridFTP.
Status of sites can be checked on [this wiki]; see also the [minutes] of the last LCG-FR-Tech.
France:
● 2 pilot sites (v1.12, mid-April) → LAL, IRFU → meeting of a dedicated task force
● to be done:
→ LPC (ongoing, done for AUVERGRID)
→ CPPM (done, but with the 1st DOME version of 2017)
→ LAPP, LPNHE, LPSC
China:
● to be done: → Beijing, Hong Kong
Japan:
● done: → Tokyo
Romania:
● to be done: → RO-02, RO-07
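SRR publishes the storage accounting as a JSON document, which makes per-share usage easy to read programmatically. A minimal sketch, assuming the WLCG SRR layout of a top-level `storageservice` object whose `storageshares` entries carry `name`, `totalsize` and `usedsize` in bytes; these field names should be checked against the SRR specification, and the example document is made up.

```python
import json


def share_usage(srr_text):
    """Summarise the storage shares of an SRR JSON document.

    Assumes the WLCG SRR layout: "storageservice" -> "storageshares", each
    share having "name", "totalsize" and "usedsize" in bytes.
    Returns {share name: (used TB, total TB, used %)}.
    """
    service = json.loads(srr_text)["storageservice"]
    summary = {}
    for share in service.get("storageshares", []):
        total = share.get("totalsize", 0)
        used = share.get("usedsize", 0)
        pct = 100.0 * used / total if total else 0.0
        summary[share["name"]] = (used / 1e12, total / 1e12, round(pct, 1))
    return summary


if __name__ == "__main__":
    # Made-up example document, for illustration only.
    example = json.dumps({
        "storageservice": {
            "name": "example-dpm",
            "storageshares": [
                {"name": "ATLASDATADISK", "totalsize": 10**15,
                 "usedsize": 85 * 10**13, "vos": ["atlas"]},
            ],
        }
    })
    print(share_usage(example))
```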

DPM
See the [minutes] of the last LCG-FR-Tech.
ATLAS has reported slow file deletions on DPM sites using HTTP/WebDAV; ATLAS is the only VO using this protocol for deletion. Both DPM legacy and DOME sites are affected.
• Edith (CPPM) and Jean-Claude (LPC) restart HTTP to solve this problem.
• Guillaume (LAL) does not have such issues.
• Victor (LPNHE) had identified such a problem: the CRL renewal was not done correctly and was blocking connections. He set up a cron job which restarts HTTP once a week, and sent an email to the DPM developers.
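To see whether a given endpoint shows the slow HTTP/WebDAV deletions, one can time a single DELETE by hand. A minimal sketch with the Python standard library, assuming a valid grid proxy file (certificate and key in one PEM file) and the CA directory `/etc/grid-security/certificates`; the function names are hypothetical, and this is not the client ATLAS uses for its deletions.

```python
import ssl
import time
import urllib.request


def build_delete_request(url):
    """Build an HTTP DELETE request for a file on a WebDAV endpoint."""
    return urllib.request.Request(url, method="DELETE")


def timed_delete(url, proxy_path, ca_dir="/etc/grid-security/certificates", timeout=60):
    """Delete `url` using an X.509 proxy; return (HTTP status, seconds taken).

    `proxy_path` is a PEM file holding both the proxy certificate and key.
    urlopen raises urllib.error.HTTPError for 4xx/5xx answers.
    """
    ctx = ssl.create_default_context(capath=ca_dir)
    ctx.load_cert_chain(certfile=proxy_path)
    start = time.monotonic()
    with urllib.request.urlopen(build_delete_request(url), context=ctx, timeout=timeout) as resp:
        status = resp.status
    return status, time.monotonic() - start
```

For example, `timed_delete("https://<dpm-host>/dpm/<domain>/home/atlas/<file>", "/tmp/x509up_u<uid>")`, where the placeholders are to be filled with a real test file and proxy location.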

Site issues and GGUS tickets
● List of [tickets NGI_France] since the beginning of April
● 37 tickets opened or closed for NGI_FRANCE (not counting CC), mostly for transfer and deletion errors
→ CPPM (5): deletion/transfer (4), jobs failing (1) [CE decommissioned]
→ GRIF: unknown site (1), lost heartbeat (1); IRFU (2): deletion (2); LAL (8): transfer/deletion (8); LPNHE (1): jobs failing (1) [downtime]
→ LAPP (6): deletion error (4), squid (1), jobs in queue exceed the queue limit (1)
→ LPC (5): deletion error (5)
→ LPSC (6): deletion/transfer (3), jobs/batch problem (2), jobs in queue exceed the queue limit (1)

Site issues and GGUS tickets
● List of tickets since the beginning of April: [tickets NGI_RO], [ticket NGI_CHINA], [tickets Tokyo]
● 16 tickets for NGI_RO
→ RO-02 (2): degraded (1), CentOS 7 upgrade (1); RO-02 has been off for several weeks: dedicated [ticket], removal of the SE from Rucio [ADCINFR-122], discussed in the ADC weekly of 13 Feb [link]
→ RO-07 (12): cvmfs not found (1), deletion/transfers (7), xrootd (1), jobs problems (3)
→ RO-16 (2): squid down (2)
● 6 tickets for NGI_CHINA
→ BEIJING (3): deletion errors (2), no pilot (1)
→ HK (2): deletion error / Unspecified GridManager error
→ USTC-T3 (1): disk quota in USTC-T3_LOCALGROUPDISK
● 1 ticket for Tokyo: failing jobs / PanDA queue no longer in test
● There were also many discussions by email rather than through tickets.