Learning from 10M files test
Simone Campana
- Slides: 14
10M files test
• Numbers
  – 10,000 datasets to be distributed over 10 days
    • Placed roughly 1000 at each T1
    • 100 files per dataset, a few MB per file
  – 100,000 subscriptions in total
    • Each dataset must go from its host T1 to all other T1s (including CERN)
  – 10M files transferred in total over 10 days
• Metric for success:
  – Every site should import 1M files in 10 days
    • 100,000 files per day
  – After 10 days, every site should hold a complete copy of 95% of the datasets
    • 9500 datasets
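The figures above can be cross-checked with a few lines of arithmetic. This is a minimal sketch, assuming 10 hosting T1s (implied by "roughly 1000 at each T1") and 10 destinations per dataset (the 9 other T1s plus CERN); the variable names are illustrative only.

```python
# Back-of-the-envelope check of the test parameters quoted on the slide.
# Assumptions (inferred, not stated explicitly): 10 hosting T1s, and each
# dataset is replicated to the 9 other T1s plus CERN, i.e. 10 destinations.

DATASETS = 10_000
FILES_PER_DATASET = 100
HOSTING_T1S = 10      # assumed: about 1000 datasets hosted at each T1
DESTINATIONS = 10     # assumed: 9 other T1s + CERN
DAYS = 10

datasets_per_t1 = DATASETS // HOSTING_T1S
subscriptions = DATASETS * DESTINATIONS
files_total = subscriptions * FILES_PER_DATASET

# A T1 imports every dataset except the ones it hosts; CERN imports all of them.
files_into_t1 = (DATASETS - datasets_per_t1) * FILES_PER_DATASET
files_into_cern = DATASETS * FILES_PER_DATASET

# Success metric: a complete copy of 95% of the datasets at every site.
complete_target = DATASETS * 95 // 100

print(f"subscriptions:        {subscriptions:,}")
print(f"files transferred:    {files_total:,}")
print(f"files/day into a T1:  {files_into_t1 // DAYS:,}")
print(f"files/day into CERN:  {files_into_cern // DAYS:,}")
print(f"datasets for success: {complete_target:,}")
```

Under these assumptions a T1 imports about 900,000 files rather than exactly 1M, which is consistent with the rounded per-site target on the slide.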
What do we want to test
• During data taking, we had to stop exports for 3 days and enter crisis mode when we switched to containers
  – Containers mean more datasets and therefore more subscriptions
  – Is the problem still there (we have a new DDM release)?
  – Are there still some bottlenecks?
• How many files can a T1 collect in a day?
Running the Exercise
• 2000 datasets subscribed every morning for the first 5 days
• Transfers will be displayed in the “activity” dashboard
  – http://dashb-atlas-datasoup.cern.ch/dashboard/request.py/site
  – Click on the “FT” activity
• Dataset completion progress monitoring:
  – http://atladcops.cern.ch:8000/mappings/10Mtest.Status.log
• T1-T1 weekly Functional Tests are stopped
  – T0-T1 and T1-T2 continue
[Figure: Number of Files per Day, by destination site, days 1 to 7; y-axis 0 to 2,000,000. Sites: BNL-OSG2_DATADISK, CERN-PROD_DATADISK, FZK-LCG2_DATADISK, IN2P3-CC_DATADISK, INFN-T1_DATADISK, NDGF-T1_DATADISK, PIC_DATADISK, RAL-LCG2_DATADISK, SARA-MATRIX_DATADISK, TAIWAN-LCG2_DATADISK, TRIUMF-LCG2_DATADISK]
Site Issues
• FZK experienced SRM problems for the first four days
  – Possibly due to SRM overload from srm-ls
  – The same happened to LYON during the New Year break and to PIC yesterday
  – Comments?
• Lyon also showed a 15% transfer inefficiency due to SRM timeouts for a couple of days
• The FZK-NDGF transfers are still very problematic
  – Almost 100% failures FZK->NDGF, approx. 70% failures NDGF->FZK
  – News?
DDM Site Services
• The “slow fetcher” problem (observed during data taking) is definitely no longer there
• We observed a degradation of the system after day 2
  – Investigated by Miguel. Further optimizations in MySQL indexing and in table cleaning. Problem solved after day 3.
  – One extra optimization in the database maintenance cycle will be included
• We are really at the level of fine tuning …
Dashboard
• One problem on Saturday
  – Table full. The “soup” dashboard is not yet running in the production Oracle rack (still in test mode)
  – Historical info has been deleted to make more space
• At this rate the dashboard is getting too many callbacks
  – It can hardly cope; non-critical callbacks have been dropped to recover Saturday’s backlog.
  – We will revisit the callback strategy; Ricardo already has some ideas
    • Drop the dataset content callback (which files belong to a given dataset)
LFC registration
• LFC registration from Site Services
  – 1 thread per LFC
  – 4 calls to LFC for 1 file registration
  – No bulk method
• LFC registration time dominated by RTT
  – 2 Hz for BNL, 1 Hz for TRIUMF, 0.5 Hz for ASGC
  – 160K files/day for BNL, 80K for TRIUMF and 40K for ASGC
• Need bulk methods with “one shot registration”
  – Need to understand the timescale. Change at Client and Server side
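The relation between the quoted registration rates and the daily ceilings can be sketched as follows. This is an illustrative model only: it assumes the 4 LFC calls per file run sequentially on the single thread and that each call costs one fixed latency. The function names and the bulk-call model are hypothetical, not part of any LFC client API.

```python
SECONDS_PER_DAY = 24 * 60 * 60
CALLS_PER_FILE = 4  # from the slide: 4 LFC calls per file registration

# Single-thread registration rates quoted on the slide (files per second).
observed_rate_hz = {"BNL": 2.0, "TRIUMF": 1.0, "ASGC": 0.5}


def files_per_day(rate_hz):
    """Upper bound on registrations per day at a sustained rate."""
    return rate_hz * SECONDS_PER_DAY


def implied_call_latency(rate_hz, calls_per_file=CALLS_PER_FILE):
    """Per-call latency in seconds, assuming sequential calls on one thread."""
    return 1.0 / (rate_hz * calls_per_file)


def bulk_rate_hz(call_latency_s, files_per_call):
    """Hypothetical 'one shot' bulk registration: one call covers many files.

    Ignores server-side processing time and payload-size effects.
    """
    return files_per_call / call_latency_s


for site, rate in observed_rate_hz.items():
    latency = implied_call_latency(rate)
    print(f"{site}: {files_per_day(rate):,.0f} files/day ceiling, "
          f"about {latency * 1000:.0f} ms per LFC call")
```

The computed ceilings (172,800, 86,400 and 43,200 files/day) sit slightly above the rounded 160K, 80K and 40K on the slide. All of them are at or below the 100,000 files/day import target except BNL, which is why a bulk method that amortizes the round trip over many files matters.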
File Transfer Performance: number of files per day (from DDM)
[Figure: Number of Files per Day, by destination site, days 0 to 8; y-axis 0 to 350,000. Sites: BNL-OSG2_DATADISK, CERN-PROD_DATADISK, FZK-LCG2_DATADISK, IN2P3-CC_DATADISK, INFN-T1_DATADISK, NDGF-T1_DATADISK, PIC_DATADISK, RAL-LCG2_DATADISK, SARA-MATRIX_DATADISK, TAIWAN-LCG2_DATADISK, TRIUMF-LCG2_DATADISK]
Average time per successful transfer (in seconds), from FTS logs
Dest. ----Source CERN BNL 33 FZK 60 60 IN2P3 20 33 63 INFN 17 26 55 74 NDGF 21 28 76 78 24 PIC 29 30 60 81 28 RAL 39 46 81 97 41 50 SARA 19 26 51 76 24 32 TAIWAN 62 51 104 109 56 TRIUMF 75 69 94 100 69 BNL FZK IN2P3 INFN 63 40 63 NDGF PIC RAL SARA TAIWAN TRIUMF 37 45 56 35 74 111 64 58 74 45 119 158 29 39 47 25 100 139 29 43 21 77 116 26 41 75 122 24 90 111 39 91 132 45 20 89 116 59 72 57 66 85 68 45 151 109
[Figure: Average time per successful transfer (in seconds), from FTS logs]
Dataset Completion Rate
[Figure: destination vs. source matrix over CERN, BNL, FZK, IN2P3, INFN, NDGF, PIC, RAL, SARA, TAIWAN, TRIUMF; each cell falls in one of three classes: > 95%, 50%-95%, < 50%]
Next Steps
• Need to test subscriptions from tape
  – Functionality now in DDM
  – Low rate
• Need to test the new T0 export workflow
  – Open datasets
• Need to test GridFTP server performance (throughput)
  – Requested by several sites
• Need to repeat the 10M files test after current limitations have been overcome
• Possibly merge tests?