ProtoDUNE SP Data Challenge 1.5
Steven Timm, DUNE S+C, 12 Jan 2018
New data management people
• Welcome Jeremy Hewes (U. of Cincinnati)
  - Working on inserting metadata into files from the DAQ
• Welcome Robert Illingworth (Fermilab)
  - Working on coordination with CERN FTS3 and other things
Data Challenge 1.5 Intro
• Show that the data movement apparatus for SP can work at full rate (2.5 GB/s) for one working day
  - 2.5 GB/s = 20 Gbit/s, the maximum rate of the bonded network link from EHN1 to CERN building 513
• Monitor networks and systems at CERN and Fermilab while it is ongoing
• Make at least some transferred detsim data available for keep-up processing at FNAL
• Scheduled: Jan 19 (with preliminary testing on Jan 18)
Ideal setup:
• FTS-Light at EHN1 running on np04-srv-003
• Transfer file(s) from the data buffer machine np04-srv-001 to EOS via third-party xrdcp (sketch below)
  - These will not be files from the actual detector/DAQ at this point
  - DAQ work is not using this machine regularly as yet (to my knowledge)
  - Other DAQ work should not be impacted
• FTS running in the CERN OpenStack cloud, with 3 destinations:
  - EOS (via third-party xrdcp)
  - Castor (via third-party xrdcp)
  - FNAL scratch dCache (via globus-url-copy)
• Net effect: 2 writes into EOS and 3 reads for every file we transfer
• Keep 25 8-GB files in flight at any given time
• If that works, then try to go twice as fast
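For reference, a minimal sketch of the two transfer mechanisms named above. The endpoints, paths, and file name (the EOS/Castor/dCache URLs, test_8gb.root) are illustrative assumptions, not the actual configuration:

    #!/bin/bash
    # Hypothetical per-file transfer commands; all endpoints/paths are placeholders.
    SRC=root://np04-srv-001.cern.ch//data/buffer/test_8gb.root

    # Third-party copy (TPC): the data flows directly source -> destination,
    # not through the machine issuing the command.
    xrdcp --tpc only "$SRC" root://eospublic.cern.ch//eos/experiment/neutplatform/test_8gb.root
    xrdcp --tpc only "$SRC" root://castorpublic.cern.ch//castor/cern.ch/dune/test_8gb.root

    # GridFTP copy of the EOS replica to FNAL scratch dCache, with parallel streams.
    globus-url-copy -p 8 \
        gsiftp://eospublicftp.cern.ch/eos/experiment/neutplatform/test_8gb.root \
        gsiftp://fndca1.fnal.gov/pnfs/fnal.gov/usr/dune/scratch/test_8gb.root

The FTS instances orchestrate many such copies at once; the sketch only shows the direction of the data flow for one file.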
EHN-1 status currently
• FTS-Light
  - We have login to np04-srv-003
  - Can install and run FTS-Light in user space (Steve)
  - Know the routing magic to send traffic straight to the building 513 switch across the dedicated link; needs to be made permanent (Geoff)
• Disk buffer
  - We have login to np04-srv-001
  - Working on installing an xrootd server (Stu)
  - Disk being installed this week (Geoff Savage, under direction of G. Lehmann)
• Ideally need some significant disk on np04-srv-001
• If it is not ready in time, we need enough space to hold one 8-GB file and then clone it (see sketch below)
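If the full disk buffer is not ready, the test payload can be fabricated in place. A minimal sketch, assuming a POSIX-mounted buffer path (the path, file names, and clone count are made up):

    #!/bin/bash
    # Create one 8 GB file of pseudo-random (incompressible) data, then clone it
    # so FTS-Light always sees fresh filenames to pick up. Path is hypothetical.
    BUF=/data/buffer
    dd if=/dev/urandom of="$BUF/test_8gb_00.dat" bs=1M count=8192
    # 24 clones: same bytes, new names, so each counts as a new transfer (25 total).
    for i in $(seq -w 1 24); do
        cp "$BUF/test_8gb_00.dat" "$BUF/test_8gb_${i}.dat"
    done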
Other preparation:
• Re-installing the FTS server in an OpenStack VM as part of the DUNE project (Steve)
• New version of FTS and underlying utilities will fix several bugs observed previously (thanks to Robert's group)
• Some quantity of non-zero-suppressed "detsim"-level ProtoDUNE MC will be sent to Fermilab as part of this to feed DC 1.6
P3S / Tier 0 stress test
• Optional, if time permits
• 1K P3S jobs on Tier 0 reading files by normal access methods
  - (Maxim Potekhin is not available on the 19th, so this will have to wait)
• 1K test jobs via jobsub that each open a (different) file via xrootd and stream it, doing no processing, just reading (sketch below)
• Run other local Tier 0 jobs (beamline, ProtoDUNE DRA)
• Not sure what the aggregate rate of this will be, but we should try it because in practice both of these will be going at once
• Once we have this working, combine it with the data management stress test of the previous step
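A sketch of what each streaming test job could look like; the wrapper script name, the file-list mechanism, and the use of the $PROCESS job index are assumptions about the setup, not the actual test code:

    #!/bin/bash
    # stream_one.sh -- open one file via xrootd and discard the bytes.
    # $PROCESS (0..999) is the per-job index set by the batch system; it is
    # used here to pick a distinct URL from a pre-made list, one URL per line.
    URL=$(sed -n "$((PROCESS + 1))p" filelist.txt)
    xrdcp -f "$URL" /dev/null    # stream only, no processing

Submitted 1000-wide with something like: jobsub_submit -G dune -N 1000 file://$PWD/stream_one.sh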
Schedule Jan 18:
• 1-2 hour test (14:00 CERN / 07:00 FNAL)
• Geoff Savage / Xavier Espinal at EHN1 watching network throughput there
• First test with iperf (bytes across the network only; sketch below)
• Then transfer a few files via xrdcp
• Then try one 25-wide transfer with FTS-Light to EOS
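A minimal sketch of the iperf step, assuming iperf is installed on both ends (the receiving hostname and the stream count are illustrative):

    # Receiving side (a host behind the building 513 switch, hypothetical name):
    iperf -s

    # On np04-srv-003: 8 parallel TCP streams for 60 seconds, to confirm the
    # bonded link can approach 20 Gbit/s before any files are moved.
    iperf -c bldg513-host.cern.ch -P 8 -t 60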
Schedule Jan 19
• Start during CERN business hours, when Xavier Espinal or his designate can monitor the CERN side closely for undue stress
• Start with 25 simultaneous transfers; make sure everything on the CERN side can handle the load
• Ramp up to 50 simultaneous transfers; confirm again that EOS/Castor and the networking are good
• Observe the rate to FNAL; tune the relative number of simultaneous transfers as necessary
• Keep going through the end of FNAL business hours
Key Metrics:
• 20 Gbit/s (2.5 GByte/s) network bandwidth achieved EHN1 -> EOS (via FTS-Light)
  - This number comes from the network rate limitation
• Successful FTS processing (3 reads and 2 writes) of all those files without knocking EOS, Castor, or the network over
  - (At full rate that is roughly 2.5 GB/s x 5, about 12.5 GB/s of combined read/write traffic on EOS)
• Successful catching of one copy of the files at Fermilab without knocking dCache over
• Ultimate goal is to sustain 2.5 GByte/s through the whole data chain
  - That is a factor of 10 greater than any transatlantic rate DUNE has achieved until now
  - Also 50% of the average dCache ingest rate
  - CMS does more than that regularly
Next steps
• Schedule a short meeting in the next week to discuss further work
• DUNE data management meeting, Mon 22 Jan 2018, 15:00
  - Unless someone has a better time slot
• Present at the collaboration meeting