First test of the PoC
Caveats
• I am not a developer ;)
• I was also a beta tester of CRAB3+WMAgent in 2011; I restarted testing it ~2 weeks ago to have a one-to-one comparison
• The first 2 weeks of the PoC test were mainly a cycle of
  – Finding a problem
  – Communicating it to the developers
  – Getting a new version
  – Trying again
• I simply skip this part, which is fine; I speak about the results after all the fixes
What I tested (with both)
• A complicated workflow: the official (V)H->bb analysis step 1 (see https://twiki.cern.ch/twiki/bin/view/CMS/VHbbAnalysisNewCode#NtupleV42_CMSSW_5_3_3_patch2), which takes ~2 hours just to compile
  – Indeed ISB ~45 MB, with 56 user-compiled libraries
• Running on dataset /DoubleElectron/Run2012B-PromptReco-v1/AOD
  – 40 LS/job -> ~1200 jobs, a couple of hours each
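The job count above follows from simple lumi-based splitting arithmetic; a minimal sketch (the total lumi-section count of ~48,000 is inferred here from 40 LS/job × ~1200 jobs, not taken from DBS):

```python
import math

def n_jobs(total_lumi_sections, units_per_job):
    """Number of jobs produced by LumiBased splitting:
    one job per chunk of units_per_job lumi sections."""
    return math.ceil(total_lumi_sections / units_per_job)

# ~48,000 lumi sections at 40 LS/job gives the ~1200 jobs quoted above
print(n_jobs(48000, 40))  # -> 1200
```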
Where I tested
• CRAB3/PanDA: the test is restricted to a few sites (FNAL, Pisa, DESY, …)
  – Among the PoC sites, the sample is in fact only at FNAL and Pisa
• CRAB3/WMA: 8 T2s available, some of poor quality (T2_RU_*)
• Always used Pisa as the storage site
Moreover
• The PoC is not expected to provide full CRAB3 functionality, just (as in the email I got):
  – Submit
  – Resubmit
  – Kill
  – Status
  – Getoutput
  – Getlog
• So I stick to these also for CRAB3/WMA (i.e. I do not do DBS publication)
Configs

PanDA:

    from WMCore.Configuration import Configuration
    import os
    from datetime import datetime

    config = Configuration()
    config.section_("General")
    config.General.serverUrl = 'poc3test.cern.ch'
    config.General.ufccacheUrl = 'cmsweb-testbed.cern.ch'

    config.section_("JobType")
    config.JobType.pluginName = 'Analysis'
    config.JobType.psetName = 'patData.py'

    config.section_("Data")
    config.Data.inputDataset = '/DoubleElectron/Run2012B-PromptReco-v1/AOD'
    config.Data.publishDataName = os.path.basename(os.path.abspath('.')) + "_tom"
    config.Data.lumiMask = 'Lumi.json'
    config.Data.publishDbsUrl = "https://cmsdbsprod.cern.ch:8443/cms_dbs_ph_analysis_02_writer/servlet/DBSServlet"
    config.Data.splitting = 'LumiBased'
    config.Data.unitsPerJob = 40

    config.section_("User")
    config.User.email = ''

    config.section_("Site")
    config.Site.storageSite = 'T2_IT_Pisa'

WMA:

    from WMCore.Configuration import Configuration

    config = Configuration()
    config.section_("General")
    config.General.requestName = 'request_name2'
    config.General.serverUrl = 'crab3-test.cern.ch'
    config.General.ufccacheUrl = 'cmsweb.cern.ch'

    config.section_("JobType")
    config.JobType.pluginName = 'Analysis'
    config.JobType.psetName = 'patData.py'

    config.section_("Data")
    config.Data.inputDataset = '/DoubleElectron/Run2012B-PromptReco-v1/AOD'
    config.Data.splitting = 'LumiBased'
    config.Data.unitsPerJob = 40
    config.Data.lumiMask = 'Lumi.json'

    config.section_("User")
    config.User.email = ''

    config.section_("Site")
    config.Site.storageSite = 'T2_IT_Pisa'
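For readers without a CMSSW environment, the `section_()` / attribute access pattern used in the configs above can be mimicked with a minimal stand-in class (a sketch only; the real `WMCore.Configuration` carries validation and serialization logic that this omits):

```python
class _Section:
    """Bare attribute container standing in for a config section."""
    pass

class Configuration:
    """Minimal stand-in mimicking the WMCore-style section_() API."""
    def section_(self, name):
        section = _Section()
        setattr(self, name, section)
        return section

# Usage mirrors the Data section of the configs above
config = Configuration()
config.section_("Data")
config.Data.inputDataset = '/DoubleElectron/Run2012B-PromptReco-v1/AOD'
config.Data.splitting = 'LumiBased'
config.Data.unitsPerJob = 40
print(config.Data.unitsPerJob)  # -> 40
```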
Soon after submit

PanDA:

    bash-3.2$ crab status -t crab_20121127_113729 -i
    Registering user credentials
    Task name: tboccali_crab_20121127_113729_121127_103859
    Panda url: http://panda.cern.ch/server/pandamon/query?job=*&jobsetID=19&user=Tommaso%20Boccali
    Details: running   0.78 % (10/1279)
             activated 99.22 % (1269/1279)
    Information per site are not available.
    Log file is /afs/cern.ch/work/b/boccalio/PoC/CMSSW_5_3_3_patch2/src/VHbbAnalysis/HbbAnalyzer/test/PoCTests/crab_20121127_113729/crab.log

No information per site, but a link to the monitoring is present.

WMA:

    bash-3.2$ crab status -t crab_request_name2 -i
    Registering user credentials
    Task Status: running
    Using 7 site(s):
    Jobs Details: submitted 100.00 % (running 44.31 %, pending 55.69 %)
    T2_US_Florida:    submitted 14.58 %
    T2_FR_GRIF_IRFU:  submitted 14.58 %
    T2_RU_JINR:       submitted 14.58 %
    T2_UK_London_IC:  submitted 12.54 %
    T2_FR_GRIF_LLR:   submitted 14.58 %
    T2_IT_Pisa:       submitted 14.58 %
    T2_ES_IFCA:       submitted 14.58 %
    Log file is /afs/cern.ch/work/b/boccalio/PoC/CMSSW_5_3_3_patch2/src/VHbbAnalysis/HbbAnalyzer/test/Crab3Tests/crab_request_name2/crab.log

(No link to dashboard?) – one has to find it by hand
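The per-site breakdown in the WMA output is easy to post-process; a sketch of a parser for lines like `T2_IT_Pisa: submitted 14.58 %` (the line format is assumed from the status output above, not from any documented CRAB API):

```python
import re

# Matches e.g. "T2_IT_Pisa:  submitted 14.58 %"
SITE_RE = re.compile(r'^\s*(T\d_\w+):\s+submitted\s+([\d.]+)\s*%')

def site_fractions(status_text):
    """Extract a {site: percentage} dict from a per-site status listing."""
    out = {}
    for line in status_text.splitlines():
        m = SITE_RE.match(line)
        if m:
            out[m.group(1)] = float(m.group(2))
    return out

example = """T2_US_Florida:   submitted 14.58 %
T2_IT_Pisa:      submitted 14.58 %"""
print(site_fractions(example))  # -> {'T2_US_Florida': 14.58, 'T2_IT_Pisa': 14.58}
```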
A few considerations
• Let's start from the obvious: with both systems I reached 100% done, with some "resubmit" (site problems)
• Feature: with PanDA a resubmit is a second task (with a second web page)… I am not used to it, but it is not a critical issue (you just need to get used to it)
ASO
• It worked flawlessly in both cases
• Nothing more to say, I guess…
• (I did not even need to look into the ASO monitoring)
• You can get the files before ASO has operated (I guess lcg-cp is used, …)
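Fetching an output file by hand before ASO has staged it would look roughly like this (a sketch only: the SURL and local path are hypothetical placeholders, and the slide above only guesses that lcg-cp is the transfer tool; the command is built but deliberately not executed):

```python
import shlex

def lcg_cp_command(surl, local_path):
    """Build an lcg-cp command line for a manual fetch (not executed here)."""
    return ['lcg-cp', surl, 'file://' + local_path]

# Hypothetical example paths, for illustration only
cmd = lcg_cp_command('srm://some.storage.element/store/user/...',  # placeholder SURL
                     '/tmp/output.root')
print(' '.join(shlex.quote(c) for c in cmd))
```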
Issues with PanDA
• Kill did not work for me; I understood it was simply a timeout that needed a different threshold, and I did not check further
Is resubmit working fine?
• In both cases, it was for me
• Caveat: the PoC-enabled sites are generally good/very good. No chance to test a massive-failure scenario
Let's go straight to the point
• Up to here the executive summary could be:
• "Limiting the scenario to what the PoC is supposed to allow me to do, PanDA performs at least as well as WMA"
• (again, this _after_ the two weeks of initial testing)
What is different
• PanDA monitoring seems far better than what we are used to
Dashboard/WMA… (as usual)
…Plus WMStats
Some debugging info is added, but not that much (where is the WN name? where is the LSF id?)
Features we usually do not have
• All the logs (pilot + stderr + stdout) are on the web
  – All of them: not only snippets for failed jobs
  – I guess PH support would love it, instead of asking users to upload logs
  – Support can get all the info from the web, no need to ask the (maybe not too skilled) user
  – Snippets are not ok in general: a failure can depend on a bad environment variable, which cannot be seen from the snippet alone
• There is a link PILOT <-> LSF id!
  – I considered this lost since we left gLite, and it is a MAJOR help in debugging strange problems (like WNs acting as black holes)
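The point about snippets being insufficient can be made concrete: with the full stdout/stderr on the web, support can scan the job's environment dump directly rather than reasoning from a failure snippet. A hypothetical checker (the log format and variable names are illustrative, not taken from any real CRAB or pilot log):

```python
def missing_env_vars(log_text, required):
    """Scan a full job log for 'VAR=value' lines and report required
    variables that never appear -- impossible with only a failure snippet."""
    seen = set()
    for line in log_text.splitlines():
        if '=' in line:
            seen.add(line.split('=', 1)[0].strip())
    return [v for v in required if v not in seen]

# Illustrative log fragment with an environment dump
log = "SCRAM_ARCH=slc5_amd64_gcc462\nCMSSW_VERSION=CMSSW_5_3_3_patch2\n"
print(missing_env_vars(log, ['SCRAM_ARCH', 'X509_USER_PROXY']))  # -> ['X509_USER_PROXY']
```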
Pilot log (screenshot: shows the WN name and the LSF id)
Logs
• Full logs uploaded to the SE (full logs present, not just snippets guessed as interesting by the system)
Other features I liked
• PanDA seems user-friendly when scheduling jobs: if you submit a task, even if your priority is very low, a few jobs are executed almost immediately, allowing you to spot broken workflows in advance
• It seems I can resubmit at any time (no need to wait for the task to be in cooloff…)
  – Is it because ACDC is not in the game? Is there anything we pay for this (side effects I am not aware of)?
Conclusions?
• As said, functionally both did what was asked
  – PanDA does not lag behind at all
• I cannot speak about what is NOT supposed to be in the PoC (which is not a small subset)
• The major differences to me are:
  – Monitoring: way better in the PoC, with full disclosure of all the info
  – The early prioritization of some jobs helps a lot (it goes far beyond a simple Python sanity check)
  – You seem to be able to resubmit at any time – no cooloff needed; this potentially cuts the time to process tails