Ever tried Ever failed No matter Try Again


+ Ever tried. Ever failed. No matter. Try again. Fail again. Fail better. (S. Beckett)
  CPass0/CPass1 on LHC12e/d/c
  Updated at 09:00 on 16/08
  C. Zampolli

+ LHC12e
  C. Zampolli, 8/10/16

+ Summary table – on 16/08 at ~09:00, LHC12e
  • 27 in logbook
    • Filters used: LHC12e, PHYSICS, Good Run, GRP ok, at least one of [SDD, TPC, TRD, TOF, T0]
  • CPass0:
    • Snapshot: 27 (e.g. 186589 missing wrt the logbook, taken last night, but on the way)
    • Reco+CalibTrain: 27
    • Merging+OCDB: 27, 21 useful, 11 ok
  • CPass1:
    • Snapshot: 11
    • Reco+CalibTrain: 11
    • Merging+OCDB: 11

+ Summary table – on 16/08 at ~09:00, CPass0 – LHC12e
  • COSMICS: 0 (failure expected)
  • EMCAL/PHOS/MUON: 6 (failure expected)
  • No triggers: 0 (failure expected, too short run)
  • EE/EV/Expired: 0 (memory issue during the merging, under investigation)
  • Running: 0
  • Others (detectors): 10
  • Successful: 11
  • Success rate: 11/(11+10) = 52.4%

+ Summary table – on 16/08 at ~09:00, CPass0 – LHC12e
  Failure reason: run numbers
  • TRD (8): 186428, 186429, 186453, 186456, 186459, 186507, 186508, 186598
  • TRD + T0 (1): 186600
  • T0 (1): 186601
  Notes:
  • TRD has problems retrieving the info necessary for calibration: they need the version + subversion from an entry not used in the reconstruction, so not added to the UserInfo
  • T0 suffers from high background, but limits will be increased
  • The runs seem not recoverable

+ Summary table – on 16/08 at ~09:00, CPass0 – LHC12e
  Failure reason: run numbers
  • EMCAL/MUON/PHOS runs (6): 186383, 186405, 186425, 186448, 186503, 186589

+ Summary table – on 16/08 at ~09:00, CPass1 – LHC12e
  • Of the 11 successful runs:
    • 11 at CPass1 reco+CalibTrain
    • 11 at CPass1 merging+OCDB

+ LHC12d

+ Summary table – on 16/08 at ~09:00, LHC12d
  • 224 in logbook
    • Filters used: LHC12d, PHYSICS, Good Run, GRP ok, at least one of [SDD, TPC, TRD, TOF, T0]
  • CPass0:
    • Snapshot: 220
    • Reco+CalibTrain: 220
    • Merging+OCDB: 220, 176 needed, 145 ok, 2 running
  • CPass1:
    • Snapshot: 145
    • Reco+CalibTrain: 145
    • Merging+OCDB: 143, 145 needed
  Reprocessing of missing runs at merging (due to reco issues, OCDB snapshot not working…) ongoing with Rev-21

+ Difference between logbook and snapshot in MonALISA
  • In logbook, but not in MonALISA:
    • 184370 (EMCAL), 184645 (EMCAL), 185345 (ACORDE trigger), 185347 (ACORDE trigger), 185467 still in the migration process
  • In MonALISA but not in the logbook:
    • 185190 (short run, the quality flag was changed)
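
The comparison on this slide amounts to two set differences. A minimal sketch in Python, assuming both run lists are available as plain sets: the variable names are hypothetical, only the run numbers quoted on the slide are real, and the shared runs are stood in for by placeholder IDs, so this is not the actual logbook or MonALISA query.

```python
# Run numbers quoted on the slide.
only_in_logbook = {184370, 184645, 185345, 185347, 185467}
only_in_monalisa = {185190}

# Hypothetical stand-in for the runs present in both sources:
# 224 logbook runs minus the 5 missing from MonALISA leaves 219 shared runs.
# Negative placeholder IDs cannot collide with real run numbers.
shared = {-i for i in range(1, 220)}

logbook = shared | only_in_logbook
monalisa = shared | only_in_monalisa

# The two differences reported on the slide.
missing_from_monalisa = sorted(logbook - monalisa)
missing_from_logbook = sorted(monalisa - logbook)

print(len(logbook), len(monalisa))
print("In logbook, not in MonALISA:", missing_from_monalisa)
print("In MonALISA, not in logbook:", missing_from_logbook)
```

With these inputs the two totals come out as 224 and 220, matching the logbook and snapshot counts in the LHC12d summary.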

+ Summary table – on 16/08 at ~09:00, CPass0 – LHC12d
  • COSMICS: 9 (failure expected)
  • EMCAL/PHOS/MUON: 33 (failure expected)
  • No triggers: 2 (failure expected, too short run)
  • EE/EV/Expired: 1 (memory issue during the merging, under investigation)
  • Running: 2
  • Others (detectors): 28
  • Successful: 145
  • Success rate: 145/(145+28+1) = 83.3%

+ Summary table – on 16/08 at ~09:00, CPass0 – LHC12d
  Failure reason: run numbers
  • COSMICS (9): 184880, 184882, 184885, 184886, 184889, 184910, 184914, 184918, 186264
  • TPC Gain Threshold (1): 185460 (also TRD)
  16 recovered rerunning with looser constraints for validation (run 185460 not retried, since it failed anyway in TRD)

+ Summary table – on 16/08 at ~09:00, CPass0 – LHC12d
  Failure reason: run numbers
  • T0 (20): 185687, 185692, 185697, 185698, 185699, 185700, 185701, 185734, 185735, 185738, 185756, 185757, 185764, 185765, 185768, 185775, 185776, 185778, 185784
  Hardware problem, fixed now

+ Summary table – on 16/08 at ~09:00, CPass0 – LHC12d
  Failure reason: run numbers
  • EMCAL/MUON/PHOS runs (33): 184443, 184481, 184663, 184664, 184709, 184716, 184719, 184762, 184780, 185024, 185148, 185186, 185341, 185456, 185559, 185560, 185562, 185631, 185647, 185677, 185731, 185934, 185994, 185998, 186036, 186062, 186063, 186159, 186192, 186224, 186225, 186232, 186316

+ Summary table – on 16/08 at ~09:00, CPass0 – LHC12d
  Failure reason: run numbers
  • No triggers (2): 183915, 185190
  • TRD (8): 184190, 185133, 185378, 185460 (also TPC), 185915, 185916, 186319, 186320
  • EV (1): 184673

+ Summary table – on 16/08 at ~09:00, CPass1 – LHC12d
  • Of the 145 successful runs:
    • 145 at CPass1 reco+CalibTrain
    • 111 at CPass1 merging+OCDB…
    • …of which 111 successful (ignore the red TPC color)…
    • …1 failed in TRD (184145)…
  • Different statistics for CPass0 and CPass1:
    • 480/480 chunks at CPass0
    • 472/480 chunks at CPass1

+ TRD issue
  • Due to a problem in the TRD reconstruction, some wrong OCDB entries were produced at CPass0; it is not possible to get the correct ones without re-running CPass0
  • Some manual OCDB update is needed (after LHC12d is fully processed, ongoing for completed runs)
  • Then CPass0/CPass1 should be re-run with a Rev > Rev-18
  • Will the failed runs be recovered? Waiting for experts' reply

+ LHC12c

+ Summary table – on 16/08 at ~09:00, LHC12c
  • 205 in logbook
    • Filters used: LHC12c, PHYSICS, Good Run, GRP ok, at least one of [SDD, TPC, TRD, TOF, T0]
  • CPass0:
    • Snapshot: 208 (but runs put in manually), 1 should be ignored (179444)
    • Reco+CalibTrain: 207
    • Merging+OCDB: 207, 109 needed, 93 ok
  • CPass1:
    • Snapshot: 93
    • Reco+CalibTrain: 93
    • Merging+OCDB: 93
  Reprocessing of missing runs (due to reco issues, OCDB snapshot not working…) ongoing with Rev-21

+ Summary table – on 16/08 at ~09:00, CPass0 – LHC12c
  • COSMICS: 37 (failure expected)
  • EMCAL/PHOS/MUON: 58 (failure expected)
  • No triggers: 3 (failure expected: too short, or not the right trigger configuration)
  • EE/EV/Expired: 0
  • Others (detectors): 16
  • Successful: 93
  • Success rate: 93/(93+16) = 85.3%
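
The success rates quoted for the three periods all follow the same rule: successful runs divided by successful runs plus unexpected failures ("Others (detectors)" and "EE/EV/Expired"), leaving out the expected failures and the runs still running. A minimal sketch that reproduces the slide arithmetic; the class and field names are hypothetical, and the counts are copied from the CPass0 summary slides.

```python
from dataclasses import dataclass


@dataclass
class CPass0Counts:
    """Per-period counts taken from the CPass0 summary slides."""
    successful: int
    detector_failures: int        # "Others (detectors)"
    ee_ev_expired: int = 0        # "EE/EV/Expired"

    def success_rate(self) -> float:
        # Expected failures (COSMICS, EMCAL/PHOS/MUON, no triggers)
        # and running jobs are excluded from the denominator.
        attempted = self.successful + self.detector_failures + self.ee_ev_expired
        return self.successful / attempted


periods = {
    "LHC12e": CPass0Counts(successful=11, detector_failures=10),
    "LHC12d": CPass0Counts(successful=145, detector_failures=28, ee_ev_expired=1),
    "LHC12c": CPass0Counts(successful=93, detector_failures=16),
}

for name, counts in periods.items():
    print(f"{name}: {100 * counts.success_rate():.1f}% success rate")
```

This prints 52.4%, 83.3% and 85.3% for LHC12e, LHC12d and LHC12c respectively, the same figures as on the slides.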

+ Summary table – on 16/08 at ~09:00, CPass0 – LHC12c
  Failure reason: run numbers
  • COSMICS (37): 179658, 179712, 179713, 179717, 179723, 179725, 179730, 179736, 179740, 179742, 179743, 179746, 179747, 179758, 179766, 179941, 179943, 179944, 179946, 179948, 179950, 179951, 179960, 180164, 180979, 180980, 180981, 180983, 180984, 180985, 180986, 180987, 180988, 180991, 180992, 182749, 182750

+ Summary table – on 16/08 at ~09:00, CPass0 – LHC12c
  Failure reason: run numbers
  • EMCAL/MUON/PHOS runs (58): 179595, 179603, 179604, 179685, 179687, 180552, 180559, 180616, 180643, 180644, 180692, 180704, 181026, 181040, 181046, 181328, 181339, 181344, 181360, 181546, 181558

+ Summary table – on 16/08 at ~09:00, CPass0 – LHC12c
  Failure reason: run numbers
  • EMCAL/MUON/PHOS runs (58): 181580, 181625, 181631, 181954, 181956, 181984, 182003, 182094, 182100, 182103, 182195, 182198, 182200, 182226, 182316, 182403, 182405, 182410, 182449, 182451, 182452, 182470, 182471, 182475, 182477

+ Summary table – on 16/08 at ~09:00, CPass0 – LHC12c
  Failure reason: run numbers
  • EMCAL/MUON/PHOS runs (58): 182499, 182502, 182504, 182609, 182610, 182612, 182640, 182641, 182681, 182712, 182717, 182721

+ Summary table – on 16/08 at ~09:00, CPass0 – LHC12c
  Failure reason: run numbers
  • No triggers (3): 180934, 181609, 182639
  • TRD (7): 180716 (*), 180717 (*), 182325 (*), 182508 (*), 182509 (*), 182513 (*), 182724 (*)
  • TPC+TRD (9): 181617 (**), 181618 (**), 181619 (**), 181620 (**), 181652 (**), 181694 (**), 181698 (**), 181701 (**), 181703 (**)
  (*) Low statistics, recoverable
  (*) Low statistics, not recoverable
  (**) No SSD/SDD: number of contributors to Vertex Track = 0, TRD calibration failing; thinking about how to recover them for TRD; what about TPC?

+ Summary table – on 16/08 at ~09:00, CPass1 – LHC12c
  • Of the 93 successful runs:
    • 93 at CPass1 reco+CalibTrain
    • 93 at CPass1 merging+OCDB…
    • …of which 84 successful in CPass1 (ignore the red TPC color)…
    • …and 9 failed in T0, but are MUON runs: they should not have gone through (different AliRoot, some changes in T0)
  • As soon as CPass1 is completed, 1 week will be given for manual update. If that is too little (QM is very close), we'll increase it. Then Vpass should start

+ Further comments

+ Interdependencies
  • Under discussion: do EMCAL runs need calibration triggers? (PHOS does not)

+ Further issues
  • Some reconstruction jobs fail with bad_alloc: under investigation
    • Grid tests with gdb ongoing: not much information retrievable, the jobs ran successfully
    • Valgrind test ongoing: did not show anything significant
    • Trying with Rev-21 on LHC12c, LHC12e
      • Many errors, but FPE, not bad_alloc
      • Stack trace available
      • I could not reproduce the problem, still investigating

+ PPass
  • LHC12a and LHC12b Vpass validated: ready for Ppass
  • A patched Rev-16 was created to fix the TRD QA issue, to be used to run Ppass
  • LHC12a completed, waiting for QA feedback
  • LHC12b completed, waiting for QA feedback

+ Calibration of old data
  • GRP/CTP/Aliases entries to be created, after defining the classes to be used for the reconstruction
  • It might be necessary to apply some downscale:
    • min(max(nevents/10, 30000), nevents)/nevents, but we need to define nevents
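
To make the proposed downscale concrete, here is a minimal sketch of the formula as written on the slide. The function name and the keyword parameters are hypothetical, and nevents is treated simply as an input, since its definition is still open.

```python
def downscale_fraction(nevents: int, divisor: int = 10, floor: int = 30000) -> float:
    """Fraction of events kept: min(max(nevents/divisor, floor), nevents) / nevents.

    Keeps one event in `divisor`, but never fewer than `floor` events,
    and never more than the run actually contains.
    """
    if nevents <= 0:
        raise ValueError("nevents must be positive")
    kept = min(max(nevents / divisor, floor), nevents)
    return kept / nevents


# Three regimes of the formula (illustrative event counts, not real runs):
for n in (20_000, 100_000, 1_000_000):
    print(n, downscale_fraction(n))
```

Runs with fewer than 30000 events are kept in full, runs between 30000 and 300000 events are cut to 30000 events, and larger runs are downscaled by a factor 10.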