v5-01-Release & v5-02-Release, Peter Hristov, 23/01/2012

Changes: v5-01-Rev-21
• #90324: Exception in AliITStrackerMI::FollowProlongationTree. From rev. 53978
• #90549: Request to port r53948 to the release (MUON small leak fix)
• #90658: For v5-01: option to isolate the heavy-flavor part of a Pythia event. From rev. 53959
• #84578: Request to extend AliGenBox to use Yrange. From rev. 53996
• Optional RB/PX24 shielding and scoring. From rev. 53955, 53956

Changes: v5-01-Rev-21
• #90461: Request to port a new ZDC feature to the release. From rev. 53705
• #90504: EVE muon_init.C update, r53875
• #25142: Commit the new ESD->AOD filter and port it to the release. From rev. 54021
• #90540: Port 53910, 53911 and 53912 to the release (full MC header in the AOD)

GDB on Grid
• Some potential problems detected and fixed (ITS, TPC, HLT)
• Some jobs (~4%) fail at the beginning (events 0-10)
  – Not reproducible locally, even when many reconstruction jobs run in parallel
  – Always caused by std::bad_alloc, thrown in different places
• Other jobs (~20%) are killed by the system (memory)
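Since the std::bad_alloc failures above leave little context in the Grid logs, one option is to catch the exception around each event and record which event failed. The following is a minimal sketch, not AliRoot code: `reconstructEvent` is a hypothetical per-event callback standing in for the real reconstruction step.

```cpp
#include <cstddef>
#include <functional>
#include <iostream>
#include <new>
#include <optional>

// Runs events [0, nEvents) through a hypothetical per-event callback.
// Returns the index of the first event that threw std::bad_alloc,
// or std::nullopt if every event completed.
std::optional<std::size_t> firstBadAllocEvent(
    std::size_t nEvents,
    const std::function<void(std::size_t)>& reconstructEvent) {
  for (std::size_t i = 0; i < nEvents; ++i) {
    try {
      reconstructEvent(i);
    } catch (const std::bad_alloc& e) {
      // Record where the allocation failed instead of dying without context.
      std::cerr << "std::bad_alloc in event " << i << ": " << e.what() << '\n';
      return i;
    }
  }
  return std::nullopt;
}
```

Stopping at the first failure suits diagnosis. A production loop might instead skip the event and continue, at the risk of running on in a corrupted state.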

Requests/Additional fixes
• #90749 ESD porting request: GetTPCClusterInfo with an additional switch
• #90743 Coverity fix in AliVCaloCells: missing assignment operator
• #90738 Request to port a fix in AliZDCDigitizer to the release
• #90625 Memory problem in AliTPCtrackerMI
• #90622 Logic flaw in AliTPCseed
• #90616 Worrying message from TPC reconstruction
• Changes in RAW (TClonesArray usage)

Requests: OCDB
• #90756 Request to port an object to the RAW OCDB (for realistic MUON simulations)
• #90736 Calibration of the TRD cosmics of May, June and August

Other reports
• #90615 Problems in the material budget, eta<0.9 and 0.9<eta<1.4

v5-02-Release
• Coverity: 158 defects to be fixed
• AliRoot tests: mostly OK
• ROOT v5-32-00-patches: needs tests
• PWGs transition: to be completed this week
• One library per subdirectory: next week
• Savannah bug reports: ongoing cleanup
• Do we have any significant set of changes still missing in the trunk?

Old slides

Reconstruction of RAW (LHC11h)
• Backtrace problem solved
• Clean-up of PATH and LD_LIBRARY_PATH on the Grid
• Clean-up of the AliEn libraries
• Deterministic splitting of the failed jobs (in preparation)
• New tests in parallel with the Grid production
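"Deterministic splitting" means a failed job always yields the same sub-jobs when resubmitted. A minimal sketch of one way to get that, assuming the input of a failed job is a list of raw-file names and that a fixed number of files per sub-job is acceptable; the function name and the per-job size are hypothetical, not the production implementation:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Splits the raw files of a failed job into sub-jobs of at most perJob files.
// Sorting and de-duplicating first makes the result independent of the
// order in which the file list was collected, so resubmission is reproducible.
std::vector<std::vector<std::string>> splitDeterministic(
    std::vector<std::string> files, std::size_t perJob) {
  std::vector<std::vector<std::string>> jobs;
  if (perJob == 0) return jobs;
  std::sort(files.begin(), files.end());
  files.erase(std::unique(files.begin(), files.end()), files.end());
  for (std::size_t i = 0; i < files.size(); i += perJob) {
    const std::size_t end = std::min(i + perJob, files.size());
    jobs.emplace_back(files.begin() + static_cast<std::ptrdiff_t>(i),
                      files.begin() + static_cast<std::ptrdiff_t>(end));
  }
  return jobs;
}
```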

Changes: v5-01-Rev-20
• #90319: Segmentation violation in AliPHOSRawFitterv1::~AliPHOSRawFitterv1. From rev. 53869
• #90053: Request to port a TRD calibration bug fix to the release. From rev. 53734
• #90292: Add the line ConvertZDC() in AliAnalysisTaskESDfilter::ConvertESDtoAOD(). From rev. 53895
• #90307: ZDC QA update. From rev. 52738, 53081, 53271
• #90309: ZDC request to port code to the release. From rev. 52616
• #90024: Port changes in PYTHIA6 for pyquen production (pyquen1.5.F, CMakelib6.4.21.pkg updated), rev. 53645
• #90359: Request to fix cached values in ESD. From rev. 53900
• #90013: Vertexing task crashing in trunk. From rev. 53793
• Additional protection. From rev. 53904

LHC11h Pass 2 – reconstruction details
• Use v5-01-Rev-19 in the production
• Start in inverse time order (last runs first, “LIFO”): OK
• Use MB trigger for CPass0: OK
• Exercise the full production setup on runs from the “grey area”: special “gdb” production, run 170593: OK
• Run with TPC pools: OK
• Work on a local raw file: OK
• Use OCDB snapshot: OK
• Keep only the rec. points for the current event: OK
• Switch off QA: OK
• Switch off MUON if the memory consumption is still too high
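The “LIFO” ordering amounts to processing the most recent runs first. A minimal sketch, assuming run numbers increase monotonically with data-taking time (an assumption, not stated on the slide), so the submission order is a descending sort:

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Assumption: higher run number == later run, so "last runs first"
// is a descending sort of the run list.
std::vector<int> lifoOrder(std::vector<int> runs) {
  std::sort(runs.begin(), runs.end(), std::greater<int>());
  return runs;
}
```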

Results
• CPass0: 185 jobs, 523,509 out of 539,890 raw files successfully reconstructed => 97% efficiency
• All runs with magnetic field configuration (++) ready (170593-169628)
• Details on losses follow
• Pass 2 current status: 131 jobs, 225,568 out of 362,790 files successfully reconstructed => 62.2% efficiency
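The efficiencies quoted above are the fraction of raw files successfully reconstructed. A minimal sketch of the arithmetic (the function name is illustrative):

```cpp
// Percentage of raw files successfully reconstructed; 0 if there are no files.
double efficiencyPercent(long done, long total) {
  return total > 0 ? 100.0 * static_cast<double>(done) / static_cast<double>(total)
                   : 0.0;
}
```

With the numbers on the slide, efficiencyPercent(523509, 539890) is about 96.97 (CPass0, quoted as 97%) and efficiencyPercent(225568, 362790) is about 62.18 (Pass 2, quoted as 62.2%).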

Losses – Pass 2
• G__exception: average 6.5%
• Strong run dependency

Losses – Pass 2 (2)
• Memory overrun: average 16.8%
• Strong run dependency
• Function of the number of events per chunk and of the data-taking configuration

Losses
• G__exception
  – Debugging is hard, as there is no traceback
  – Seems to be random (from syswatch.log)
  – Not reproducible in local tests
  – No related issues shown by Valgrind
  – Appears in the first events of the chunks
  – Working with ROOT experts, at least to get the exception in the logs => special “gdb” run
• Memory overrun
  – Additional profiling ongoing
  – All external sources are ruled out: gains are only possible through changes in the reconstruction
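For the memory-overrun profiling, a lightweight per-event check of the resident memory can show where consumption grows within a chunk. This is a generic Linux sketch reading the VmRSS field of /proc/self/status; it is an assumption for illustration and not how syswatch.log is produced.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Parses the VmRSS field (resident set size, in kB) from the text of a
// Linux /proc/<pid>/status file. Returns -1 if the field is missing or malformed.
long parseVmRssKb(const std::string& statusText) {
  std::istringstream in(statusText);
  std::string line;
  while (std::getline(in, line)) {
    if (line.rfind("VmRSS:", 0) == 0) {  // line starts with "VmRSS:"
      std::istringstream fields(line.substr(6));
      long kb = 0;
      if (!(fields >> kb)) return -1;
      return kb;
    }
  }
  return -1;
}

// Linux only: resident memory of the current process, in kB (-1 on failure).
long currentRssKb() {
  std::ifstream f("/proc/self/status");
  std::stringstream buf;
  buf << f.rdbuf();
  return parseVmRssKb(buf.str());
}
```

Calling currentRssKb() after each event and logging the value together with the event number gives a simple memory-versus-event profile.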

Special “gdb” run
• “catch throw” mode
• Several problems discovered, to be submitted to Savannah
• Most probably uninitialized memory is used as an index in an array
  – TClonesArray placement new, where the index comes from GetEntriesFast
  – corrupted (?) raw data
  – deletion of arrays
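A common defence against the pattern above is to validate any index derived from raw data (or from a possibly uninitialized counter) before constructing an object at that slot. The sketch below uses a std::vector as a stand-in for a TClonesArray-like container so that it compiles without ROOT; `Digit` and `storeAt` are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical payload decoded from raw data.
struct Digit {
  int channel;
  int adc;
};

// Stores d at rawIndex only if the index lies inside the container.
// A corrupted or uninitialized index is rejected instead of being used
// to write outside the array.
bool storeAt(std::vector<Digit>& slots, long rawIndex, const Digit& d) {
  if (rawIndex < 0 || static_cast<std::size_t>(rawIndex) >= slots.size()) {
    return false;
  }
  slots[static_cast<std::size_t>(rawIndex)] = d;
  return true;
}
```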

Plans
• Continue the investigation of G__exception on the Grid
• Understand the difference between CPass0 and Pass 2 (MB trigger, V0s, cascades?)
• Try to reproduce the Grid execution flow completely on a local machine
• Resubmit the failed jobs in “split” mode

v5-02-Release
• Complete the transition of the analysis code to the new modules
• Move every library to a sub-directory and get rid of *.pkg (native CMake)
• Fix the Coverity defects and compilation warnings
• Solve as many Savannah issues as possible
• Create the branch at the end of January
• First stable tag in February