Towards a Framework for Organized Analysis Andreas Morsch

Towards a Framework for Organized Analysis Andreas Morsch Weekly Offline Meeting 31/5/2007

Why Organized Analysis?
• Most efficient way for many users (analysis tasks) to read and process the full data set.
  – In particular if resources are scarce.
  – Optimises the CPU/IO ratio.
• But also
  – Helps to develop a common, well-tested framework for analysis.
  – Develops a common knowledge base and terminology.
  – Helps to document the analysis procedure and makes results reproducible.

Scope
• Focus on production of AODs from ESD/AOD

Design Goals
• Flexible task and data container structure
• User code independent of computing schema (interactive: local/PROOF or batch: grid)
• Input data: ESD, AOD
  – Same design (done)
  – Common base class?
• Output data:
  – AOD + user histograms
  – Transparent handling of memory-resident and file-resident data

Implementation
• Analysis train/taxi similar to PHENIX
• Based on the existing AliAnalysisManager/Task framework (A. and M. Gheata)

Organization of Data and Tasks
• Input data staging?
  – Several trainlets on sub-data sets staged prior to train departure.
  – Better: one analysis "train" on the complete data set.
    • Limits the complexity of the production.
    • Should be designed to give the optimum under all conditions.

Organization of Tasks
• Proposal
  – On top level
    • Tasks reading ESDs/AODs and producing AODs
    • Organized by analysis manager
  – Below top level
    • Sub-tasks producing intermediate transient data
    • Organized by users (PWGs)
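The two-level proposal can be sketched as a top-level task that owns an ordered list of sub-tasks sharing a transient, memory-resident scratch object. This is a minimal illustration only: `TopTaskSketch`, `SubTaskSketch`, `TransientSketch`, `SelectAbove` and `Scale` are hypothetical names, not classes of the existing framework, and plain vectors of numbers stand in for ESD/AOD content.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Transient, memory-resident data shared by the sub-tasks of one top task.
struct TransientSketch {
    std::vector<double> selected;
};

// Hypothetical sub-task interface (user level, e.g. maintained by a PWG).
class SubTaskSketch {
public:
    virtual ~SubTaskSketch() = default;
    virtual void Exec(const std::vector<double>& input,
                      TransientSketch& scratch) = 0;
};

// Example sub-task: copies input values above a threshold into the scratch.
class SelectAbove : public SubTaskSketch {
public:
    explicit SelectAbove(double cut) : fCut(cut) {}
    void Exec(const std::vector<double>& input,
              TransientSketch& scratch) override {
        for (double v : input) {
            if (v > fCut) scratch.selected.push_back(v);
        }
    }
private:
    double fCut;
};

// Example sub-task: scales whatever an earlier sub-task selected.
class Scale : public SubTaskSketch {
public:
    explicit Scale(double factor) : fFactor(factor) {}
    void Exec(const std::vector<double>& /*input*/,
              TransientSketch& scratch) override {
        for (double& v : scratch.selected) v *= fFactor;
    }
private:
    double fFactor;
};

// Top-level task: the only level a manager would see. It runs its sub-tasks
// in order and returns the output that would be made persistent.
class TopTaskSketch {
public:
    void Add(std::unique_ptr<SubTaskSketch> sub) {
        assert(sub != nullptr);
        fSubTasks.push_back(std::move(sub));
    }
    std::vector<double> Exec(const std::vector<double>& input) {
        TransientSketch scratch;  // lives for one event only
        for (auto& sub : fSubTasks) sub->Exec(input, scratch);
        return scratch.selected;
    }
private:
    std::vector<std::unique_ptr<SubTaskSketch>> fSubTasks;
};
```

In this sketch the scratch object never leaves the top-level task, which mirrors the idea that intermediate data stay transient while only the top level produces output.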

Organization of Data and Tasks
• Organisation of analysis tasks
  – One sub-job per task
  – Better: one job executing all tasks.
• Protection against sub-task crashes
  – "Isolate" tasks using the C++ try-throw-catch mechanism
  – Check memory / task
  – Check output data size / task
  – Protection against data corruption
    • Access rights per task
  – Dynamic cancelling of tasks
  – Input data quality checks
    • could be the first task in the row
  – Robust book-keeping

GRID/PROOF
• Transparency of computing schema
  – Some improvements in AliAnalysisManager/AliAnalysisTask
    • Possibility to notify tasks when the file changes in the chain (done)
    • More robust output data streaming (done)
    • Possibility to flag tasks as "post event loop" tasks (done)
    • Handling of file-resident data
  – PROOF uses object streaming
    • What is a streamable object/task?
      – Needs exact definition.
      – Attention
        » Normally persistent objects are streamed
        » Here: transient objects are streamed!
    • Needs user support and documentation
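The file-change notification can be sketched as follows, assuming the chain is a sequence of (file name, number of events) entries. `ChainTaskSketch`, `CountingTask`, `ChainEntry` and `ProcessChain` are hypothetical names chosen for illustration; the sketch does not reproduce how the existing framework or ROOT implements this.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical task interface; the names are illustrative only.
class ChainTaskSketch {
public:
    virtual ~ChainTaskSketch() = default;
    virtual void Notify(const std::string& fileName) = 0;  // new input file
    virtual void Exec() = 0;                               // one event
};

// Example task that only counts notifications and events.
class CountingTask : public ChainTaskSketch {
public:
    void Notify(const std::string& fileName) override {
        ++nFiles;
        lastFile = fileName;
    }
    void Exec() override { ++nEvents; }
    int nFiles = 0;
    int nEvents = 0;
    std::string lastFile;
};

// One chain element: a file name and the number of events read from it.
struct ChainEntry {
    std::string fileName;
    int nEvents;
};

// Event loop over the chain. Every task is notified once whenever the
// current file differs from the previous one, before its events are processed.
inline void ProcessChain(const std::vector<ChainEntry>& chain,
                         const std::vector<ChainTaskSketch*>& tasks) {
    std::string current;
    bool first = true;
    for (const auto& entry : chain) {
        assert(entry.nEvents >= 0);
        if (first || entry.fileName != current) {
            current = entry.fileName;
            first = false;
            for (auto* task : tasks) task->Notify(current);
        }
        for (int i = 0; i < entry.nEvents; ++i) {
            for (auto* task : tasks) task->Exec();
        }
    }
}
```

A task could use such a hook to reload per-file information before the first event of the new file arrives.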

Possible Integration of User Code
[Diagram; labels in order of appearance: AliAnalysisTask; Implements interface; Deals with AliAODEvent; Steers; Delegates; AliAnalysisUserTask; UserAnalysisCode; Documents selection and analysis parameters; Factory; Configuration Macro]
• Working prototype for AliAnalysisTaskJets exists

Who manages the common output objects AliAODEvent and AOD Tree?
• What has to be called when
  – SlaveBegin
    • AliAODEvent constructor
    • Open file
    • AOD tree constructor
  – ExecuteAnalysis
    • AOD tree fill
    • AliAODEvent Clear
  – Terminate
    • WriteTree
    • Close file
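The call sequence above can be sketched with in-memory stand-ins, assuming a single owner object is responsible for the event, the tree and the file. `OutputOwnerSketch`, `SketchEvent` and `SketchTree` are hypothetical; they mimic neither AliAODEvent nor a ROOT tree, and the "file" is reduced to an open/closed flag.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <vector>

// Hypothetical stand-in for the common output event.
struct SketchEvent {
    std::vector<double> trackPt;
    void Clear() { trackPt.clear(); }
};

// Hypothetical stand-in for the output tree: one entry per filled event.
struct SketchTree {
    std::vector<std::vector<double>> entries;
    bool written = false;
    void Fill(const SketchEvent& ev) { entries.push_back(ev.trackPt); }
    void Write() { written = true; }
};

// Owner of the common output objects, following the three phases above.
class OutputOwnerSketch {
public:
    void SlaveBegin() {
        fEvent.reset(new SketchEvent());  // event constructor
        fFileOpen = true;                 // open file
        fTree.reset(new SketchTree());    // tree constructor
    }
    // userTasks stands for the analysis tasks that fill the event.
    void ExecuteAnalysis(const std::function<void(SketchEvent&)>& userTasks) {
        assert(fFileOpen && fEvent != nullptr && fTree != nullptr);
        userTasks(*fEvent);
        fTree->Fill(*fEvent);  // tree fill
        fEvent->Clear();       // event clear, ready for the next event
    }
    void Terminate() {
        assert(fFileOpen && fTree != nullptr);
        fTree->Write();     // write tree
        fFileOpen = false;  // close file
    }
    const SketchTree& Tree() const {
        assert(fTree != nullptr);
        return *fTree;
    }
    bool FileOpen() const { return fFileOpen; }
private:
    std::unique_ptr<SketchEvent> fEvent;
    std::unique_ptr<SketchTree> fTree;
    bool fFileOpen = false;
};
```

The assertions make the ordering constraint explicit: filling or writing before `SlaveBegin` has run is a programming error in this sketch.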

Possible Solution
1. Header and trailer analysis tasks handling the AliAODEvent I/O
2. Top task with user tasks as daughters
3. AliAnalysisManagerAOD deriving from AliAnalysisManager and reimplementing
   1. SlaveBegin
   2. ExecuteAnalysis
   3. Terminate
4. Virtualize AliAODEvent (AliVEvent) and add calls to an object of type AliVEvent in AliAnalysisManager. Move as much code as possible into AliAODEvent (CreateTree, FillTree, Clear).
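Option 4 can be sketched as an abstract event interface that the manager calls without knowing the concrete event type. `VEventSketch`, `AODEventSketch` and `ManagerSketch` are hypothetical names used for illustration; only the method names CreateTree, FillTree and Clear are taken from the list above, and their signatures here are assumptions.

```cpp
#include <cassert>
#include <vector>

// Hypothetical abstract interface, standing in for the proposed AliVEvent.
class VEventSketch {
public:
    virtual ~VEventSketch() = default;
    virtual void CreateTree() = 0;
    virtual void FillTree() = 0;
    virtual void Clear() = 0;
};

// Hypothetical concrete event that keeps the tree handling to itself.
class AODEventSketch : public VEventSketch {
public:
    void CreateTree() override {
        fTreeExists = true;
        fEntries.clear();
    }
    void FillTree() override {
        assert(fTreeExists);
        fEntries.push_back(fNTracks);  // one entry per event
    }
    void Clear() override { fNTracks = 0; }
    void AddTrack() { ++fNTracks; }
    const std::vector<int>& Entries() const { return fEntries; }
private:
    bool fTreeExists = false;
    int fNTracks = 0;
    std::vector<int> fEntries;
};

// Manager sketch: depends only on the abstract interface.
class ManagerSketch {
public:
    explicit ManagerSketch(VEventSketch* event) : fEvent(event) {}
    void SlaveBegin() {
        if (fEvent != nullptr) fEvent->CreateTree();
    }
    // Called once per event, after the user tasks have run.
    void FinishEvent() {
        if (fEvent != nullptr) {
            fEvent->FillTree();
            fEvent->Clear();
        }
    }
private:
    VEventSketch* fEvent;  // not owned
};
```

With this split the manager needs no AOD-specific code, which is the motivation for moving as much as possible into the event class.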

[Diagram; class labels in order of appearance]
• AliAnalysisManagerAOD, AliAODEvent
• AliAnalysisManager, AliVEvent, AliAODEvent
• AliAnalysisManager, AliVOutputEventHandler, AliAODEvent

More AOD I/O Management Tasks
• Granting access rights per branch
• Consistency checks
• Chopping output files
• …