ProdSys2 Commissioning, Jose Enrique Garcia, IFIC-Valencia

  • Slides: 24

ProdSys2 Commissioning
Jose Enrique Garcia – IFIC-Valencia, on behalf of the ProdSys2 team

What is ProdSys2?
  • New distributed production framework: ProdSys2
    – DEFT: task request and task definition
    – JEDI: dynamic job definition and task execution
      • Integrated with PanDA (replaces Bamboo)
      • Engine for user analysis tasks
  • Works closely with Job Transforms and Rucio
  • Integration with monitoring: BigPanDA
(from Kaushik De)

What is ProdSys2?
[Diagram; recoverable labels: BigPanDA, DEFT, JEDI, PanDA]

ProdSys2 Team
  • Coordination: Kaushik De, Alexei Klimentov
  • DEFT:
    – MC production request and its processing: M. Borodin
    – Core SW and communication with JEDI: D. Golubkov
    – Web UI and authentication: S. Belov, S. Gayazov
  • JEDI: Tadashi Maeno
  • Within the BigPanDA project:
    – ProdSys2 monitoring: Jaroslava Schovancova, Torre Wenaus, R. Mashinistov
    – Packaging, software organization (SVN, GitHub): J. Schovancova
  • Expertise, integration with other domains/areas: Alessandra Forti, Jose E. Garcia, Nurcan Ozturk, Camille Belanger-Champagne, Bruno Lenzi, David South, Andreu Pacheco, ...

DEFT: Database Engine for Tasks
  • Production request handling interface for production of MC samples, DPDs or data reprocessing:
    – Requests are placed in the new web interface; the information is stored for review and submission
    – Templates contain the parameters needed for task definitions, so that they can be applied easily
    – Chain of steps (or slice): a series of step templates applied to input data
    – Request: a series of chains
  • Post-production interface covering the standard tasks performed by the production managers (abort, change priorities, …)
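
The request, chain and step-template hierarchy described on this slide can be pictured with a small data model. The sketch below is only an illustration under stated assumptions: the class names, the `expand` helper and the output naming rule are all hypothetical and are not taken from the DEFT code or schema.

```python
from dataclasses import dataclass, field


@dataclass
class StepTemplate:
    """Hypothetical template: a step name plus default task parameters."""
    name: str
    params: dict = field(default_factory=dict)


@dataclass
class Chain:
    """A chain (slice): step templates applied in order to one input dataset."""
    input_dataset: str
    steps: list


@dataclass
class Request:
    """A request: a series of chains."""
    name: str
    chains: list


def expand(request):
    """Turn a request into a flat list of task definitions (plain dicts).

    Each step reads the output of the previous one. The output name is an
    invented convention, used only to make the chaining visible.
    """
    tasks = []
    for chain in request.chains:
        current_input = chain.input_dataset
        for step in chain.steps:
            output = f"{current_input}.{step.name}"
            tasks.append({
                "request": request.name,
                "step": step.name,
                "input": current_input,
                "output": output,
                "params": dict(step.params),  # copy, so templates stay reusable
            })
            current_input = output
    return tasks
```

Applying the same `StepTemplate` objects to several chains mirrors the idea that a template is defined once and reused across input datasets.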

DEFT: Database Engine for Tasks

JEDI: Job Execution and Definition Interface
  • In production for user analysis since August.
  • New features available for production:
    – Dynamic job definition, lost file recovery, network-aware brokerage, log file merging, output merging at T2 before transferring back to T1, ...
    – Added flexibility for users to specify how secondary datasets are sampled and how job parameters are generated
    – Machinery to retry and merge jobs for the event service
    – Boosting priorities of nearly finished tasks
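
To make the last feature concrete, boosting the priority of a nearly finished task could look roughly like the sketch below. The slides do not say how JEDI decides this, so the function name, the 90% threshold and the boost of 100 are assumptions chosen for illustration.

```python
def boosted_priority(base_priority, done_jobs, total_jobs,
                     threshold=0.9, boost=100):
    """Return the priority to use for the remaining jobs of a task.

    threshold and boost are illustrative values, not JEDI settings.
    """
    if total_jobs <= 0:
        return base_priority  # nothing defined yet, nothing to boost
    fraction_done = done_jobs / total_jobs
    # Boost only tasks that are close to completion but not yet complete.
    if threshold <= fraction_done < 1.0:
        return base_priority + boost
    return base_priority
```

The point of such a rule is that the last few jobs of a task stop queuing behind fresh work, so the task as a whole completes sooner.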

BigPanDA: PanDA Monitoring
  – Next generation of PanDA monitoring
  – Modular, easy to bring up a new project/VO
  – Clear separation between data access and visualization
  • Runs on top of Oracle or MySQL DB backends
  – Describes configuration and modules
  • Deployed with RPMs
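
The separation between data access and visualization mentioned above can be illustrated with a minimal sketch. Nothing here reflects the actual BigPanDA code: `JobSource`, `InMemorySource` and `status_summary` are hypothetical names, and the in-memory backend stands in for an Oracle or MySQL query layer.

```python
from typing import Protocol


class JobSource(Protocol):
    """Data-access side: anything that can return job records."""

    def jobs(self) -> list: ...


class InMemorySource:
    """Stand-in backend; a real one would query Oracle or MySQL."""

    def __init__(self, rows):
        self._rows = rows

    def jobs(self):
        return list(self._rows)


def status_summary(source):
    """Visualization side: count jobs per status, whatever the backend."""
    counts = {}
    for job in source.jobs():
        counts[job["status"]] = counts.get(job["status"], 0) + 1
    return "\n".join(f"{status}: {n}" for status, n in sorted(counts.items()))
```

Because `status_summary` only relies on the `jobs()` method, swapping the database backend does not touch the display code.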

COMMISSIONING

Schedule
[Timeline figure; recoverable labels: Run-2 Commissioning, MC15, NOW, Run-2 Twiki]

ProdSys2 Commissioning
[Diagram; recoverable labels: MC Production, Group Production, Data Reprocessing (ESD, RAW, AOD, HIST), Merging (ESD, AOD, HIST)]

MC PRODUCTION

Standard MC Production Chain
  • Event Generation:
    – Event generation includes single-step and double-step generators. Single-step generators run using only JOs; double-step generators also need input files. Both types have been tested successfully.
    – Pending: creation of the tarball needed to create the production e-tag (done using scripts)
  • Simulation:
    – Simulation transforms (old and new) have been tested successfully.
  • HITS Merging:
    – Merging: currently HITS are merged to a defined number of events using a transform in a separate step. This method already works in the new system.
    – JEDI merging: final details to commission the JEDI internal merging are ongoing.
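
Merging HITS files "to a defined number of events" amounts to grouping input files until a target event count is reached. The sketch below shows one simple greedy way to plan such groups. It is an assumption made for illustration and not the logic of the merging transform or of JEDI; `plan_merges` and its arguments are hypothetical.

```python
def plan_merges(files, target_events):
    """Group (name, n_events) pairs, in order, into merge jobs.

    A group is closed as soon as it holds at least target_events.
    The last group may hold fewer events.
    """
    groups, current, count = [], [], 0
    for name, n_events in files:
        current.append(name)
        count += n_events
        if count >= target_events:
            groups.append(current)
            current, count = [], 0
    if current:  # leftover files that did not reach the target
        groups.append(current)
    return groups
```

For example, five files of 100 events each with a target of 200 events give two full groups and one leftover group holding a single file.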

Standard MC Production Chain
  • Digi+Reco:
    – Standard mc12 configuration has been used to test proper production of AODs.
  • AOD Merging:
    – Working in ProdSys2
  • Sample A finished successfully from EVNT to AODs

Other MC Production Chains
  • Upgrade:
    – Transforms and tags are similar to standard MC production. The main issues in ProdSys came from the size of the output files, memory consumption and job length, which should be handled better by ProdSys2/JEDI.
  • Overlay:
    – New transforms (under development) will be needed to produce overlay in ProdSys2
  • FTK:
    – Production involves a long chain and needs checking. Once the standard chain is validated, improvements should be made (FullChain transforms and JEDI merging)

Request placing and post-processing
Request Interface:
  • Implemented:
    – Change of processing parameters, start chain from any step, finding inputs, …
  • Pending:
    – Cloning Request/Task/Chain
    – Include processing parameters in Templates
    – List in JIRA: https://its.cern.ch/jira/browse/PRODSYS
Post-processing:
  – Production post-processing can be done from the web interface. It can be applied to tasks or chains, and requests are logged in JIRA and the PanDA logger.

GROUP PRODUCTION

Overall Status
Two types of productions have been tested or are being tested in ProdSys2:
  • Derivation (train) production as per the new analysis model for Run-2:
    – Workflow being implemented to run derivation production on input xAOD (high priority)
    – Workflow implemented to run derivation production on input NTUP_COMMON (full production not launched yet: metadata issue)
  • Standalone productions (Run-1 style: NTUP_COMMON, NTUP_TOP, NTUP_SUSY, NTUP_TRUTH, etc.):
    – A handful of these productions have been validated in ProdSys2 by the physics groups themselves and are ready to switch over to ProdSys2 (next slides).
Pending:
  – Output file merging (JEDI-internal merging enabled for derived outputs)
  – Replicating output to group disk for standalone productions (missing functionality between ProdSys2 and Rucio)
  – DEFT to be used by group contacts to place their standalone production requests directly into ProdSys2 (approval done by production managers)
Nurcan Ozturk

Workflows Validated
Not all NTUP types will be tested, since the workflows are similar; ASG gave the green light for the switch over to ProdSys2.

Request Placing
  • DEFT works well both when uploading the task submission list files and when making the list files on the fly in the interface itself.
  • Tested by the group production manager only
Pending:
  • Group names need to follow the same naming convention as in the GLANCE system (metadata group).
  • Needs to be tested by a few group contacts after the required features are added; these are being discussed in https://its.cern.ch/jira/browse/PRODSYS-231. High priority, as DPD Savannah will migrate to JIRA and there are no plans to use JIRA for receiving production requests, only as a problem-tracking tool.
  • Parameters required in task definitions should be finalized, for instance:
    – “lumiblock=yes”: JEDI is supposed to use this internally; a lingering issue since the last Software week.
    – “destination” handling: handled by ProdSys2 until the required functionality between ProdSys2 and Rucio is in place?

DATA REPROCESSING

Data Reprocessing
  • Data reprocessing requests are somewhat different from MC requests
    – Can involve more than one Reco Merging chain that uses some of the outputs of a previous chain
    – Implemented and tested successfully (although not extensively) in ProdSys2
  • Many problems encountered during DC14 with the reco and merging software prevented further tests. The system is not thoroughly validated.
    – Reprocessing wish list: https://twiki.cern.ch/twiki/bin/view/Atlas/ReprocessingProdSys2, including e.g. configuration of merging parameters, file exclusion list, etc.
Bruno Lenzi
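
Two items on this slide, a chain that reads only some outputs of a previous chain and a file exclusion list, can be combined in a small selection step. The sketch below is purely illustrative: `select_inputs`, its arguments and the file names in the usage example are assumptions, not part of ProdSys2.

```python
def select_inputs(previous_outputs, wanted_types, excluded_files=()):
    """Pick the files a follow-up chain should read.

    previous_outputs maps an output type (e.g. "ESD") to a list of file names.
    Only the wanted types are kept, and excluded files are dropped.
    """
    excluded = set(excluded_files)
    return {
        out_type: [f for f in previous_outputs.get(out_type, [])
                   if f not in excluded]
        for out_type in wanted_types
    }


# Usage with invented file names: merge ESD and HIST, skip one bad ESD file.
reco_outputs = {"ESD": ["esd.1", "esd.2"], "AOD": ["aod.1"], "HIST": ["hist.1"]}
merge_inputs = select_inputs(reco_outputs, ["ESD", "HIST"], ["esd.2"])
```

Keeping the exclusion list as a separate argument means a bad file can be dropped at resubmission time without redefining the chain itself.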

Next Steps and Schedule
  • MC Production:
    a. Validate the output of sample A in physics validation – September
    b. Exercise DC14 and MC15 chains to ensure readiness – September
    c. Start processing some production requests in the system – September
    d. Implement missing functionality for event generation and open fully to all requests – October
    e. Implement special cases – September to November
    f. Phase out ProdSys – November
  • Group Production:
    a. Launch full-scale derivation production (with JEDI-internal merging enabled) – this week
    b. Finalize DEFT to be used by group contacts, get tested by a few group contacts – September
    c. Group contacts use DEFT fully, DPD Savannah migrates to JIRA – October
    d. Same plan as in e. and f. above

Next Steps and Schedule
  • Data Reprocessing:
    a. Test with f-tag (need input from PROC)
    b. Submission of special runs of DC14 – this week
    c. Re-submission of failed jobs of DC14 (new cache needed)
  • Suggest keeping ProdSys1 as long as possible (end of the year)