
Software Architecture and Data Model
Software framework, services and persistency in high level trigger, reconstruction and analysis
Vincenzo Innocente, CERN/EP/CMC, 1/1/2022

CMS (offline) Software
[Diagram labels: Online Monitoring; Event Filter; Objectivity Formatter; Slow Control (environmental data); Quasi-online Reconstruction; Simulation (G3 and/or G4); Persistent Object Store Manager; Object Database Management System; Data Quality; Calibrations; Group Analysis; User Analysis on demand. Arrows: store, store rec-Obj, store rec-Obj and calibrations, request part of event.]

Requirements (from the CTP)
  • Multiple environments: various software modules must be able to run in a variety of environments, from level 3 triggering to individual analysis
  • Migration between environments: physics modules should move easily from one environment to another (from individual analysis to level 3 triggering)
  • Migration to new technologies: should not affect physics software modules

Requirements (from the CTP)
  • Dispersed code development: the software will be developed by organizationally and geographically dispersed groups of part-time non-professional programmers
  • Flexibility: not all software requirements will be fully known in advance
Not only performance: also modularity, flexibility, maintainability, quality assurance and documentation.

CMS Data Model R&D
  • 95-96: RD41, OO Detector Reconstruction
      • Detector model, local hit cache, pattern recognition
  • 95-97: RD45, OO Event Model (persistent)
      • Event structure, raw data, reconstructed objects
  • 95-97: RD45, Calibration Database
      • Time-dependent data, versioning, experience with Objectivity
  • 12/96: CTP decision to use OO and ODBMS
  • 97-present: GIOD
      • Many clients access over LAN and WAN
  • 97-98: Test-Beam (H2, X5)
      • OO DAQ, online filtering, ODB population
  • 99-00: ORCA production
      • MetaData, concurrent jobs, multi-threading, RT dynamic loading
  • 2001: Milestone on ODBMS “vendor” choice

Use Cases (current functionality in ORCA)
  • Simulated Hits Formatting
  • Digitization of Piled-up Events
  • Test-Beam DAQ & Analysis
  • L1 Trigger Simulation
  • Track Reconstruction
  • Calorimeter Reconstruction
  • Global Reconstruction
  • Physics Analysis

Reconstruction Scenario
  • Reproduce the detector status at the moment of the interaction:
      • front-end electronics signals (digis)
      • calibrations
      • alignments
  • Perform local reconstruction as a continuation of the front-end data reduction until objects detachable from the detectors are obtained
  • Use these objects to perform global reconstruction and physics analysis of the Event
  • Store & retrieve results of computing-intensive processes

Reconstruction Sources

Components
  • Reconstruction Algorithms
  • Event Objects
  • Physics Analysis modules
  • Other services (detector objects, environmental data, parameters, etc.)
  • Legacy non-OO data (GEANT3)
The instances of these components must be properly orchestrated to produce the results specified by the user.

CARF: CMS Analysis & Reconstruction Framework
[Layer diagram labels: Application; Physics modules; Framework; Reconstruction Algorithms; Event Filter; Physics Analysis; Data Monitoring; Calibration Objects; MetaData Objects; Event Objects; Utility Toolkit; LHC++; ODBMS; Geant4; CLHEP; Paw Replacement; C++ standard library; Extension toolkit]

Architecture structure
  • An application framework, CARF (CMS Analysis & Reconstruction Framework), customisable for each of the computing environments
  • Physics software modules with clearly defined interfaces that can be plugged into the framework
  • A Persistency Service integrated into the framework to provide a transparent interface to physics modules
  • A service and utility Toolkit that can be used by any of the physics modules
The framework (and the utility Toolkit) effectively shields physics modules from the underlying technology without penalizing performance.
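The plug-in relation between the framework and the physics modules can be pictured with a minimal C++ sketch. It is not CARF code: the names Event, PhysicsModule, Framework and EventCounter are hypothetical and only illustrate the idea of modules with a clearly defined interface being plugged into a framework that drives them.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the event handed to physics modules.
struct Event {
    int run;
    int id;
};

// The only thing the framework knows about a physics module.
class PhysicsModule {
public:
    virtual ~PhysicsModule() = default;
    virtual std::string name() const = 0;
    virtual void analyze(const Event& event) = 0;
};

// Minimal framework: owns the plugged-in modules and dispatches events to them.
class Framework {
public:
    void plug(std::unique_ptr<PhysicsModule> module) {
        assert(module != nullptr);  // a null module would be a programming error
        modules_.push_back(std::move(module));
    }
    void process(const Event& event) {
        for (auto& module : modules_) module->analyze(event);
    }
    std::size_t moduleCount() const { return modules_.size(); }

private:
    std::vector<std::unique_ptr<PhysicsModule>> modules_;
};

// Example module: counts the events it has been given.
class EventCounter : public PhysicsModule {
public:
    explicit EventCounter(int& counter) : counter_(counter) {}
    std::string name() const override { return "EventCounter"; }
    void analyze(const Event&) override { ++counter_; }

private:
    int& counter_;  // counter owned by the caller
};
```

Because the module depends only on the PhysicsModule interface, the same module could in principle be plugged into a differently configured framework (trigger, reconstruction, analysis) without modification, which is the property the requirements ask for.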

Persistency Services
Persistent Object Management is fully integrated in CARF using an ODBMS.
  • CARF manages:
      • multi-threaded transactions
      • creation of databases and containers
      • meta data and event collections
      • physical clustering of event objects
      • persistent event structure and its relations with transient objects
  • Use of the database is transparent to detector developers:
      • users access persistent objects through C++ pointers


HEP Data
  • Environmental data
      • Detector and accelerator status
      • Calibrations, alignments
  • Event-Collection Meta-Data (luminosity, selection criteria, …)
  • Event Data, User Data
Navigation is essential for an effective physics analysis. Complexity requires coherent access mechanisms.
[Diagram labels: User Tag (N-tuple); Tracker Alignment; Ecal calibration; Collection Meta-Data; Event Collection; Event; Tracks; Electrons]

Do I need a DBMS? (a self-assessment)
  • Do I encode meta-data (run number, version id) in file names?
  • How many files and logbooks should I consult to determine the luminosity corresponding to a histogram?
  • How easily can I determine whether two events have been reconstructed with the same version of a program and using the same calibrations?
  • How many lines of code should I write, and which fraction of the data should I read, to select all events with two …’s with p > 11.5 GeV and |η| < 2.7?
  • The same at generator level?
If the answers scare you, you need a DBMS!
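To make the selection question concrete, here is a small C++ sketch of such a cut applied to compact per-event summary records instead of full events. Everything in it is an assumption for illustration: Candidate, EventTag and selectEvents are hypothetical names, the particle type is left unspecified, and the thresholds are passed as parameters.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical compact summary of one particle candidate.
struct Candidate {
    double p;    // momentum in GeV
    double eta;  // pseudorapidity
};

// Hypothetical per-event "tag": small enough to scan without reading the full event.
struct EventTag {
    int eventId;
    std::vector<Candidate> candidates;
};

// True if at least minCount candidates satisfy p > minP and |eta| < maxAbsEta.
bool passes(const EventTag& tag, std::size_t minCount, double minP, double maxAbsEta) {
    std::size_t good = 0;
    for (const Candidate& c : tag.candidates) {
        if (c.p > minP && std::fabs(c.eta) < maxAbsEta) ++good;
    }
    return good >= minCount;
}

// Returns the ids of all events whose tag passes the cut.
std::vector<int> selectEvents(const std::vector<EventTag>& tags, std::size_t minCount,
                              double minP, double maxAbsEta) {
    assert(minP >= 0.0 && maxAbsEta > 0.0);  // cuts must be physically meaningful
    std::vector<int> selected;
    for (const EventTag& tag : tags) {
        if (passes(tag, minCount, minP, maxAbsEta)) selected.push_back(tag.eventId);
    }
    return selected;
}
```

The code itself is short; the slide's point is the second half of the question. How much data must be read depends on whether such summaries are stored and indexed coherently with the events they describe.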

Can CMS do without a DBMS?
  • An experiment lasting 20 years cannot rely just on ASCII files and file systems for its production bookkeeping, “condition” database, etc.
  • Even today at LEP, the management of all real and simulated data-sets (from raw data to n-tuples) is a major enterprise
      • Multiple models used (DST, N-tuple, HEPDB, FATMAN, ASCII)
  • A DBMS is the modern answer to such a problem and, given the choice of OO technology for the CMS software, an ODBMS (or a DBMS with an OO interface) is the natural solution for a coherent and scalable approach.

A “BLOB” Model
  • Blob: a sequence of bytes. Decoding it is a “user” responsibility.
  • Why should Blobs not be stored in the DBMS?
[Diagram labels: DataBase Objects; Event; RawEvent; RecEvent; Blob]

Raw Event
  • RawData belonging to different “detectors” are clustered into different containers. The granularity will be adjusted to optimize I/O performance.
  • An index at RawEvent level is used to avoid accessing all containers in search of a given RawData. The index is implemented as an ordered vector of pairs.
  • RawData are identified by the corresponding ReadOut.
  • A range index at RawData level could be used for fast random access in complex detectors.
[Diagram labels: RawEvent; Index; ReadOut; RawData; Vector of Digi]
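An index implemented as an ordered vector of pairs can be sketched in a few lines of C++. This is an illustration rather than the CMS implementation: ReadOutId, ContainerSlot and RawEventIndex are hypothetical names, and plain integers stand in for the ReadOut identifier and for the reference to the RawData inside its container.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

using ReadOutId = int;      // hypothetical key identifying a ReadOut
using ContainerSlot = int;  // hypothetical handle locating the matching RawData

class RawEventIndex {
public:
    // Insert keeping the vector sorted by ReadOut id.
    void add(ReadOutId id, ContainerSlot slot) {
        auto pos = std::lower_bound(entries_.begin(), entries_.end(), id, lessById);
        assert(pos == entries_.end() || pos->first != id);  // one RawData per ReadOut
        entries_.insert(pos, std::make_pair(id, slot));
    }

    // Binary search; returns nullptr when the ReadOut has no RawData in this event.
    const ContainerSlot* find(ReadOutId id) const {
        auto pos = std::lower_bound(entries_.begin(), entries_.end(), id, lessById);
        if (pos == entries_.end() || pos->first != id) return nullptr;
        return &pos->second;
    }

private:
    using Entry = std::pair<ReadOutId, ContainerSlot>;
    static bool lessById(const Entry& entry, ReadOutId id) { return entry.first < id; }
    std::vector<Entry> entries_;  // always sorted by entry.first
};
```

A sorted vector keeps the index compact and contiguous, and lookup costs O(log n) comparisons without touching any container other than the one that holds the requested RawData.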

CMS Reconstructed Objects
  • Reconstructed objects produced by a given “algorithm” are managed by a Reconstructor.
  • A Reconstructed Object (Track) is split into several independent persistent objects to allow their clustering according to their access patterns (physics analysis, reconstruction, detailed detector studies, etc.). The top-level object acts as a proxy.
  • Intermediate reconstructed objects (RHits) are cached by value into the final objects.
[Diagram labels: RecEvent; S-Track Reconstructor; STrack (“aod”); Track SecInfo (“esd”); Track Constituents (“rec”); Vector of RHits]
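The split of a track into a top-level proxy plus separately stored detail objects might look like the following C++ sketch. The class names follow the diagram labels, but the data members and accessors are assumptions, and std::shared_ptr stands in for the persistent references an ODBMS would provide.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Intermediate reconstructed object, cached by value in the constituents.
struct RHit {
    double x, y, z;
};

// "rec" level: the hits the track was built from.
struct TrackConstituents {
    std::vector<RHit> hits;
};

// "esd" level: secondary information (members are illustrative).
struct TrackSecInfo {
    double chi2;
    int ndof;
};

// "aod" level: top-level proxy holding the summary quantities used by most analyses.
class STrack {
public:
    STrack(double pt, double eta) : pt_(pt), eta_(eta) {}

    double pt() const { return pt_; }
    double eta() const { return eta_; }

    void attachSecInfo(std::shared_ptr<TrackSecInfo> info) { secInfo_ = std::move(info); }
    void attachConstituents(std::shared_ptr<TrackConstituents> c) { constituents_ = std::move(c); }

    bool hasSecInfo() const { return secInfo_ != nullptr; }
    bool hasConstituents() const { return constituents_ != nullptr; }

    // Detail objects are reached only when a client asks for them.
    const TrackSecInfo& secInfo() const {
        assert(secInfo_);
        return *secInfo_;
    }
    const TrackConstituents& constituents() const {
        assert(constituents_);
        return *constituents_;
    }

private:
    double pt_;
    double eta_;
    std::shared_ptr<TrackSecInfo> secInfo_;            // may live in another cluster
    std::shared_ptr<TrackConstituents> constituents_;  // may live in another cluster
};
```

An analysis that needs only pt() and eta() never touches the detail objects, so those can be clustered elsewhere on disk and read only by reconstruction or detector studies.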

Physical clustering

User Data
  • Histograms and N-tuples are “user” event-data and, for any serious use, require a level of management and book-keeping similar to the “experiment-wide” event data.
  • The same tools can be used, with the advantage of keeping the interface and the user environment consistent.
  • What counts is the efficiency and reliability of the analysis: the most sophisticated histogramming package is useless if you are unable to determine the luminosity corresponding to a given histogram!

Objectivity
  • CMS adopted the object paradigm in the CTP.
  • At the same time, in close collaboration with RD45, an evaluation of various object storage solutions was undertaken, and Objectivity/DB was chosen as the baseline product for further evaluation, tests and prototypes, in particular for CMS data-related milestones.
  • Objectivity/DB provides:
      • scalable architecture in the PB range
      • full multi-platform support
      • data distribution and MSS interface through a customizable “slim” data server (AMS)
      • very efficient C++ binding close to the ODMG standard with minimal proprietary parsing

Objectivity Features CMS (really) uses
  • Persistent objects are real C++ (and Java) objects
      • coherent access to any kind of object
  • I/O cache (memory) management
      • no explicit read and write
      • no need to delete previous event
  • Smart-pointers (automatic id to pointer conversion)
  • Efficient containers by value (VArray)
  • Full direct navigation in the complete federation
      • from MetaData to Event-Data
      • from Event-Data back to MetaData
  • Flexible object physical-clustering
  • Object Naming
      • as top level entry point (at “collection” level)
      • as rapid prototyping tool
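The idea behind a smart pointer with automatic id-to-pointer conversion, backed by an I/O cache, can be shown with a toy C++ sketch. It does not reproduce the Objectivity/DB API: ObjectStore, Ref, ObjectId and Calibration are hypothetical, and an in-memory map plays the role of the database.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <utility>

using ObjectId = unsigned long;  // hypothetical persistent object identifier

// Toy store: "disk_" stands in for the database, "cache_" for the I/O cache.
template <typename T>
class ObjectStore {
public:
    ObjectId put(const T& value) {
        ObjectId id = nextId_++;
        disk_.emplace(id, value);
        return id;
    }

    // Converts an id to a pointer, loading the object on first access only.
    T* resolve(ObjectId id) {
        auto hit = cache_.find(id);
        if (hit != cache_.end()) return hit->second.get();
        auto stored = disk_.find(id);
        if (stored == disk_.end()) return nullptr;
        ++loads_;
        auto inserted = cache_.emplace(id, std::unique_ptr<T>(new T(stored->second)));
        return inserted.first->second.get();
    }

    int loads() const { return loads_; }

private:
    std::map<ObjectId, T> disk_;
    std::map<ObjectId, std::unique_ptr<T>> cache_;
    ObjectId nextId_ = 1;
    int loads_ = 0;
};

// Smart pointer: stores only the id; dereferencing triggers the conversion.
template <typename T>
class Ref {
public:
    Ref(ObjectStore<T>& store, ObjectId id) : store_(&store), id_(id) {}
    T* operator->() const {
        T* p = store_->resolve(id_);
        assert(p != nullptr);  // dangling id
        return p;
    }
    T& operator*() const { return *operator->(); }

private:
    ObjectStore<T>* store_;
    ObjectId id_;
};

// Example persistent type (illustrative).
struct Calibration {
    double gain;
};
```

Client code writes ref->gain as it would with an ordinary pointer; there is no explicit read call, and repeated accesses are served from the cache.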

More ODBMS (Objy) Advantages
  • Novel access methods:
      • A collection of “electrons” with no reference to events
      • Direct reference from event-objects to “condition database”
      • Direct reference to event-data from user-data
  • Flexible run-time clustering of heterogeneous-type objects
      • cluster together all tracks or all objects belonging to the same event
  • Real DB management of reconstructed objects
      • add or modify in place and on demand parts of an event

CMS Experience
  • Designing and implementing persistent classes is not harder than doing it for native C++ classes.
  • Easy and transparent distinction between logical associations and physical clustering.
  • Fully transparent I/O in a distributed environment, with performance essentially limited by disk and network speed (random access).
  • File size overhead (5% for realistic CMS object sizes) is not larger than for other “products” such as ZEBRA, BOS, etc.
  • Objectivity/DB (compared to other products we are used to) is robust, stable and well documented. It also provides many additional useful features.
  • All our tests show that Objectivity/DB can satisfy CMS requirements in terms of performance, scalability and flexibility.

CMS Experience
  • There are additional “configuration elements” to care about: ddl files, schema-definition databases, database catalogs
      • organized software development: rapid prototyping is still possible, but its integration in a product should be done with care
      • Now fully integrated in CMS cvs and SCRAM environments
  • The system requires tuning to avoid performance degradation
      • monitoring of running applications is essential; off-the-shelf solutions often exist (BaBar, Compass)
      • CMS HLT production is now at the leading edge of monitoring and tuning
  • Objectivity/DB is a “bare” product. It does not impose a framework:
      • integration into a framework (CARF) is our responsibility
  • Objectivity is slow to apply OUR changes to their product
      • Is this a real problem? Do we really want a product whose kernel is changed at each user request?

CMS Experience (missing features ’99)
  • Scalability: 64K files are not enough (scheduled for Dec 2000)
      • containers are the natural Objectivity units; still, there are things for which the OS (and files) is preferred:
          • “bulk” data transfer (to mass-storage, among sites)
          • access control, space allocation to users, etc.
  • Efficient and secure data-server (AMS ok in 5.2!!!)
      • with MSS and WAN support
  • Support for “private” user classes and user data (w.r.t. experiment-wide ones)
      • many custom solutions based on multi-federation
      • Active schema
  • User Application Layer
      • like a rapid prototyping environment

Objy-HEP: Building a Partnership
  • Objectivity recognizes that HEP requirements anticipate future requirements of other clients
      • the next versions will include solutions to almost all our improvement requests
  • The new AMS has been essentially developed at SLAC
  • CERN has built version 5.2.1 for Linux RH 6.1
  • CERN will help in building a full port to Solaris CC5
  • CERN will prototype a new lockserver monitor
It is essential to continue to develop this partnership and increase the trust of both partners in each other.

Alternatives: ODBMS
  • Versant is a viable commercial alternative to Objectivity
      • do we have time to build an effective partnership (e.g. MSS interface)?
  • Espresso (by IT/DB)
      • we need to be able to produce a fully fledged ODBMS in a couple of years once the proof-of-concept prototype is ready
      • CMS will test Espresso in the context of CARF this summer
  • Migrate CARF from Objectivity to another ODBMS
      • We expect that it would take about one year
      • Such a transition will not affect the basic principles of the CMS software architecture and Data Model
      • Will involve only the core CARF development team
      • Will not disrupt production and physics analysis

Alternatives: ORDBMS
  • ORDBMS (relational DBs with an OO interface) are appearing on the market
      • First products look targeted to those who already have a relational system and wish to make a transition to OO
      • More realistic object-oriented products could appear in the near future
  • Evaluation of their usage in HEP will start soon
      • No experiment is using (or planning to use) them
      • IT/DB is in contact with Oracle and is planning to evaluate their OO product
  • It is still early to assess the impact of ORDBMS on the CMS Data Model and on the migration effort

Fallback Solution (less functionality)
  • Hybrid Models
      • (R)DBMS for MetaData, Calibration, etc.
      • Object-Stream files for event data
      • Ad-hoc networked dataserver and MSS interface
  • Less flexible
      • Rigid split between DBMS and event data
      • One-way navigation from DBMS to event data
  • More complex
      • Two different I/O systems
      • More effort to learn and maintain
  • This approach will be used by several experiments at BNL and FermiLab
      • (RDBMS not directly accessible from user applications)
  • CMS and IT/DB are closely following these experiences
We believe that this solution could seriously compromise our ability to perform our physics program competitively.

ODBMS Summary
  • A DBMS is required to manage the large data set of CMS (including user data)
  • An ODBMS provides a coherent and scalable solution for managing data in an OO software environment
  • Once an ODBMS is deployed to manage the experiment data, it will be very natural to use it to manage any kind of data related to detector studies and physics analysis
  • Objectivity/DB is a robust and stable kernel, ideal as the base on which to build a custom storage framework
  • Objectivity is starting to respond to our peculiar requirements