Data Management in LHCb: Status Report & Plans
- Preparation for the first (small) data challenge
- Preparation for using the GRID
M. Frank, LHCb/CERN

Gaudi: The Philosophy
- Separation between the transient and the persistent data representation
- Separation between event and detector (conditions) data
- Possibility to foster multi-technology persistency solutions

Managing Data
- Managing the data itself (physical view)
  - Storage mechanism
- Managing the access to the data (logical view)
  - Optimize for access patterns
  - Resources
  - Speed

Physical View: Boxes and Cables
[Diagram: servers, disk, tertiary storage]
- Bottlenecks
  - Network, I/O, elapsed time
- Pre-select events quickly, spend time where needed

Logical View: Databases
[Diagram labels: Physics Algorithm; Framework (GAUDI); Database Technology; GRID & Co.; Castor; Transient Data Representation; Persistent Data Representation; Database-internal Representation; Network (optional); Database files; Tertiary Storage (Tape)]

Processing Elements
[Diagram: data analysis cycle. Labels: Monte Carlo Generator; Detector Simulation; Sim. Raw Data; Detector/DAQ; Real Data Production; Raw Data; Reconstruction; Reconstructed Data; Event Tag Generation; Event Tags; Analysis Group Prod.; Analysis Data; Analysis Group Tags; Event Tag Collection; Analysis Job; Analysis Job Results; Detector Data]

Data Access Mechanism
[Diagram labels:
- Runset (Master): Physics; MC: B -> ππ; MC: B -> J/Ψ (μ+ μ-); …
- Dataset: Event 1, Event 2, …, Event N
- Collection Set (Phy): B -> ππ Candidates; B -> J/Ψ (μ+ μ-) Candidates; …
- Event tag collection: Tag 1, Tag 2, …, Tag M, each pointing to an event
- Bookkeeping]

Gaudi Data Store
- Tree, similar to a file system
- Identification by path, e.g. "/Event/MCEcalHit"
- Navigation through the data tree using the logical address
- Link objects with logical addresses

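The tree-and-path idea on this slide can be illustrated with a small sketch. It is not the Gaudi API: the class `DataStore` and its methods `register`, `retrieve` and `children` are hypothetical names chosen for illustration, and the stored objects are plain Python values. It only shows how objects can be identified by a path and linked through a logical address instead of a direct pointer.

```python
class DataStore:
    """Toy transient store (hypothetical, not the Gaudi API):
    a tree of named nodes addressed by '/'-separated paths."""

    def __init__(self):
        self._root = {"object": None, "children": {}}

    @staticmethod
    def _parts(path):
        if not path.startswith("/"):
            raise ValueError(f"path must be absolute: {path!r}")
        return [p for p in path.split("/") if p]

    def _walk(self, path):
        # Follow the path one node at a time, like a directory lookup.
        node = self._root
        for name in self._parts(path):
            try:
                node = node["children"][name]
            except KeyError:
                raise KeyError(f"nothing registered at {path!r}") from None
        return node

    def register(self, path, obj):
        # Create intermediate nodes on demand, then attach the object.
        node = self._root
        for name in self._parts(path):
            node = node["children"].setdefault(
                name, {"object": None, "children": {}}
            )
        node["object"] = obj

    def retrieve(self, path):
        return self._walk(path)["object"]

    def children(self, path):
        return sorted(self._walk(path)["children"])


store = DataStore()
store.register("/Event", {"run": 1, "event": 42})
store.register("/Event/MCEcalHit", [0.3, 1.2, 3.1])
# The link is stored as a logical address (a path), not as a pointer.
store.register("/Event/Cand", {"hits": "/Event/MCEcalHit"})

candidate = store.retrieve("/Event/Cand")
hits = store.retrieve(candidate["hits"])  # follow the link by path
print(store.children("/Event"), hits)
```

Because the link is only a path, the linked object does not have to be in memory until someone follows the link.
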
Data Access
- Transparent data access
  - No difference when accessing local or remote data files
- Data organization
  - Processing phases
  - Versioning
- Data clustering
- Data replication
  - Read-only data
- "Logical" data addressing
  - Essential for data replication
  - Distributed translation of "logical" to "physical"
  - Essential for implementing "relationships"
[Diagram: data files grouped into DB Domain 1 and Domain 2]

Data Organization
[Diagram: event versions. Tree labels: Event; Raw (Velo, Calo), marked RAW; Rec (Tracks, Hits), marked ESD; Phy (Cand), marked AOD; Phy (MyTrk), marked Private]

Links (Gaudi)
- Vertical and horizontal links
  - Vertical: "tree structure"
  - Horizontal: "physics contents driven"
- Time-reverse direction
  - ESD -> RAW
  - AOD -> ESD
  - NOT: RAW -> AOD
[Diagram: event versions. Tree labels: Event; Raw (Velo, Calo), marked RAW; Rec (Tracks, Hits), marked ESD; Phy (Cand), marked AOD; Phy (MyTrk), marked Private]

"Logical" Data Addressing
- Essential
  - Needed for the data organization to address DataSets and objects within these DataSets independently of the physical file (replica) and the mass storage state (migrated, staged)
- Extended Object ID (XID)
  - Logical File Name (LFN) + local address within a file (record #, offset, …)
- Distributed translation from "logical" to "physical" file names
  - Centralized translation would be a bottleneck

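A minimal sketch of the XID idea, under stated assumptions: `XID`, `SiteCatalog` and `locate` are hypothetical names, the file names are invented, and the "local address" is reduced to a record number. Each site keeps its own LFN-to-physical-file table, so the translation is spread over sites instead of going through one central service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class XID:
    """Extended object ID: logical file name + local address in the file."""
    lfn: str
    record: int


class SiteCatalog:
    """Per-site replica table (hypothetical): LFN -> physical file name."""

    def __init__(self, site, replicas):
        self.site = site
        self._replicas = dict(replicas)

    def resolve(self, lfn):
        # Returns None when this site holds no replica of the file.
        return self._replicas.get(lfn)


def locate(xid, catalogs):
    """Ask the catalogs in order of preference (e.g. closest site first)
    and return (site, physical file name, record)."""
    for catalog in catalogs:
        pfn = catalog.resolve(xid.lfn)
        if pfn is not None:
            return catalog.site, pfn, xid.record
    raise LookupError(f"no replica known for {xid.lfn!r}")


# Invented example names; only the structure matters.
local = SiteCatalog("site-a", {"lfn:/mc/bpipi/001": "/disk/site-a/bpipi_001.root"})
remote = SiteCatalog("site-b", {
    "lfn:/mc/bpipi/001": "/disk/site-b/bpipi_001.root",
    "lfn:/mc/bpipi/002": "/disk/site-b/bpipi_002.root",
})

print(locate(XID("lfn:/mc/bpipi/001", 17), [local, remote]))
print(locate(XID("lfn:/mc/bpipi/002", 3), [local, remote]))
```

The XID itself never changes when a file is replicated or moved; only the catalogs do, which is why logical addressing matters for replication and for links between objects.
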
Event Tag Collections
- N-tuples: O(10^2) bytes/event
  - Tags from collaboration-wide data processing
  - Group tags
  - User tags
- Allow retrieval of all event data if requested
  - Raw, reconstructed and analysis data
  - NOT part of the event
- Minimize data transfer
  - Pre-select based on tag information
  - Access data only if the pre-selection is satisfied

Event Tag Collections
[Diagram: client/server sequence]
- Access event collection: the client sends "Select from Collection where N(track)>50"; the server returns the N-tuple entries with N(track)>50
- Pre-selection: [1] N-tuple based, [2] criteria based
- Access event data: the client requests event data; the server reads and returns it
- The client analyses the event

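The two-step access pattern can be sketched as follows. This is an illustration only: `EventStore`, `preselect` and `analyse` are hypothetical names, the tag values are invented, and the "analysis" is just a sum. The point is that the small tags are scanned for every event, while the full event data is read only for events that pass the pre-selection.

```python
class EventStore:
    """Stand-in for the (expensive) event data server; counts reads."""

    def __init__(self, events):
        self._events = events
        self.reads = 0

    def read(self, event_id):
        self.reads += 1
        return self._events[event_id]


def preselect(tags, predicate):
    # Step 1: look only at the tag entries (order 100 bytes per event).
    return [tag["event_id"] for tag in tags if predicate(tag)]


def analyse(tags, store, predicate):
    # Step 2: fetch full event data only for the pre-selected events.
    results = {}
    for event_id in preselect(tags, predicate):
        event = store.read(event_id)
        results[event_id] = sum(event["energies"])
    return results


# Invented example data.
tags = [
    {"event_id": 1, "ntrack": 21},
    {"event_id": 2, "ntrack": 62},
    {"event_id": 3, "ntrack": 80},
]
store = EventStore({
    1: {"energies": [0.3, 1.2]},
    2: {"energies": [1.0, 2.0, 3.0]},
    3: {"energies": [4.0, 0.5]},
})

results = analyse(tags, store, lambda tag: tag["ntrack"] > 50)
print(results, "event reads:", store.reads)
```
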
Accessing Detector Data
[Diagram labels:
- Algorithm: request (get, update); holds a reference to the detector element
- DetectorData Service: manages the store; synchronization updates on beginEvent
- Transient Detector Store: DetElement, VeloStation, GeometryInfo, Calibration, ReadOut
- Interfaces: IDetElement, IGeometryInfo, ICalibration, IReadOut
- Persistency Service with Conversion Services: Geometry, Conditions DB, other DBs
- Geant4 Service (Geant4 Rep.), Display Service (Graphics Rep.)]

Conditions DB
- Collaboration project with the DB group + ATLAS + Compass + Harp + …
  - Led by Stefano Paoli, IT/DB
- Current implementation uses Objectivity/DB
  - Sufficient for testing interfaces etc.
  - Interested in an ORACLE implementation
- Successful collaboration
  - Experiments specified requirements & interfaces
  - Implementation and support from IT
  - Some concern about long-term commitment

Conditions DB
- Accessing detector conditions data: calibration, slow control, alignment, geometry, etc.
  - Time validity period
  - Versioning
- A physics algorithm sees only the projection according to
  - Item (glossary: "Folder")
  - Time stamp
  - Selected version (glossary: "Tag")

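The projection by folder, time stamp and tag can be sketched as below. This is not the Conditions DB interface: `ConditionsDB`, `store` and `find` are hypothetical names, the folder and tag names are invented, and validity intervals are assumed to be half-open, [since, until).

```python
class ConditionsDB:
    """Toy conditions store (hypothetical interface): each payload has a
    folder, a version tag and a validity interval [since, until)."""

    def __init__(self):
        self._entries = {}  # (folder, tag) -> list of (since, until, payload)

    def store(self, folder, tag, since, until, payload):
        if since >= until:
            raise ValueError("validity interval must be non-empty")
        self._entries.setdefault((folder, tag), []).append((since, until, payload))

    def find(self, folder, time, tag):
        # The caller sees exactly one payload: the one valid at `time`
        # in the requested folder and version.
        for since, until, payload in self._entries.get((folder, tag), []):
            if since <= time < until:
                return payload
        raise LookupError(f"no {folder!r} entry valid at {time} for tag {tag!r}")


db = ConditionsDB()
# Invented folder names, tags and values.
db.store("/Velo/Alignment", "v1", 0, 100, {"dx": 0.010})
db.store("/Velo/Alignment", "v1", 100, 200, {"dx": 0.012})
db.store("/Velo/Alignment", "v2", 0, 200, {"dx": 0.011})

print(db.find("/Velo/Alignment", 150, "v1"))
print(db.find("/Velo/Alignment", 150, "v2"))
```
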
Plans Until Next Summer
- LHCb milestone: data challenge with 10^6 events (2 weeks)
- Event data
- Evaluate ORACLE
- Event collections
- Bookkeeping
- Detector data
[Annotation: possible common projects]

Plans Until Next Summer
- Event data
  - Use ROOT + Castor
  - File based
  - "Semi-direct" data access
  - Partially implemented, not tested on a bigger scale (but: we have an ODBC implementation)
- Event tag collections
  - Store collections in ORACLE / ODBC
- Bookkeeping
  - In ORACLE / ODBC
  - Start from the existing application
[Side notes:
- RDBMS: searchable (SQL); indexing / fast access
- ODBC (Open DataBase Connectivity): stay open; ORACLE exists at CERN; possibly replicate using another technology]

Preparation Phase: Event Data
- Produce a ROOT file with reconstructed data
- Test suite for the Gaudi I/O mechanism
  - Identify problems
- Data content allows a few selected physics analyses
  - Limited number of users
  - Allow faster access to event data
  - Definition of the data content
  - Not all Fortran data can be converted
- Transparent access to ZEBRA possible, but with a penalty
  - Sequential files
- Get experience with the integration with Castor

Preparation Phase: Event Tags
- Implemented using N-tuples + modification to allow access to event data
- First attempt: write tag collections to ROOT
  - Simple, portable
- Then try an ORACLE implementation
  - Try to exploit indexing and server-side execution capabilities

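What "indexing and server-side execution" could look like for a tag collection is sketched below. Python's built-in `sqlite3` is used purely as a stand-in for an ORACLE/ODBC connection, and the table name `event_tags`, its columns and the file names are assumptions made for illustration. The selection runs inside the database, which can use the index, and only the matching rows come back to the client.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the real RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event_tags ("
    " event_id INTEGER PRIMARY KEY,"
    " ntrack   INTEGER,"
    " lfn      TEXT,"      # logical file name of the event data
    " record   INTEGER)"   # local address within that file
)
# Index on the selection variable so the cut does not need a full scan.
conn.execute("CREATE INDEX idx_event_tags_ntrack ON event_tags (ntrack)")

# Invented example rows.
rows = [
    (1, 21, "lfn:/example/rec001", 0),
    (2, 62, "lfn:/example/rec001", 1),
    (3, 80, "lfn:/example/rec002", 0),
]
conn.executemany("INSERT INTO event_tags VALUES (?, ?, ?, ?)", rows)

# The cut is evaluated by the database, not by the client.
selected = conn.execute(
    "SELECT event_id, lfn, record FROM event_tags"
    " WHERE ntrack > ? ORDER BY event_id",
    (50,),
).fetchall()
print(selected)
```

Each returned row carries the logical address needed to fetch the full event afterwards.
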
Preparation Phase: Bookkeeping
- LHCb bookkeeping project (http://www.cern.ch/lhcb-comp/bookkeeping)
  - Access to ORACLE through a WWW server
  - Needs interfacing to event collections

HEP Computing GRID
- Where does Gaudi end and GRID middleware start?
  - How do the two interface to each other? Better: how does Gaudi interface to the GRID?
- Data replication (tags + event data)
- Location of the closest copy of the data
- How does the I/O mechanism work in the presence of file replication (links, references, …)?

GRID: Alternative View
[Diagram labels: Algorithms; API; Gaudi Services (Gaudi Domain); API; Application-external Services (Grid Domain)]

GRID: Todo
- Identify the Gaudi components which interact with GRID middleware
- All GRID-related aspects must be worked out and finally addressed in collaboration with
  - GRID experts
  - The people responsible for Tier-n computing

Data Management TAG
- Reaction to the outcome of the Hoffmann review
- There was a "kick-off"
- We expressed interest in collaborating on
  - Bookkeeping
  - Event tag collections
  - Conditions database
- We believe that this activity is very useful
  - Identify areas of common interest
- We are ready to participate

Conclusions
- Several sub-detector software components will soon be completely decoupled from ZEBRA
- Start testing the Gaudi I/O mechanism, to be ready when the reconstruction program is fully implemented in C++
- To begin with, we aim at only a few customers
- Prepare and test the infrastructure involved