EDELWEISS data structure and analysis framework Benjamin Schmidt
Benjamin Schmidt, June 2014, at MPI München. Photo by Böhringer Friedrich.
KIT – University of the State of Baden-Württemberg and National Research Center of the Helmholtz Association, www.kit.edu
Motivation to build a new data structure and analysis framework (KData)
Task: get the data (lack of documentation)
We had: EDW-II data analysis dispersed between Ana and Era
  J. Cham, Era: ROOT based, but difficult access, no server with most recent code/data…
  Saclay, Ana: Fortran, PAW and C; no PAW support; French comments in code/data…
  2 experts (full-time analysis), each with their own code: single (few local) user / single programmer
2010: A. Cox and I were struggling to find, access and analyze EDW-II data
  Coincidence (muon-veto/bolometer) study as diploma work
Benjamin Schmidt, June 2014, CRESST/EDELWEISS/EURECA software workshop
Motivation to build a new data structure and analysis framework (KData)
Short term: facilitate data access
  Build a flexible event-based data structure
  Single combined HLA file: muon-veto and bolometer data
  Make code and data easily available
  Documentation
Long term: establish a common collaboration-wide analysis and data storage tool
  Share tasks (calibration, template creation, …) / remove barriers (documentation)
  Allow for an upgrade to hundreds of detectors: develop an automatic processing scheme
The general picture: the idea
All software modules:
  DAQ: KSamba
  KDS: data structure
  Kamping
  KPTA: pulse trace analysis (a bit special: standalone code, extensive use of templates)
  Data tiers: Raw, Amp, HLA (ampToHLA)
  Analysis: KDataPy, KQPA
Specific known and unknown requirements during KData development
Requirements for EDW-III:
  10 -> 40 detectors: larger workload for debugging, calibration and analysis
  New detector design (channel number/specifics initially unknown)
  New electronics (some specifics unknown)
  First time-resolved ionization signals (trace length? number of traces?)
  Change in analog amplifiers -> signal shape? trace length? sampling? New efforts to optimize signal treatment needed
  Integrate the muon veto in the bolo DAQ
Event-based data storage: KData implementation
The idea: build a data storage and analysis framework for event-based physics data
Use ROOT:
  Fast I/O
  Support for LHC lifetime
  Data compression
  Statistics tools
  Well known
C++ class library for data encapsulation
Keep it modular
Keep it flexible and general
Try to keep it simple
Keep a fully split tree (library independent)
Document it
Make it easily accessible
Repository: https://edwdev-ik.fzk.de/SVN_Repository_for_the_KIT_Dark_Matter_Group/KData.html
KData event structure in detail
Use ROOT types, no nested arrays
  The KData library is not needed to read the data
  Longevity of the data is guaranteed
KData is coded consistently with the ROOT and Taligent coding style: easier to read, collaborate on and check code
For example:
  Classes are defined in a header (.h) and implemented in .cxx
  Member variables start with a lowercase f (fChannelName, fAmp, fExtra, …)
  Functions start with a capital letter: GetChannelName(), GetTrace(), …
KDS is completely implemented with Get…() and Set…() methods
  Tab completion (ipython, ROOT session)
KData event structure in detail
ROOT TTree with a single event branch
Event with flexible structure: variable-sized TClonesArrays for Bolometer, BoloPulse, PulseAnalysis, Samba and MuonModule information
Allows changes in hardware (number of bolos, number of channels per bolo, …) without code changes in "kds" (the data structure source code)!
Requires some effort to get to know, though
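The idea of variable-sized record arrays can be sketched in plain Python. This is only an illustration under stated assumptions: the class and field names below (Event, BolometerRecord, PulseRecord, det1, ionA, …) are hypothetical stand-ins, not the KData classes, and lists take the place of TClonesArrays.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical stand-ins for the KData record classes; names are illustrative.
@dataclass
class PulseRecord:
    channel_name: str
    amplitude: float = 0.0


@dataclass
class BolometerRecord:
    detector_name: str
    pulses: List[PulseRecord] = field(default_factory=list)


@dataclass
class Event:
    bolos: List[BolometerRecord] = field(default_factory=list)

    def add_bolo(self, detector_name: str) -> BolometerRecord:
        """Append a new bolometer record and return it for further filling."""
        bolo = BolometerRecord(detector_name)
        self.bolos.append(bolo)
        return bolo


# One detector with a single channel.
small = Event()
small.add_bolo("det1").pulses.append(PulseRecord("heat"))

# Three detectors with four channels each: same classes, no code change.
large = Event()
for name in ("det1", "det2", "det3"):
    bolo = large.add_bolo(name)
    bolo.pulses.extend(PulseRecord(ch) for ch in ("heat", "ionA", "ionB", "ionC"))

print(len(small.bolos), len(large.bolos))
```

Because the containers grow per event, a change in the number of detectors or channels only changes how many records are filled, not the layout itself.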
KData event structure
Logic layout: logical event structure via TRef and TRefArray
  Very powerful: can be spread over files, …
  TTree
    KEvent
      KBolometerRecord
        KBoloPulseRecord (= channel)
          KPulseAnalysisRecord
        KSambaRecord
      KMuonModuleRecords
A word of caution, though: TRefs require specific handling in event building
  Never forget to reset the referenced object count (TProcessID::SetObjectCount); otherwise the file size blows up
  Probably most bugs and problems in kds were related to TRef issues
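Why the reset matters can be shown with a toy model. The sketch below is not ROOT code: RefTable and build_event are hypothetical names that only mimic a process-wide counter handing out one number per referenced object. In ROOT itself, the usual pattern is to read the count with TProcessID::GetObjectCount() before building an event and to restore it with TProcessID::SetObjectCount() afterwards.

```python
# Toy model only: this mimics the bookkeeping idea, it is not ROOT code.
class RefTable:
    """Process-wide table that hands out one number per referenced object."""

    def __init__(self):
        self.count = 0

    def register(self):
        self.count += 1
        return self.count


def build_event(table, n_refs, reset):
    """Create n_refs references for one event; optionally restore the counter."""
    saved = table.count  # remember the count before building the event
    ids = [table.register() for _ in range(n_refs)]
    if reset:
        table.count = saved  # restore, so the next event reuses the numbers
    return ids


leaky, tidy = RefTable(), RefTable()
for _ in range(1000):
    build_event(leaky, n_refs=5, reset=False)
    build_event(tidy, n_refs=5, reset=True)

print(leaky.count, tidy.count)
```

Without the reset, the counter grows with every event, which in this toy picture corresponds to the ever-growing bookkeeping that inflates the file.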
KData event structure
Logic layout:
  TTree
    KEvent
      KBolometerRecord
        KBoloPulseRecord (= channel)
          KPulseAnalysisRecord: bandpass analysis
          KPulseAnalysisRecord: optimal filter
          KPulseAnalysisRecord: trapezoidal filter
          …
        KSambaRecord
      KMuonModuleRecords

Looping in Python:

    # filereader: an opened KData file reader (its creation is not shown on the slide)
    for event in filereader:
        for bolo in event.boloRecords():
            for pulse in bolo.pulseRecords():
                for analysis in pulse.analysisRecords():
                    pass  # work with each analysis record here

Looping C++ style in Python:

    # f: an opened KData file reader (its creation is not shown on the slide)
    for i in range(f.GetEntries()):
        f.GetEntry(i)
        event = f.GetEvent()
        for ii in range(event.GetNumBolos()):
            bolo = event.GetBolo(ii)
            samba = bolo.GetSambaRecord()
            print(samba.GetNtpDateSec())
            for iii in range(bolo.GetNumPulseRecords()):
                pulse = bolo.GetPulseRecord(iii)
                trace = pulse.GetTrace()
                # …
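The two looping styles visit the same records. The following sketch checks this on mock objects that run without ROOT or KData. MockEvent, MockBolo and MockPulse are hypothetical stand-ins; only the method names (boloRecords, GetNumBolos, GetBolo, pulseRecords, GetNumPulseRecords, GetPulseRecord) are taken from the slide.

```python
class MockPulse:
    def __init__(self, name):
        self.name = name


class MockBolo:
    def __init__(self, name, pulse_names):
        self.name = name
        self._pulses = [MockPulse(p) for p in pulse_names]

    def pulseRecords(self):
        return iter(self._pulses)

    def GetNumPulseRecords(self):
        return len(self._pulses)

    def GetPulseRecord(self, i):
        return self._pulses[i]


class MockEvent:
    def __init__(self, bolos):
        self._bolos = bolos

    def boloRecords(self):
        return iter(self._bolos)

    def GetNumBolos(self):
        return len(self._bolos)

    def GetBolo(self, i):
        return self._bolos[i]


def pulses_iterator_style(events):
    """Python-style iteration over the nested records."""
    return [(b.name, p.name)
            for e in events
            for b in e.boloRecords()
            for p in b.pulseRecords()]


def pulses_index_style(events):
    """C++-style iteration with explicit counts and indices."""
    out = []
    for event in events:
        for i in range(event.GetNumBolos()):
            bolo = event.GetBolo(i)
            for j in range(bolo.GetNumPulseRecords()):
                out.append((bolo.name, bolo.GetPulseRecord(j).name))
    return out


events = [
    MockEvent([MockBolo("det1", ["heat", "ionA"])]),
    MockEvent([MockBolo("det1", ["heat"]), MockBolo("det2", ["heat", "ionB"])]),
]
print(pulses_iterator_style(events) == pulses_index_style(events))
```

The iterator style is shorter and avoids index bookkeeping; the index style mirrors how the same loop would be written in C++.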
KData event structure in detail
Structure subclassed in:
  Raw: KRawEvent, KRawBolometerRecord, …
  Amp: KAmpEvent, KAmpBolometerRecord, …
  HLA: KHLAEvent, KHLABolometerRecord, …
File sizes noted on the slide: ~1/2 of the Samba file size; < 1/10 of the raw file size
Amp and HLA: no pulse traces, but KPulseAnalysisRecords
Raw: with pulse traces! No KPulseAnalysisRecords
Quick calculation: 2.87 * 356/1850 * 2.35; FWHM 1.04 keV (Ana: 1.1 keV)
Python and KDataPy
simpleEventViewer output (figure)
Looping utilities: no need to write the looping/plotting yourself
Use KDataPy.util with plotpulse(), loopbolo() and KDataPy.loop_amp with loopchannel(), plotchan_x_files(), plotchan_x_dir()
loop_amp is to be completed with plotchannel_xy(), … and loop/plotbolo functions
Note that KDataPy.util loopbolo() also works for Amp and HLA data
Basic usage:

    import ROOT
    import KDataPy.util as ut
    ut.plotpulse("/sps/edelweis/kdata/raw/nk23b002_000.root", "chalB FID823")

Documentation
Our data acquisition chains revisited
Diagram labels: Our look-up place; Modane; Samba Macs; Radon; Bolo-Raw; Muon Veto DAQ; Lyon; Karlsruhe (KData, ROOT on kalinka)
Automated processing:
  proc0: copy to Lyon
  proc1: rootification
  proc2: raw -> amp
  proc3: amp -> hla
  proc4: merge/skim data (muon/hla bolo data)
  spsToHpss: backup on tape drive
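The ordered processing steps lend themselves to a simple "what comes next" lookup. The sketch below is an assumption about how such bookkeeping could look, not a description of the actual automation: the per-file status dictionary and the next_step helper are hypothetical, and only the step names and descriptions come from the slide.

```python
# Step names and descriptions as listed on the slide, in processing order.
PROCESSING_STEPS = [
    ("proc0", "copy to Lyon"),
    ("proc1", "rootification"),
    ("proc2", "raw -> amp"),
    ("proc3", "amp -> hla"),
    ("proc4", "merge/skim data"),
]


def next_step(status):
    """Return the first step not yet marked done, or None if all are done.

    status: dict mapping step name -> bool (hypothetical per-file bookkeeping).
    """
    for name, _description in PROCESSING_STEPS:
        if not status.get(name, False):
            return name
    return None


fresh_file = {}
half_done = {"proc0": True, "proc1": True}
finished = {name: True for name, _ in PROCESSING_STEPS}

print(next_step(fresh_file), next_step(half_done), next_step(finished))
```

Keeping the order in one list means a new step can be added in a single place without touching the lookup.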
Using the KData pulse processing library
Adam Cox: our benevolent dictator for life
The KPulseAnalysisChain
The kpta chain is applied before your analysis function
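The "chain first, analysis function second" pattern can be sketched in plain Python. This is a generic illustration, not the KPulseAnalysisChain API: Chain, remove_baseline, moving_average and peak_amplitude are hypothetical names, and the two processors are simple examples chosen for readability.

```python
def remove_baseline(trace, n_pre=4):
    """Subtract the mean of the first n_pre samples from every sample."""
    baseline = sum(trace[:n_pre]) / n_pre
    return [x - baseline for x in trace]


def moving_average(trace, width=3):
    """Smooth with a centred window that is truncated at the trace edges."""
    half = width // 2
    out = []
    for i in range(len(trace)):
        window = trace[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


class Chain:
    """Ordered list of processors; each takes a trace and returns a new one."""

    def __init__(self):
        self.processors = []

    def add(self, processor):
        self.processors.append(processor)
        return self  # allow chained .add() calls

    def run(self, trace):
        for processor in self.processors:
            trace = processor(trace)
        return trace


def peak_amplitude(trace):
    """Example analysis function applied to the processed trace."""
    return max(trace)


raw = [10, 10, 10, 10, 10, 30, 10, 10]

baseline_only = Chain().add(remove_baseline)
smoothed = Chain().add(remove_baseline).add(moving_average)

print(peak_amplitude(baseline_only.run(raw)))
print(peak_amplitude(smoothed.run(raw)))
```

Separating the processors from the analysis function lets the same analysis run on differently prepared traces by swapping the chain.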
Ionisation channel after pattern removal (figure)
Advantages and drawbacks (personal opinion)
Advantages:
  Flexibility of data structure
  Consistency of data structure (over time)
  Same data structure for different detector systems -> great for coincidence studies
  Same data structure for different processing/analyses (bandpass, optimal filter, …)
  Decouples high-level analyses from DAQ/processing changes
  Independent kpta library: has been reused with (flat) data from the EURECA test stand; very versatile
Drawbacks:
  Flexibility of data structure comes with some complexity (heaviness); especially TTree::Draw() is more complex
  Single raw data folder: restricted use of ls
  Writing kpta with templates is a bit more complex
Usage of Python
90 % of the time Python feels like the right solution
  Shorter, more legible code
  Vast set of external libraries
  Extremely handy for scripting
  Basic documentation in Python always via '''docstrings'''
Main price: speed
  Circumvent by producing an additional set of data files skimmed by detector
  Future use of PyPy + ROOT 6
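Skimming by detector amounts to regrouping records so that an analysis of one detector reads only its own data. The sketch below shows the regrouping step on an assumed in-memory layout; skim_by_detector and the (detector, payload) tuples are hypothetical, and writing the groups to separate files is left out. The function is documented with a docstring, as recommended on the slide.

```python
from collections import defaultdict


def skim_by_detector(events):
    """Group per-detector records across events.

    events: iterable of events, each a list of (detector_name, payload)
            tuples (hypothetical layout for this sketch).
    returns: dict mapping detector_name -> list of payloads in event order.
    """
    skimmed = defaultdict(list)
    for event in events:
        for detector, payload in event:
            skimmed[detector].append(payload)
    return dict(skimmed)


events = [
    [("det1", 1.2), ("det2", 0.4)],
    [("det1", 3.1)],
    [("det2", 0.9), ("det3", 2.2)],
]
per_detector = skim_by_detector(events)
print(sorted(per_detector))
```

A later analysis of a single detector then loops over one short list instead of every event, which is where the time saving comes from.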
But: 50x slower
PyPy (JIT compile): 1.06x slower