ROOT I/O
TTree Queries
CHEP 2004
René Brun / CERN
Philippe Canal / Fermilab
Fons Rademakers / CERN
September 29, 2004
Conference for Computing in High Energy and Nuclear Physics
http://root.cern.ch

Contents
- Status
  - Overview
  - List of other presentations
- ROOT I/O
  - Large Files
  - Double32_t
  - Foreign objects
  - New interfaces
  - XML back-end
  - Historical recap
- TTree Query
  - Calling free standing functions
  - Rebinning
  - Support for Indexed Friends
  - Arbitrary C++ in queries (TTree::MakeProxy)
- Containers Support
  - Mainly for STL containers
  - Splitting
  - TTree Query
- TTree
  - Auto load of TRef'ed branches
  - UserInfo
  - CloneTree
- Support for SQL back-end
- Future Plans

Presentations and Posters
- Presentations
  - [328] The Next Generation Root File Server, by Andrew HANUSHEVSKY (Theatersaal: Sept 27, 16:30 - 16:50)
  - [412] XML I/O in ROOT, by Sergey LINEV (Brunig 1+2: Sept 29, 15:20 - 15:40)
  - [430] Global Distributed Parallel Analysis using PROOF and AliEn, by Fons RADEMAKERS (Theatersaal: Sept 29, 15:20 - 15:40)
  - [104] Authentication/Security services in the ROOT framework, by Gerardo GANIS (Brunig 3: Sept 29, 16:50 - 17:10)
  - [169] Guidelines for Developing a Good GUI, by Ilka ANTCHEVA (Brunig 1+2: Sept 30, 14:00 - 14:20)
  - [287] Super scaling PROOF to very large clusters, by Maarten BALLINTIJN (Ballsaal: Sept 30, 15:00 - 15:20)
- Poster on September 29
  - [128] XTNetFile, a fault tolerant extension of ROOT TNetFile client
- Posters on September 30
  - [298] The ROOT 3-D graphics and geometry classes
  - [170] The User Interface Design in ROOT
  - [303] The ROOT Linear Algebra Package
  - [98] RDBC: ROOT DataBase Connectivity
  - [99] Interactive Data Analysis with Carrot (ROOT Apache Module)

Status
- ROOT 4.01/02 just released
- Production release of 4.01 planned for December 2004
- Many improvements since CHEP 2003
- This talk:
  - I/O and TTree queries
- For other developments, see the other ROOT related talks
- XROOTD
  - A new generation ROOT file server
- Authentication overhaul
- Object Property Editor
  - e.g. TH1Editor, TH2Editor, TGraphEditor
- New classes for GUI
- GUI builder
- Brand new GL viewer
- Math and Stats
  - New Matrix package implementation
  - New functions in TMath (now a namespace)
  - Quadratic programming

TFile and TDirectory
- Very Large Files
  - Support on all platforms for 64-bit integers via the portable typedefs Long64_t and ULong64_t
    - long long on Unix, __int64 with VC++
  - Support for files larger than 2 GB added in ROOT 4.00
    - Files smaller than 2 GB are still readable by older versions of ROOT
  - Support for TTrees with more than 2**31 entries
- Double32_t
  - Same as Double_t in memory
  - Same as Float_t on disk
  - Supports automatic schema evolution to and from float and double
  - Warning: too many read/write cycles could result in some loss of precision
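
The Double32_t trade-off (double in memory, float on disk) can be illustrated without ROOT. The sketch below is a simplified model, not ROOT's implementation: `writeToDisk`, `readFromDisk`, `roundTrip` and `roundTripError` are hypothetical helper names, and the "disk" is just a narrowing conversion to `float`.

```cpp
#include <cmath>

// Hypothetical model of a Double32_t-style member (not ROOT code):
// the value lives in memory as a double but is stored on disk as a float.
inline float writeToDisk(double inMemory) { return static_cast<float>(inMemory); }
inline double readFromDisk(float onDisk) { return static_cast<double>(onDisk); }

// One full write/read cycle.
inline double roundTrip(double value) { return readFromDisk(writeToDisk(value)); }

// Absolute error introduced by a single cycle.
inline double roundTripError(double value) { return std::fabs(roundTrip(value) - value); }
```

In this model a value such as 0.5, which is exactly representable as a float, survives the cycle unchanged, while 0.1 comes back slightly different. A second cycle on an unmodified value loses nothing further; the error grows only when the value is recomputed in double precision between cycles and then truncated again.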

XML output format
- Update to the I/O classes to allow customization of the back-end
- Implemented for XML
- Will be used for SQL support
- XML files allow the interchange of data with applications unable to read ROOT files directly
- Example:
    TCanvas c;
    h.Draw();
    c.SaveAs("c.xml");
    c.SaveAs("c.root");
- Refer to Dr. Linev's presentation for more details
- Extract from c.xml:
    <XmlKey name="c1" cycle="1">
      <Object class="TCanvas">
        <Version v="5"/>
        <TPad version="8">
          <TVirtualPad version="2">
            <TObject fUniqueID="0" fBits="3000008"/>
            <TAttLine version="1">
              <fLineColor v="1"/>
              <fLineStyle v="1"/>
              <fLineWidth v="1"/>
            </TAttLine>
            <TAttFill version="1">
              <fFillColor v="19"/>
              <fFillStyle v="1001"/>
            </TAttFill>

ROOT I/O History
- Version 2.25 and older
  - Only hand-coded and generated streamer functions; schema evolution done by hand
  - I/O requires ClassDef, ClassImp and a CINT dictionary
- Version 2.26
  - Automatic schema evolution
  - Uses TStreamerInfo (with info from the dictionary) to drive a general I/O routine
- Version 3.03/05
  - Lifts the need for ClassDef and ClassImp for classes not inheriting from TObject
  - Any non-TObject class can be saved inside a TTree or as part of a TObject class
- Version 4.00/00
  - Automatic versioning of 'Foreign' classes
- Version 4.00/08
  - Non-TObject classes can be saved directly in a TDirectory
- Version 4.01/02
  - Large TTrees, TRef autoload

Foreign Objects
- To save non-instrumented classes:
  - Only the data dictionary is needed
  - Default versioning is provided by a checksum based on the type and name of the persistent data members
  - The checksum is stored as an additional 4 bytes
- TBuffer layout:
    .. | Bytecount (4 bytes) | 0 (2 bytes) | checksum (4 bytes) | object N | Bytecount | 0 | checksum | object N+1 | ..
- ClassDef advantages
  - The IsA function generated by ClassDef considerably speeds up access to the TClass for a given object
  - The version number (2 bytes maximum) consumes less space on disk than the "0 + checksum"
- New interface to store and retrieve objects with type safety:
    ptrclass *ptr = …;
    directory->WriteObject(ptr, "name");
    ptrclass *ptr;
    directory->GetObject("name", ptr);   // ptr is 0 if the object is absent or of the wrong type
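
The idea of versioning a class by a checksum over the type and name of its persistent data members can be sketched as follows. This is a minimal illustration under assumptions: `MemberList`, `mix` and `classChecksum` are hypothetical names, and the mixing step is chosen for simplicity rather than taken from the ROOT sources.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// (type name, member name) for each persistent data member (hypothetical description).
using MemberList = std::vector<std::pair<std::string, std::string>>;

// Illustrative mixing step; an assumption, not necessarily what ROOT computes.
inline std::uint32_t mix(std::uint32_t sum, const std::string& text) {
    for (unsigned char c : text) sum = sum * 3u + c;
    return sum;
}

// 4-byte checksum over the class name and the type and name of every member.
inline std::uint32_t classChecksum(const std::string& className, const MemberList& members) {
    std::uint32_t sum = mix(0u, className);
    for (const auto& member : members) {
        sum = mix(sum, member.first);   // type of the data member
        sum = mix(sum, member.second);  // name of the data member
    }
    return sum;
}
```

Two builds of a class with the same persistent members produce the same checksum, while renaming or retyping a member changes it, which is what lets the checksum stand in for a hand-maintained version number.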

TClonesArray
- Optimization of the number of calls to new and delete
- Ability to split the collection of objects in a TTree
  - Improves compression and run-time
- Ability to save objects member-wise
  - Stores the same data member of all the elements of the collection consecutively
  - Improves compression (buffer data more homogeneous)
  - Improves run-time (avoids n-1 tests of the data type)
- Ability to use in TTree::Draw as a collection
- Ability to read back without the original compiled code
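
The difference between object-wise and member-wise storage can be sketched with a plain buffer of doubles. The `Hit` type and the two functions below are hypothetical and only show the ordering of the data, not ROOT's buffer format.

```cpp
#include <vector>

struct Hit { double x; double y; };  // hypothetical element type

// Object-wise: all members of element 0, then all members of element 1, ...
inline std::vector<double> streamObjectWise(const std::vector<Hit>& hits) {
    std::vector<double> buffer;
    buffer.reserve(hits.size() * 2);
    for (const Hit& h : hits) {
        buffer.push_back(h.x);
        buffer.push_back(h.y);
    }
    return buffer;
}

// Member-wise: the same data member of all elements is stored consecutively.
inline std::vector<double> streamMemberWise(const std::vector<Hit>& hits) {
    std::vector<double> buffer;
    buffer.reserve(hits.size() * 2);
    for (const Hit& h : hits) buffer.push_back(h.x);
    for (const Hit& h : hits) buffer.push_back(h.y);
    return buffer;
}
```

For two hits (1, 10) and (2, 20) the object-wise buffer is 1, 10, 2, 20 and the member-wise buffer is 1, 2, 10, 20. Grouping like values together is what makes the buffer more homogeneous and therefore easier to compress.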

Old STL Container Support
For versions older than 4.00/00:
- Collections always stored object-wise
- Nesting of STL collections was extremely limited
- No splitting was possible
- STL containers stored using a generated function
- One generated function per actual data member
- Compiled version of these functions required for writing and also for reading
- Example of a generated function:
    void R__User_fList1(TBuffer &R__b, void *R__p, int)
    {
       if (R__b.IsReading()) {
          vector<THit> &fList1 = *(vector<THit> *)R__p;
          int R__n;
          fList1.clear();
          R__b >> R__n;              // number of elements stored
          fList1.reserve(R__n);
          for (int R__i = 0; R__i < R__n; R__i++) {
             THit R__t;
             R__t.Streamer(R__b);    // read one element object-wise
             fList1.push_back(R__t);
          }
       } else {
          // … writing …
       }
    }

New Container Support
- New abstract interface:
  - TVirtualCollectionProxy
  - Can be implemented for almost any collection
- Allows
  - Splitting (for collections of homogeneous objects)
  - Use in Tree Query (with automatic looping)
- Will allow
  - Member-wise streaming (as opposed to object-wise streaming)
- Also
  - Arbitrary nesting of STL containers
  - Reading of STL containers without the original code (emulated mode)
- Note: as of 4.00/08 only std::vector has proxies
- Early prototype and fundamental concepts by Victor Perevoztchikov

STL Support
- Each STL container instance now has an associated TClass object
- Several co-existing streaming implementations
  - Generated Streamer
    - For object-wise streaming
    - Fully respects custom allocators and comparators
    - Easier to implement, with a run-time cost similar to a templated solution
  - Emulation Proxy (e.g. TEmulatedVectorProxy)
    - For reading without a compiled version
    - Allows easy sharing of ALL ROOT files that have no custom streamers
    - Does not respect allocators and comparators
  - Templated Proxy (e.g. TVectorProxy)
    - For splitting and member-wise streaming
    - Fully respects custom allocators and comparators
- Why not rely on the Emulation Proxy alone?
  - Implementation difficulties
    - An emulation proxy acting on a "live STL object" requires a few tricks and assumptions
    - The memory footprint of the STL container object is (usually?) independent of the template parameter
    - A list proxy would need a series of lists of increasing fixed-size content (e.g. list<char[1024]>, list<char[2048]>)
  - A templated proxy can be faster and more memory efficient
  - The emulation layer might actually be implemented using alternative collections (if we assume it does not have to deal with real objects)

Container I/O Implementation
- Any container can be summarized by the sequence of its content's addresses
- Use TVirtualCollection::At via TVirtualCollection::operator[]
- Pros
  - I/O code completely independent of the collection
    - Reduced code duplication in TStreamerInfo
  - No run-time cost for TClonesArray
- Cons
  - Implementation for containers with no random access iterator needs to cache the iterator
- Member-wise implementation
  - The member-wise/object-wise choice will be encoded in the 'version number' of the STL collections
  - An API will be provided to select member-wise or object-wise streaming for data members that are STL collections
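
The "sequence of addresses" view of a container can be sketched with a small abstract interface. Everything below is hypothetical (`CollectionProxy`, `VectorProxy`, `ListProxy`, `sumDoubles`) and is only meant to show why generic I/O code needs nothing but `Size()` and `At(i)`, and why a container without random access ends up caching an iterator.

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <vector>

// Hypothetical abstract view of a collection: a size and the address of element i.
class CollectionProxy {
public:
    virtual ~CollectionProxy() = default;
    virtual std::size_t Size() const = 0;
    virtual void* At(std::size_t i) = 0;  // precondition: i < Size()
};

// Random access container: At is a direct lookup.
template <typename T>
class VectorProxy : public CollectionProxy {
public:
    explicit VectorProxy(std::vector<T>& v) : vec_(v) {}
    std::size_t Size() const override { return vec_.size(); }
    void* At(std::size_t i) override { return &vec_[i]; }
private:
    std::vector<T>& vec_;
};

// No random access: cache the last iterator so sequential access stays linear overall.
template <typename T>
class ListProxy : public CollectionProxy {
public:
    explicit ListProxy(std::list<T>& l) : list_(l), cached_(l.begin()), cachedIndex_(0) {}
    std::size_t Size() const override { return list_.size(); }
    void* At(std::size_t i) override {
        if (i < cachedIndex_) {  // moving backwards: restart from the beginning
            cached_ = list_.begin();
            cachedIndex_ = 0;
        }
        std::advance(cached_, static_cast<std::ptrdiff_t>(i - cachedIndex_));
        cachedIndex_ = i;
        return &*cached_;
    }
private:
    std::list<T>& list_;
    typename std::list<T>::iterator cached_;
    std::size_t cachedIndex_;
};

// Generic code that knows nothing about the concrete container type.
inline double sumDoubles(CollectionProxy& proxy) {
    double total = 0.0;
    for (std::size_t i = 0; i < proxy.Size(); ++i)
        total += *static_cast<double*>(proxy.At(i));
    return total;
}
```

The same `sumDoubles` loop works on a `std::vector<double>` and on a `std::list<double>`. The cached iterator in `ListProxy` is the extra state that the "Cons" item refers to.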

TTree
- TRef autoload
  - Added (optional) support for the auto-loading of branches referenced by a TRef object
  - Generates one table of references to branches per entry
  - TRef::GetObject uses this table to find and load the branch containing the referenced object
  - To enable it, call:
      tree->BranchRef();
  - Example:
      class Event { TClonesArray *fTracks; TRef fLastTrack; };
      branch = tree.GetBranch("fLastTrack");
      branch->GetEntry(7);
      tlast = event->GetLastTrack();
- TTree::GetUserInfo
  - Used to store with the TTree any user-defined object(s) that do not depend on the entry number
  - Examples: luminosity, calibrations, etc.
      tree.GetUserInfo()->Add(myobject);

Copying a TTree
- Very flexible and simple copying tools allowing cuts on:
  - Number of entries
  - Number of branches
  - Selection of entries based on a formula
  - Usable for both TTree and TChain
- Important simplification of the interface
  - Removed the requirement of explicitly setting the addresses for ALL the branches
- Example:
    tree->SetBranchStatus("br", kFALSE);
    newtree = tree->CloneTree();
    tree->CopyTree("fTracks.fPx<=1.2");
- (Diagram: a tree with 3 branches next to a copy with 2 branches and a copy with 3 branches)

TTree Queries
- Implemented Boolean expression optimization (&& and ||)
- Rebinning now possible from the TTree data (via the new histogram editor)
- Improved TTree::Scan output (customization and array display)
- Calls to external functions:
  - Free standing function or class static member function
  - Compiled or interpreted, with numerical arguments and numerical return type
  - Example:
      tree->Draw("TMath::Prob(var, 5)");

TTree Queries
- Support for collections
  - TTreeFormula now treats any collection class which has a TVirtualCollection in exactly the same way as a TClonesArray
  - Automatically loops over the elements
  - Can access a specific element
  - Synchronized with other collections and arrays in the formulas
- Connecting several TTrees
  - TChain adds more entries
  - TTree Friends add (virtually) more branches
  - Prior to ROOT 4.00/08, the correlation between Friends was made only by entry number
    - This is a problem if the Trees have semantically different sequences of entries
  - Can now connect the Friend using an index
    - For example Run Number / Event Number
- (Diagram: a User Tree connected to the Main Tree by entry number, and a User Tree connected to an indexed Main Tree by run and event number)
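
Connecting a friend by index rather than by entry number amounts to a lookup keyed on (run, event). The sketch below models that lookup with standard containers; `EventId`, `Index`, `buildIndex` and `findFriendEntry` are hypothetical names and do not reflect how ROOT stores its tree index.

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

struct EventId { int run; int event; };  // hypothetical key carried by each entry

// Maps (run, event) to the entry number inside the friend tree.
using Index = std::map<std::pair<int, int>, std::size_t>;

inline Index buildIndex(const std::vector<EventId>& friendTree) {
    Index index;
    for (std::size_t entry = 0; entry < friendTree.size(); ++entry)
        index[std::make_pair(friendTree[entry].run, friendTree[entry].event)] = entry;
    return index;
}

// Returns the matching friend entry, or -1 when the friend has no such (run, event).
inline long findFriendEntry(const Index& index, int run, int event) {
    auto it = index.find(std::make_pair(run, event));
    return it == index.end() ? -1L : static_cast<long>(it->second);
}
```

With a friend tree whose entries are ordered (1,2), (1,1), (2,1), a main tree entry for run 1, event 1 resolves to friend entry 1 regardless of its own entry number, which is the case plain entry-number matching gets wrong.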

C++ TTree Queries
- Allows execution of a user script in a scope where the branch names are symbols giving access to the TTree data
  - For example, to draw px in hsimple.root, just have a hsimple.C file:
      double hsimple() { return px; }
  - and do:
      tree->Draw("hsimple.C");   // Generate code, compile, link and load
- This allows using arbitrary C++ to quickly plot histograms
  - Arbitrary looping
  - Arbitrary function and method calls (if the library is present)
  - Loading of only the branches actually used
- This feature is implemented using the selector generated by MakeProxy

MakeProxy
- MakeProxy generates a TSelector
  - Will eventually replace MakeClass and MakeSelector
- Features
  - Creates a C++ context where branch names (including periods) can be used as variables
  - On-demand loading of branches
  - Respects/recreates the original class structure
  - Array bound checks
  - Access to member functions (if the code is loaded)
- Example with Event.root:
    cout << fEvtHdr.fEvtNum << endl;
    cout << fTracks->GetLast()+1 << endl;
    cout << fTracks.fPx[0] << endl;
    cout << fTracks.fPx[1] << endl;
    cout << "fTracks.fNsp[0]: " << fTracks.fNsp[0] << endl;
    cout << "fTracks.fPointValue[0][0]: " << fTracks.fPointValue[0][0] << endl;
    cout << "fLastTrack: " << fLastTrack->GetUniqueID() << endl;
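
On-demand loading of branches can be sketched as a small wrapper that reads its data only on first access for the current entry. `LazyBranch` and its loader callback are hypothetical and stand in for whatever the generated proxy code actually does.

```cpp
#include <functional>
#include <utility>

// Hypothetical branch wrapper: the value is read only when first used for an entry.
template <typename T>
class LazyBranch {
public:
    explicit LazyBranch(std::function<T(long)> loader) : loader_(std::move(loader)) {}

    // Moving to another entry invalidates the cached value but reads nothing.
    void SetEntry(long entry) {
        entry_ = entry;
        loaded_ = false;
    }

    // First access for the current entry triggers the read; later accesses reuse it.
    const T& Get() {
        if (!loaded_) {
            value_ = loader_(entry_);
            loaded_ = true;
            ++loadCount_;
        }
        return value_;
    }

    int LoadCount() const { return loadCount_; }

private:
    std::function<T(long)> loader_;
    long entry_ = 0;
    bool loaded_ = false;
    int loadCount_ = 0;
    T value_{};
};
```

A script that never mentions a given branch never calls `Get()` on it, so that branch is never read. This is the effect described as loading only the branches actually used.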

File types & Access in 4.01/xx
(Diagram: the user accesses data through TFile, TKey/TTree and TStreamerInfo, or through TTreeSQL with TSQLServer, TSQLRow and TSQLResult. File back-ends shown: local file X.root, local file X.xml, http, rootd/xrootd, Dcache, Castor, RFIO, Chirp. Database back-ends shown: Oracle, MySQL, PgSQL, SapDb.)

New RDBMS interface: Goals
- Access any RDBMS tables from TTree::Draw
- Create a Tree in split mode, creating an RDBMS table and filling it
- The table can be processed by SQL directly
- The interface uses the normal I/O engine, including support for automatic schema evolution

New RDBMS Interface
- Current prototype
  - Simple TTree (branch with leaf list)
  - Implemented via TSQLxxx for reading and writing
  - Implemented via RDBC for reading
    - See: http://carrot.cern.ch/~onuchin/RDBC/
  - Should be released in December 2004
- Should be expanded to support branches of objects
  - Need to implement a way to store and retrieve TStreamerInfo(s) and TProcessID(s) in the database
  - Will probably use SQL binary 'blobs' to store non-split objects

RDBMS Examples
- Connect to an existing db:
    TTreeSQL tree(const char *db, const char *uid, …)
    tree.Print(), Browse, Scan, etc.
    tree.Draw("var1:var2", "varx<0")     // TTree query style converted to SQL
- Create the database on the server:
    TTreeSQL tree("mysql://localhost/test", "nobody", "new");
    Event *event = new Event;
    tree.Branch("top", "Event", &event);
    tree.Fill();
    tree.AutoSave();
  - Columns are created using the normal split algorithm; blobs are created below the split level
  - A TSQLRow is filled and sent to the server
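
The phrase "TTree query style converted to SQL" can be made concrete with a toy translation. The helper below is purely hypothetical: `buildSelect` is not part of TTreeSQL, it handles only plain column names separated by ':' and passes the selection through unchanged.

```cpp
#include <string>

// Hypothetical helper (not a ROOT function): turns a Draw-style variable
// expression such as "var1:var2" and a selection into a SELECT statement.
inline std::string buildSelect(const std::string& table,
                               const std::string& varexp,
                               const std::string& selection) {
    std::string columns;
    for (char c : varexp) {
        if (c == ':') columns += ", ";  // ':' separates the variables to draw
        else columns += c;
    }
    std::string sql = "SELECT " + columns + " FROM " + table;
    if (!selection.empty()) sql += " WHERE " + selection;  // selection copied verbatim
    return sql;
}
```

For the call shown on the slide, `buildSelect("test", "var1:var2", "varx<0")` yields `SELECT var1, var2 FROM test WHERE varx<0`. A real translation would also have to handle function calls, array indices and expressions that SQL cannot express.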

Future Plans for I/O and TTree
- Implement member-wise storing for std::vector (late 2004)
- Implement TVirtualCollectionProxy for each of the STL containers (late 2004, early 2005)
- Add support for auto loading of TRef branches across trees
- TChain, TTree Friends and Indexing
  - Add support for "befriending" TChain objects using an indexed relation
- TTree Queries
  - Allow following (transparently) TRef and TRefArray

Summary
- TFile improvements
  - Large files and trees, Double32_t, XML output format
  - Support for non-instrumented classes
- Enhancements in I/O and Tree Query for collections
  - Split collections
  - Fast histogramming of (potentially) any collection
  - Lifted restrictions on STL I/O
    - Nested containers
    - Reading without compiled code
- TTree
  - Removed stringent requirements on CloneTree
  - Added support for auto loading of referenced objects
  - Support for RDBMS database back-ends coming soon
- TTree Queries
  - Can call any function taking numerical arguments
  - Can use arbitrary C++ and still use the branch names as variables
  - TTree Friends linked by index