PROOF integration in FAIRROOT
Radoslaw Karabowicz, GSI
12.2011, XXIX PANDA Collaboration Meeting
What is PROOF? (GridKa 2011, ROOT and PROOF Tutorial)
• PROOF stands for Parallel ROOT Facility.
• It allows parallel processing of large amounts of data.
• The output results can be visualized directly (e.g. the output histogram can be drawn at the end of the PROOF session).
• PROOF is NOT a batch system.
• The data you process with PROOF can reside on your computer, on PROOF cluster disks or on the grid.
• The usage of PROOF is transparent: you should not have to rewrite the code you are running locally on your computer.
• No special installation of PROOF software is necessary to execute your code: PROOF is included in the ROOT distribution.
PROOF in FAIRROOT Radoslaw Karabowicz, GSI XXXIX PANDA CM 12.2011
How does PROOF work?
[Diagram: a client ROOT session sends analysis code and data to the PROOF master and receives stdout and results. The master passes the analysis code to the PROOF slaves (nodes 1 to 4); each slave runs ROOT on its own data and returns its results.]
Trivial parallelism
[Diagram: events 1 to 12 give the same results whether they are processed sequentially, in an unordered sequence, or in parallel. In the parallel case the data is split into data 1 (events 1 to 4), data 2 (events 5 to 8) and data 3 (events 9 to 12); results 1 + results 2 + results 3 = Σ results.]
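The idea in the diagram can be sketched in plain C++ without ROOT. This is only an illustration under stated assumptions: processEvent is a hypothetical stand-in for the per-event analysis, and the workers are simulated by a loop instead of running on separate nodes.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical per-event "analysis": here simply the square of the event value.
double processEvent(double event) { return event * event; }

// Sequential processing: one loop over all events.
double processSequential(const std::vector<double>& events) {
    double result = 0.0;
    for (double e : events) result += processEvent(e);
    return result;
}

// Chunked processing: split the events into contiguous chunks, one per worker,
// compute a partial result per chunk, then sum the partial results.
// The workers are simulated by a plain loop; no threads or nodes are started.
double processInChunks(const std::vector<double>& events, std::size_t nWorkers) {
    assert(nWorkers > 0);
    std::vector<double> partial(nWorkers, 0.0);
    const std::size_t n = events.size();
    for (std::size_t w = 0; w < nWorkers; ++w) {
        const std::size_t begin = w * n / nWorkers;
        const std::size_t end = (w + 1) * n / nWorkers;
        for (std::size_t i = begin; i < end; ++i) partial[w] += processEvent(events[i]);
    }
    return std::accumulate(partial.begin(), partial.end(), 0.0);
}
```

Because each event is processed independently, the chunks never need to talk to each other; only the final sum combines them. With real floating-point data the summation order can change the last digits of the result.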
PROOF terminology
The following terms are used in PROOF:
• PROOF cluster: Set of machines communicating with the PROOF protocol. One of those machines is normally designated as Master (a multi-Master setup is possible as well). The rest of the machines are Workers.
• Client: Your machine running a ROOT session that is connected to a PROOF master.
• Master: Dedicated node in the PROOF cluster that is in charge of assigning workers the chunks of data to be processed, collecting and merging the output, and sending it to the Client.
• Slave/Worker: A node in the PROOF cluster that processes data.
• Query: A job submitted from the Client to the PROOF cluster. A query consists of a selector and a chain.
• Selector
PROOF basics
• Easy to start (24 keyboard strokes):
    karabowi@kp3mac001::~$ root -l
    root [0] TProof::Open("")
    +++ Starting PROOF-Lite with 2 workers +++
    Opening connections to workers: OK (2 workers)
    Setting up worker servers: OK (2 workers)
    PROOF set to parallel mode (2 workers)
    (class TProof*)0x10187fc00
    root [1]
• Easy to use (process selector on a chain):
    root [1] TChain* myChain = new TChain("cbmsim")
    root [2] myChain->AddFile("myFile.root")
    (Int_t)1
    root [3] myChain->SetProof()
    root [4] myChain->Process("MySelector.C")
PROOF basics
• The user needs to develop a selector:
    karabowi@kp3mac001::~$ root -l
    root [0] TChain* myChain = new TChain("cbmsim")
    root [1] myChain->AddFile("myFile.root")
    (Int_t)1
    root [2] myChain->MakeSelector("MySelector")
    Warning in <TClass::TClass>: no dictionary for class PndMCTrack is available
    Warning in <TClass::TClass>: no dictionary for class PndSdsMCPoint is available
    Warning in <TClass::TClass>: no dictionary for class FairBasePoint is available
    Warning in <TClass::TClass>: no dictionary for class FairTimeStamp is available
    Warning in <TClass::TClass>: no dictionary for class FairMultiLinkedData is available
    Warning in <TClass::TClass>: no dictionary for class FairLinkedData is available
    Warning in <TClass::TClass>: no dictionary for class PndSttPoint is available
    Warning in <TClass::TClass>: no dictionary for class PndGemMCPoint is available
    Warning in <TClass::TClass>: no dictionary for class PndTofPoint is available
    Warning in <TClass::TClass>: no dictionary for class FairMCEventHeader is available
    Warning in <TClass::TClass>: no dictionary for class FairFileHeader is available
    Info in <TTreePlayer::MakeClass>: Files: MySelector.h and MySelector.C generated from TTree: cbmsim
    (Int_t)0
    root [4]
PROOF basics
• MySelector.h contains the full list of the TTree branches and a few functions that can be filled in by the user:
    virtual void Begin(TTree *tree);             // executed on the client at the beginning
    virtual void SlaveBegin(TTree *tree);        // executed on each worker node at the beginning
    virtual void Init(TTree *tree);              // executed on a worker when getting a new tree
    virtual Bool_t Notify();
    virtual Bool_t Process(Long64_t entry);      // executed for event "entry" in the tree
    virtual Int_t GetEntry(Long64_t entry, Int_t getall = 0) { return fChain ? fChain->GetTree()->GetEntry(entry, getall) : 0; }
    virtual void SetOption(const char *option) { fOption = option; }
    virtual void SetObject(TObject *obj) { fObject = obj; }
    virtual void SetInputList(TList *input) { fInput = input; }
    virtual TList *GetOutputList() const { return fOutput; }
    virtual void SlaveTerminate();               // executed on each worker node at the end
    virtual void Terminate();                    // executed on the client at the end
• Input and output are controlled via TLists:
    TList* fInput;            // list of objects available during processing
    TSelectorList* fOutput;   // list of objects created during processing
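The order in which these hooks fire can be illustrated with a small ROOT-free mock. MockSelector and runQuery are hypothetical names introduced only for this sketch, and the sketch assumes a single worker and leaves out Init() and Notify(); it is not how PROOF itself drives a TSelector.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a selector; each hook only records that it was called.
struct MockSelector {
    std::vector<std::string> calls;
    void Begin()             { calls.push_back("Begin"); }
    void SlaveBegin()        { calls.push_back("SlaveBegin"); }
    void Process(long entry) { calls.push_back("Process(" + std::to_string(entry) + ")"); }
    void SlaveTerminate()    { calls.push_back("SlaveTerminate"); }
    void Terminate()         { calls.push_back("Terminate"); }
};

// Call the hooks in the order a run with one worker would use them:
// Begin once, SlaveBegin once per worker, Process once per entry,
// SlaveTerminate once per worker, Terminate once.
std::vector<std::string> runQuery(MockSelector& sel, long firstEntry, long nEntries) {
    assert(nEntries >= 0);
    sel.Begin();
    sel.SlaveBegin();
    for (long e = firstEntry; e < firstEntry + nEntries; ++e) sel.Process(e);
    sel.SlaveTerminate();
    sel.Terminate();
    return sel.calls;
}
```

With several workers, the SlaveBegin / Process / SlaveTerminate part would run once per worker, each on its own share of the entries, while Begin and Terminate still run only once.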
PROOF in FAIRROOT
GOALS:
• run FAIRROOT analysis on a PROOF cluster
• restrict the changes to fairbase, i.e.
    • reduce the changes in users' analysis code &
    • reduce the changes in users' macros
PROOF in FAIRROOT
STEPS:
• load FAIRROOT libraries on the workers
• develop a general selector
• change fairbase et al.
PROOF Archive (PAR)
• gtar'ed directory containing SETUP.C and optionally BUILD.sh. These scripts will be executed on each worker node
• GOAL: the FAIRROOT libraries have to be loaded on each worker node
• IMPORTANT: a simple way to get the list of needed libraries is required; this solution has to be general for each experiment using FAIRROOT and has to require minimal user intervention
• SOLUTION: the current implementation of libFairRoot.par contains only SETUP.C, which loads and executes gconfig/rootlogon.C
FairAnaSelector
• The class derives from TSelector, with well-defined member functions that are executed in a specific order. Usually used as myChain->Process(MySelector); either locally or on PROOF
• GOAL: send a FairRunAna with the list of tasks, parameters, geometry, etc. to the workers, analyze the chain, collect the workers' outputs and merge the outputs
• CHALLENGES: to send objects to the workers via TList* fInputList, the objects have to be "streamable"
FairAnaSelector
• "Streamable"? = ~simple~
    • no instantons
    • derive MyClass from TObject
    • public default constructor MyClass();
    • initialize all members to 0
FairAnaSelector
• SOLUTION: the master FairRunAna opens a PROOF session, adds outputFileName, parameterFileNames and fTask to fInput, uploads the .par package and runs FairAnaSelector on the input chain:
    TProof* proof = TProof::Open(fProofServerName.Data());
    proof->AddInput(new TNamed("FAIRRUNANA_fOutputFileName", outFile.Data()));
    proof->AddInput(new TNamed("FAIRRUNANA_fParInput1FName", par1File.Data()));
    proof->AddInput(new TNamed("FAIRRUNANA_fParInput2FName", par2File.Data()));
    proof->AddInput(fTask);
    proof->UploadPackage(fProofParName.Data());
    proof->EnablePackage(fProofParName.Data());
    inChain->SetProof();
    inChain->Process("FairAnaSelector", "", NEntries, NStart);
FairAnaSelector
• SOLUTION: FairAnaSelector creates a FairRunAna on each worker node at the beginning of the job. It asks this FairRunAna to analyze individual events in the Process() function. In the SlaveTerminate() function the FairRunAna is finished and its output is stored in TSelectorList* fOutput.
• OPTIONAL: ROOT's default file/tree merger may be used to merge the workers' output. It is also possible to store the individual workers' outputs.
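What merging the workers' outputs means for a tree-like output can be sketched without ROOT. The sketch assumes the merged output is simply the workers' entries appended one after another; OutputEntry, WorkerOutput and mergeOutputs are hypothetical names, and ROOT's own merger is not modelled here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical output record written by a worker for one event.
struct OutputEntry {
    long eventId;   // which input event produced this entry
    double value;   // some analysis result
};
using WorkerOutput = std::vector<OutputEntry>;

// Merge by concatenation: worker 0's entries first, then worker 1's, and so on.
WorkerOutput mergeOutputs(const std::vector<WorkerOutput>& workers) {
    std::size_t total = 0;
    for (const WorkerOutput& w : workers) total += w.size();

    WorkerOutput merged;
    merged.reserve(total);
    for (const WorkerOutput& w : workers)
        merged.insert(merged.end(), w.begin(), w.end());

    assert(merged.size() == total);  // no entry is lost or duplicated
    return merged;
}
```

The merged entry count is the sum of the workers' counts, and the merged order follows the workers rather than the input.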
Running locally:

    myMacro { // running locally
      FairRunAna* fRun;
      fRun->SetInputFile();
      fRun->SetOutputFile();
      FairRuntimeDb* rtdb = fRun->GetRuntimeDb();
      rtdb->SetFirstInput();
      rtdb->SetSecondInput();
      fRun->AddTasks();
      fRun->Init();
      fRun->Run(firstEvent, lastEvent);
    }

    void FairRunAna::Run(Int_t NStart, Int_t NStop) {
      for (Int_t iev = NStart; iev < NStop; iev++) {
        fTask->ExecuteTask();
      }
    }

Running on PROOF:

    myMacro { // running on PROOF
      FairRunAna* fRun;
      fRun->SetInputFile();
      fRun->SetOutputFile();
      FairRuntimeDb* rtdb = fRun->GetRuntimeDb();
      rtdb->SetFirstInput();
      rtdb->SetSecondInput();
      fRun->AddTasks();
      fRun->Init();
      fRun->Run(firstEvent, lastEvent, "proof");
    }

    void FairRunAna::Run(Int_t NStart, Int_t NStop, const char* type) {
      TProof* proof = TProof::Open("");
      proof->AddInput(outFileName.Data());
      proof->AddInput(par1FileName.Data());
      proof->AddInput(par2FileName.Data());
      proof->AddInput(fTask);
      proof->UploadPackage("libFairRoot.par");
      proof->EnablePackage("libFairRoot.par");
      inChain->SetProof();
      inChain->Process("FairAnaSelector", "", NEntries, NStart);
    }

    FairAnaSelector::Init(TTree* tree) {
      if ( !fRunAna ) {
        fRunAna = new FairRunAna();
        fRunAna->SetInTree(tree);
        fRunAna->SetOutputFile(fInput->FindObject(outFileName));
        FairRuntimeDb* rtdb = fRunAna->GetRuntimeDb();
        rtdb->SetFirstInput(fInput->FindObject(par1FileName));
        rtdb->SetSecondInput(fInput->FindObject(par2FileName));
        fRunAna->AddTasks(fInput->FindObject("FairTaskList"));
        fRunAna->Init();
      } else {
        fRunAna->SetInTree(tree);
        FairRootManager* ioman = FairRootManager::Instance();
        ioman->OpenInTree();
      }
    }

    FairAnaSelector::Process(Long64_t entry) {
      fRunAna->RunEntry(entry);
    }
fairbase et al.
• On these slides several of the most important currently implemented changes to FAIRROOT will be summarized:

    karabowi@.../trunk/parbase$ cd ../gem
    karabowi@.../trunk/parbase/gem$ svn diff | wc -l
    349
    karabowi@.../trunk/parbase/gem$ cd ../mvd
    karabowi@.../trunk/parbase/mvd$ svn diff | wc -l
    231
    karabowi@.../trunk/parbase/mvd$ cd ../sds
    karabowi@.../trunk/parbase/sds$ svn diff | wc -l
    736
    karabowi@.../trunk/parbase/sds$ cd ../stt
    karabowi@.../trunk/parbase/stt$ svn diff | wc -l
    81
    karabowi@.../trunk/parbase/stt$ cd ../global
    karabowi@.../trunk/parbase/global$ svn diff | wc -l
    407

    karabowi@lxi012::~/pandaroot_13510/trunk/base$ svn status
    ? FairAnaSelector.cxx
    ? FairAnaSelector.h
    M FairRun.cxx
    M FairRootManager.cxx
    M FairRun.h
    M FairTask.cxx
    M FairRootManager.h
    M CMakeLists.txt
    M FairRunInfo.cxx
    M FairRunAna.cxx
    M FairTask.h
    M FairLinkDef.h
    M FairRunAna.h
    karabowi@lxi012::~/pandaroot_13510/trunk/base$ svn diff | wc -l
    1714

    karabowi@lxi012::~/pandaroot_13510/trunk/base$ svn status
    M FairParRootFileIo.cxx
    M FairParAsciiFileIo.cxx
    M FairParIo.h
    M FairParAsciiFileIo.h
    M FairParIo.cxx
    M FairDetParRootFileIo.cxx
    karabowi@lxi012::~/pandaroot_13510/trunk/base$ svn diff | wc -l
    101
fairbase et al.
FairRunAna (only the most important mentioned):
• new member:
    • TSelector* fSelector;
• new functions:
    • void Run(Int_t NStart = 0, Int_t NStop = 0, const char *type);
    • void RunEntry(Int_t entryNo);
    • void SetInChain(TChain* tempChain);
    • void SetInTree(TTree* tempTree);
    • TTree* GetOutTree();
fairbase et al.
FairRootManager (only the most important mentioned):
• new member:
    • TTree* fInTree;
• new functions:
    • void SetInTree(TTree* tempTree);
    • void SetInChain(TChain* tempChain);
    • Bool_t OpenInTree();
    • TObject* GetObjectFromInTree(const char* BrName);
    • TObject* ActivateBranchInTree(const char* BrName);
fairbase et al.
MyTask (only the most important mentioned):
• initialize all possible members to 0 in the default constructor MyTask();
• initialize all possible members to 0 in the default constructor MyClass() of class MyClass, which is a member of MyTask
• pointers to instantons as members are difficult to stream
• do not ->Delete() empty pointers, protect with if: if ( myPointer ) myPointer->Delete();
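These rules can be illustrated with a ROOT-free sketch. The classes below are hypothetical stand-ins: they do not derive from TObject and carry no ROOT dictionary, so they show only the initialization and pointer-guard habits, not actual streaming.

```cpp
#include <cassert>

// Hypothetical helper class owned by the task.
struct MyClass {
    int    fNofHits   = 0;     // every member starts with a defined value
    double fThreshold = 0.0;
    MyClass() = default;       // public default constructor
    void Delete() { fNofHits = 0; fThreshold = 0.0; }  // stand-in for a ROOT-style Delete()
};

// Hypothetical task following the rules listed on the slide.
class MyTask {
public:
    MyTask() = default;                          // public default constructor
    ~MyTask() { Clear(); }
    MyTask(const MyTask&) = delete;              // the task owns a raw pointer, so forbid copies
    MyTask& operator=(const MyTask&) = delete;

    void Allocate() { if (!fHelper) fHelper = new MyClass(); }

    // Calling a member function through a null pointer is undefined behaviour,
    // so the pointer is checked before Delete() is called on it.
    void Clear() {
        if (fHelper) {
            fHelper->Delete();
            delete fHelper;
            fHelper = nullptr;
        }
    }

    bool HasHelper() const { return fHelper != nullptr; }
    int  EventCount() const { return fEventCount; }

private:
    int      fEventCount = 0;        // plain members initialized to 0
    MyClass* fHelper     = nullptr;  // pointers initialized to null
};
```

A default-constructed MyTask is in a fully defined state, and Clear() is safe to call whether or not the helper was ever allocated.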
Running PROOF
• The simplest way to use PROOF is PROOF-Lite, which uses the CPUs of your own local machine:
    karabowi@kp3mac001::~$ root -l
    root [0] TProof::Open("")
    +++ Starting PROOF-Lite with 2 workers +++
    Opening connections to workers: OK (2 workers)
    Setting up worker servers: OK (2 workers)
    PROOF set to parallel mode (2 workers)
    (class TProof*)0x10187fc00
    root [1]
• For creating a PROOF cluster that uses external CPUs one may use PoD (PROOF-on-Demand: http://pod.gsi.de)
PoD & PROOF
• Here an SSH plugin is used to connect to the workers:
    konglaide@kp3mac001: pod-server start
    Starting PoD server...
    updating xproofd configuration file...
    starting xproofd...
    starting PoD agent...
    preparing PoD worker package...
    selecting pre-compiled bins to be added to worker package...
    PoD worker package will be repacked because "/Users/konglaide/.PoD/etc/xpd.cf" was updated.
    PoD worker package: /Users/konglaide/.PoD/wrk/pod-worker
    ------------
    XPROOFD [66174] port: 21001
    PoD agent [66179] port: 22001
    PROOF connection string: konglaide@kp3mac001.gsi.de:21001
    ------------
    konglaide@kp3mac001::~$ pod-ssh -c ~/PoD/pod_ssh.cfg --submit
    ** PoD jobs have been submitted. Use "pod-ssh --status" to check the status.
    konglaide@kp3mac001::~$ pod-ssh -c ~/PoD/pod_ssh.cfg --status
    PoD worker "etch64_16": RUN
    PoD worker "etch64_21": RUN
    PoD worker "etch64_20": RUN
    konglaide@kp3mac001::~$ root -l
    root [0] TProof::Open(gSystem->GetFromPipe("pod-info -c"))
    Starting master: opening connection...
    Starting master: OK
    Opening connections to workers: OK (20 workers)
    Setting up worker servers: OK (20 workers)
    PROOF set to parallel mode (20 workers)
    (class TProof*)0x1018a9e00
    root [1]
• Other plugins developed are: gLite, LSF, PBS, Grid Engine, Condor
Results
• Time performance
• Data quality
• Data integrity
Time performance
• Some remarks:
    • one will (almost) never get ideal scaling: n workers do not mean n times faster job execution, because of repeated initialization, library loading and PROOF overhead
    • the IO limits the time performance
Time performance
• PROOF-Lite with 4 workers
• One task: PndGemFindHits
• speedup factor = local analysis time / PROOF analysis time
• Tasks: PndMvdDigiTask, PndMvdClusterTask, PndSttHitProducerIdeal, PndGemDigitize, PndGemFindHits, PndBarrelTrackFinder
• green: nWorkers = 4
• blue: nWorkers = 2
[Plots: speedup factor for the single task and for the task list]
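The speedup factor defined above, and the reason it stays below the number of workers, can be written down as a small sketch. The speedup function follows the slide's definition; modelProofTime is an assumed toy model (a fixed overhead plus evenly divided event processing) that is not taken from the slides and ignores IO limits.

```cpp
#include <cassert>

// Speedup factor as defined on the slide:
// local analysis time divided by PROOF analysis time.
double speedup(double localTime, double proofTime) {
    assert(localTime > 0.0 && proofTime > 0.0);
    return localTime / proofTime;
}

// Assumed toy model, not from the slides: a fixed overhead
// (initialization, library loading, PROOF start-up) that does not shrink
// with the number of workers, plus event processing time that divides
// evenly among the workers.
double modelProofTime(double overhead, double eventTime, int nWorkers) {
    assert(nWorkers > 0);
    return overhead + eventTime / nWorkers;
}
```

For example, with 10 s of overhead and 100 s of event processing, the model gives 110 s on one worker and 35 s on four workers, a speedup of about 3.1 rather than 4.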
Time performance
PROOF on external CPUs
Using PoD with the SSH plugin: lxi020 (4 CPUs) + lxi016 (8 CPUs) + lxi021 (8 CPUs)
Data quality
• The results of FAIRROOT running locally and running on PROOF are identical*
[Plot statistics, first comparison: locally 167145 entries; worker 0 83709 entries; worker 1 83412 entries; sum 167121 entries]
[Plot statistics, second comparison: locally 301185 entries; worker 0 150301 entries; worker 1 150884 entries]
* down to fRandom: the order of event processing is different locally and on PROOF
Data integrity
• PROOF automatically divides the input data into chunks and distributes them among the workers
• Extreme example: event distribution among PROOF workers
[Plot: event distribution among PROOF workers]
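One way to picture how chunks end up spread unevenly over workers is a small simulation. It assumes a pull-style scheme in which whichever worker becomes free first receives the next chunk; distributeEvents and the per-worker speeds are hypothetical, and the sketch does not reproduce PROOF's actual packetizer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simulate handing out fixed-size chunks of events to workers.
// Assumption: the worker that becomes free first gets the next chunk.
// secondsPerEvent[w] is a hypothetical processing time per event on worker w.
// Returns the number of events processed by each worker.
std::vector<long> distributeEvents(long nEvents, long chunkSize,
                                   const std::vector<double>& secondsPerEvent) {
    assert(chunkSize > 0 && !secondsPerEvent.empty());
    const std::size_t nWorkers = secondsPerEvent.size();
    std::vector<double> freeAt(nWorkers, 0.0);  // time at which each worker is idle again
    std::vector<long> eventsDone(nWorkers, 0);  // events processed per worker

    for (long first = 0; first < nEvents; first += chunkSize) {
        const long size = (nEvents - first < chunkSize) ? nEvents - first : chunkSize;
        // Pick the worker that is free earliest (the lowest index wins ties).
        std::size_t next = 0;
        for (std::size_t w = 1; w < nWorkers; ++w)
            if (freeAt[w] < freeAt[next]) next = w;
        freeAt[next] += size * secondsPerEvent[next];
        eventsDone[next] += size;
    }
    return eventsDone;
}
```

In this model a faster worker returns for new chunks more often and so ends up with more events, while the total number of events handed out always equals the input size.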
Data integrity
• Event order is mixed in the output file
• Extreme example: output vs input event order
• Extreme in the sense that the mixing is at the event level. When there are more input files than worker nodes, PROOF sends whole files to workers
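If downstream code depends on the input order, one option is to sort the merged output by the original entry number. This is a sketch of that idea only, not something described on the slides: it assumes each output record stores the number of the input entry it came from, and OutputEvent, restoreInputOrder and isInInputOrder are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical output record that remembers its input entry number.
struct OutputEvent {
    long inputEntry;  // entry number in the input chain (assumed to be stored)
    double value;     // some analysis result
};

// Comparison on the stored input entry number.
bool byInputEntry(const OutputEvent& a, const OutputEvent& b) {
    return a.inputEntry < b.inputEntry;
}

// Sort the merged output back into input order.
void restoreInputOrder(std::vector<OutputEvent>& events) {
    std::stable_sort(events.begin(), events.end(), byInputEntry);
}

// Check whether the output already follows the input order.
bool isInInputOrder(const std::vector<OutputEvent>& events) {
    return std::is_sorted(events.begin(), events.end(), byInputEntry);
}
```

Analyses that treat events independently do not need this step; it matters only when the position in the output is used to match entries against the input.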
Remarks
• FAIRROOT has been adapted to run on a PROOF cluster
• Test results are promising
• Further work is still required
• The code is in the development branch and will be available in the trunk soon
Backup slides
Detailed processing time