PROOF - Parallel ROOT Facility
Maarten Ballintijn
http://root.cern.ch
Bring the KB to the PB, not the PB to the KB
March, 2003
PROOF Intro
- Collaboration between the core ROOT group at CERN and the MIT Heavy Ion Group
  - Rene Brun, Fons Rademakers
  - Gunther Roland, Maarten Ballintijn
- Part of and based on the ROOT framework
- ROOT since 1995, PROOF started 2001
- A wealth of info at http://root.cern.ch
- In the ROOT CVS tree, beta tests ongoing
PROOF Intro
- A collection of servers processes the data
  - Parallel I/O and parallel CPU
- CPU allocation and data access strategies
  - Dynamic resource allocation
  - Local data first, also rootd, SAN/NAS
- Transparency
  - Single-source analysis code
  - Input objects copied from the client
  - Output objects merged and returned to the client
- Scalability and adaptability
  - Dynamic packet size
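The "dynamic packet size" idea above can be illustrated with a small stand-alone C++ sketch. It is not PROOF's packetizer: the `Packet`, `Packetizer` and `Simulate` names, the per-worker rate input and the fixed target time per packet are all assumptions made for illustration. Idle workers pull the next packet, and faster workers are handed larger packets so that each packet takes roughly the same time.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical types for illustration only; these are not PROOF classes.
struct Packet {
    std::int64_t first;  // first entry of the packet
    std::int64_t num;    // number of entries in the packet
};

class Packetizer {
public:
    Packetizer(std::int64_t totalEntries, double targetSeconds)
        : fNext(0), fTotal(totalEntries), fTarget(targetSeconds) {}

    // Size the packet so that a worker processing `entriesPerSecond`
    // needs about fTarget seconds for it. Returns num == 0 when no
    // entries are left.
    Packet Next(double entriesPerSecond) {
        std::int64_t want = static_cast<std::int64_t>(entriesPerSecond * fTarget);
        want = std::max<std::int64_t>(want, 1);
        std::int64_t num = std::min(want, fTotal - fNext);
        Packet p{fNext, num};
        fNext += num;
        return p;
    }

    bool Done() const { return fNext >= fTotal; }

private:
    std::int64_t fNext;   // next unassigned entry
    std::int64_t fTotal;  // total number of entries
    double fTarget;       // assumed target processing time per packet
};

// Simulate workers with fixed rates (entries/second) pulling packets.
// Returns the number of entries each worker ended up processing.
std::vector<std::int64_t> Simulate(std::int64_t totalEntries,
                                   const std::vector<double>& rates,
                                   double targetSeconds) {
    assert(!rates.empty());
    Packetizer pk(totalEntries, targetSeconds);
    std::vector<std::int64_t> done(rates.size(), 0);
    std::vector<double> freeAt(rates.size(), 0.0);  // time each worker goes idle
    while (!pk.Done()) {
        // The worker that becomes idle first asks for the next packet.
        std::size_t w = static_cast<std::size_t>(
            std::min_element(freeAt.begin(), freeAt.end()) - freeAt.begin());
        Packet p = pk.Next(rates[w]);
        done[w] += p.num;
        freeAt[w] += static_cast<double>(p.num) / rates[w];
    }
    return done;
}
```

In this toy model a worker that is twice as fast ends up with about twice the entries, without any static partitioning of the data set up front.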
PROOF Intro
[Diagram: a ROOT client session connected over the Internet to a PROOF master, which controls many slaves.]
Phobos Event and AnT Tree
[Diagram: one TTree with one TPhAnTEventInfo per event. Labels: Paddle; Vertex, Track and Hit, each stored in a TClonesArray (of TPhAnTVertex, TPhAnTTrack and TPhAnTHit); multiplicities 0..n, 0..n and 1..n.]
PROOF Packages
- Provide a collection of files in the sandbox
- Binary or source packages
- PAR files: PROOF ARchive, like a Java jar
  - Tar file with a PROOF-INF directory
  - BUILD.sh
  - SETUP.C, per-slave settings
- API to manage and activate packages
AnT Package

    ant:
      PROOF-INF/           Makefile             LinkDef.h
      TPhAnTEventInfo.cxx  TPhAnTEventInfo.h
      TPhAnTHit.cxx        TPhAnTHit.h
      TPhAnTPdlInfo.cxx    TPhAnTPdlInfo.h
      TPhAnTTrack.cxx      TPhAnTTrack.h
      TPhAnTVertex.cxx     TPhAnTVertex.h

    ant/PROOF-INF:
      BUILD.sh             SETUP.C

    #!/bin/sh
    # BUILD.sh -- Build libant.so
    exec make

    // SETUP.C -- Load AnT library
    {
       gSystem->Load("libPhysics.so");
       gSystem->Load("libant.so");
    }
Analysis using TSelector
- Extend the framework by inheritance

    // Abbreviated version
    class TSelector : public TObject {
    protected:
       TList *fInput;    // input objects copied from the client
       TList *fOutput;   // output objects to be merged and returned
    public:
       void   Init(TTree *);
       void   Begin(TTree *);
       Bool_t Process(int entry);
       void   Terminate();
    };
Analysis using TSelector
- Create a class inheriting from TSelector
- Implement the member functions
  - Begin(): called once at the beginning of an analysis job, in each of the slave servers. Used e.g. to create histograms and initialize data.
  - Process(): called for each entry to be processed (by that slave).
  - Terminate(): called once at the end of an analysis job, in each of the slave servers. Used e.g. for post-processing data and cleanup.
  - Init(): called for each new file.
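The call order listed above, together with the merging of output objects, can be mimicked without ROOT. The following is a minimal stand-alone C++ sketch; `Hist`, `VtxSelector` and `RunOnWorkers` are hypothetical stand-ins (not ROOT or PROOF classes), the "workers" run one after another in a loop, and the data are a plain vector of values rather than a TTree.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a 1-D histogram with equal-width bins over [lo, hi).
struct Hist {
    double lo, hi;
    std::vector<long> bins;
    Hist(std::size_t n, double l, double h) : lo(l), hi(h), bins(n, 0) {}
    void Fill(double x) {
        if (x < lo || x >= hi) return;  // this sketch drops under/overflow
        std::size_t i = static_cast<std::size_t>((x - lo) / (hi - lo) * bins.size());
        i = std::min(i, bins.size() - 1);  // guard against rounding at the upper edge
        ++bins[i];
    }
    void Add(const Hist& other) {  // bin-by-bin merge
        for (std::size_t i = 0; i < bins.size(); ++i) bins[i] += other.bins[i];
    }
};

// Stand-in selector following the Begin / Process / Terminate pattern.
class VtxSelector {
public:
    VtxSelector() : fVtxX(1, 0.0, 1.0) {}
    void Begin() { fVtxX = Hist(100, -10.0, 10.0); }  // book output
    bool Process(const std::vector<double>& data, std::size_t entry) {
        fVtxX.Fill(data[entry]);                      // one entry at a time
        return true;
    }
    const Hist& Terminate() const { return fVtxX; }   // hand back output
private:
    Hist fVtxX;
};

// Split the entries into contiguous ranges, run one selector instance per
// (simulated) worker and merge the per-worker outputs.
Hist RunOnWorkers(const std::vector<double>& data, std::size_t nWorkers) {
    assert(nWorkers > 0);
    Hist merged(100, -10.0, 10.0);
    const std::size_t chunk = (data.size() + nWorkers - 1) / nWorkers;
    for (std::size_t w = 0; w < nWorkers; ++w) {
        VtxSelector sel;
        sel.Begin();
        const std::size_t first = w * chunk;
        const std::size_t last = std::min(data.size(), first + chunk);
        for (std::size_t entry = first; entry < last; ++entry) sel.Process(data, entry);
        merged.Add(sel.Terminate());
    }
    return merged;
}
```

Because merging here is a plain bin-by-bin sum, the merged result does not depend on how many workers the entries were split across, which is the property that lets the same analysis code run locally or in parallel.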
Example Selector antsel.C

    void antsel::Begin(TTree *)
    {
       // Book the output histogram.
       fVtx_x = new TH1F("vtxx", "Vertex X", 100, -10., 10.);
    }

    Bool_t antsel::Process(int entry)
    {
       fChain->GetTree()->GetEntry(entry);
       // Skip events with a paddle mean below 1500.
       if ( evtInfo->fPdlMean < 1500 ) return kFALSE;
       TPhAnTVertex *v = evtInfo->fRMSSelvtx->GetObject();
       fVtx_x->Fill( v->fPosX() );
       return kTRUE;
    }

    void antsel::Terminate()
    {
       // Register the histogram so it is returned to the client.
       fOutput->Add(fVtx_x);
    }
Running locally
- Develop and debug the selector locally on a small event sample.

    % root
    root [0] TFile *f = TFile::Open("ant_sample.root")
    root [1] TTree *t = (TTree*) f->Get("trkTree")
    root [3] t->Process("antsel.C", "", 2000)
    Real time 0:06, CP time 5.940
    root [4] vtxx->Draw()
    root [5] .! vi antsel.C

- About 8 MB of data (~x5 compression)
- Develop until ready for a large sample.
Running locally
- Ready to run on a large sample
TDSet: Specify the data
- Specify a collection of TTrees or TFiles with objects

    [] TDSet *d = new TDSet("TTree", "tracks", "/");
    [] TDSet *d = new TDSet("TEvent", "/objs");
    [] d->Add("root://rcrs4001/a.root", "tracks", "dir", first, num);
    …
    [] d->Print("a");

- To be returned by a DB or File Catalog query etc.
- Use logical filenames ("lfn:…")
Running with PROOF
- Ready to run on a large event sample

    % root
    root [0] gROOT->Proof("pgate.lns.mit.edu")
    … login details …
    root [1] TDSet *ds = make_dset()
    root [2] gProof->UploadPackage("ant.par")
    root [3] gProof->EnablePackage("ant")
    …
    root [4] gProof->Process("antsel.C", "", 60000)
    Real time 0:00:12, CP Time 0.050
    root [5] ((TH1F*)gProof->GetOutput("vtxx"))->Draw()

- Use the same session to look at other histograms, change cuts etc.
The PROOF advantage
- Processed 240 MB in 12 s.
PROOF Scalability
- 8.8 GB, 128 files
- 1 node: 5:25 min
- 32 nodes in parallel: 12 s
- 32 nodes: dual Itanium II 1 GHz CPUs, 2 GB RAM, 2 x 75 GB 15K SCSI disks, 1 Fast Ethernet NIC, 1 Gigabit Ethernet NIC (not used)
- Each node has one copy of the data set (4 files, 277 MB in total); over 32 nodes: 8.8 GB in 128 files, 9 million events
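The timings above can be turned into a speedup and a parallel efficiency. The short C++ sketch below does the arithmetic; it assumes that the 1-node and the 32-node times refer to the same 8.8 GB workload, and the function names are chosen here for illustration.

```cpp
#include <cassert>

// Convert a min:sec wall-clock time to seconds.
double ToSeconds(int minutes, int seconds) {
    return 60.0 * minutes + seconds;
}

// Speedup = single-node time / parallel time.
double Speedup(double tSerial, double tParallel) {
    assert(tParallel > 0.0);
    return tSerial / tParallel;
}

// Parallel efficiency = speedup / number of nodes (1.0 would be ideal).
double Efficiency(double tSerial, double tParallel, int nNodes) {
    assert(nNodes > 0);
    return Speedup(tSerial, tParallel) / nNodes;
}

// With the numbers on the slide:
//   ToSeconds(5, 25)             = 325 s
//   Speedup(325, 12)             = about 27
//   Efficiency(325, 12, 32)      = about 0.85
```

Under that assumption, 32 nodes give a speedup of roughly 27, i.e. about 85% of the ideal linear scaling.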
Future Work
- Ongoing development
  - Improvements and defect fixes
  - Event lists
  - Friend trees
- Multi-site PROOF sessions
- Continued development of a GRID-based PROOF cluster
Other PROOF Talks
- Fons Rademakers: Distributed Parallel Analysis Framework with PROOF (15:00, session 2)
- Jinghua Liu: Analysis of CMS Heavy Ion Simulation Data Using ROOT/PROOF/Grid