
Analysis with PROOF in ALICE
ALICE Offline Tutorial, March 22, 2013
Arsen Hayrapetyan, Yerevan Physics Institute; CERN
Arsen.Hayrapetyan@cern.ch

Outline

Part 1: Theory
• PROOF, AAF, CAF
• PROOF analysis: merging, submerging
• PROOF terminology
• The structure of the PROOF analysis task
• Analysis data: trees, chains, datasets
• AliROOT usage options

Part 2: Practice
• Exercise 0: Connecting to CAF, listing analysis packages and data
• Exercise 1: ESD analysis on real data on CAF
• Exercise 2: ESD analysis on MC data on CAF
• Exercise 3: Combining exercises 1 and 2
• Exercise 4: AOD analysis on real data on CAF
• Exercise 5: Staging datasets on CAF
• Exercise 6: Processing staged datasets on CAF

PROOF, AAF, CAF
• PROOF (Parallel ROOT Facility) is an extension of ROOT for interactive analysis of large sets of ROOT files in parallel, either on clusters of computers (analysis facilities) or on multi-core machines (PROOF Lite).
• The AAF (ALICE Analysis Facilities) are a group of analysis facilities dedicated to prompt analysis of relatively small (compared to the grid) amounts of pp and PbPb data (all AFs), as well as to reconstruction of samples of raw data during data taking (CAF).
• The CAF (CERN Analysis Facility) is a PROOF cluster with 464 CPU cores, 1.4 TB of total RAM and 162 TB of total disk space. The analysis data is normally staged (copied from the grid) to CAF by users or an administrator before it is analysed.

PROOF analysis schema
[Figure: the client (local PC) sends the macro ana.C to the PROOF master of a remote PROOF cluster; the PROOF slaves on nodes 1 to 4 each run root over their data and return results to the master, which sends stdout/result back to the client.]

Event-based parallelism

Merging of the results
• Option 1: Merging on the Master
  • The results produced on the Workers are all sent to the Master and merged there
• Option 2: Submerging
  • Certain nodes are selected to be submergers; the results produced on the Workers are divided between them and merged (producing smaller output), and the submerger outputs are sent to the Master for the final merge
  • The number of submergers can be chosen automatically (default PROOF behaviour) or specified by the user
• A standard merging implementation for histograms is available
• Other classes need to implement Merge(TCollection*)
• When no merging function is available, individual objects are returned
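The two merging options can be illustrated with a toy model in plain standard C++. This is only a sketch under stated assumptions: ToyHist, RunWorkers, MergeOnMaster and MergeWithSubmergers are hypothetical names invented for the illustration, not ROOT or PROOF classes, and the round-robin assignment of events to Workers is a simplification of how PROOF actually distributes work.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a mergeable output object (NOT a ROOT class):
// a fixed-size histogram whose merge is a bin-by-bin sum.
struct ToyHist {
    std::vector<long> bins;
    explicit ToyHist(std::size_t nbins) : bins(nbins, 0) {}
    void Fill(std::size_t bin) { if (bin < bins.size()) ++bins[bin]; }
    void Merge(const ToyHist& other) {
        assert(other.bins.size() == bins.size());
        for (std::size_t i = 0; i < bins.size(); ++i) bins[i] += other.bins[i];
    }
};

// Event-based parallelism (assumed round-robin): each Worker fills its own
// partial histogram from the events assigned to it.
std::vector<ToyHist> RunWorkers(const std::vector<std::size_t>& events,
                                std::size_t nWorkers, std::size_t nbins) {
    assert(nWorkers > 0);
    std::vector<ToyHist> partial(nWorkers, ToyHist(nbins));
    for (std::size_t i = 0; i < events.size(); ++i)
        partial[i % nWorkers].Fill(events[i]);
    return partial;
}

// Option 1: every Worker output is merged directly on the Master.
ToyHist MergeOnMaster(const std::vector<ToyHist>& partial, std::size_t nbins) {
    ToyHist total(nbins);
    for (const ToyHist& h : partial) total.Merge(h);
    return total;
}

// Option 2: Worker outputs are divided between submergers and merged there;
// the Master then only merges the (fewer) submerger outputs.
ToyHist MergeWithSubmergers(const std::vector<ToyHist>& partial,
                            std::size_t nSubmergers, std::size_t nbins) {
    assert(nSubmergers > 0);
    std::vector<ToyHist> sub(nSubmergers, ToyHist(nbins));
    for (std::size_t w = 0; w < partial.size(); ++w)
        sub[w % nSubmergers].Merge(partial[w]);
    return MergeOnMaster(sub, nbins);
}
```

Because the merge is a plain sum, both options give the same final histogram; submerging only changes where the work is done and how many objects the Master has to handle.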

PROOF terminology
• PROOF cluster
  • A set of computers working in a coordinated way following the PROOF protocol
• Node
  • A single computer in a PROOF cluster
• Client
  • A process created within the ROOT session on your machine that is connected to the Master
• Master
  • A process on a dedicated node coordinating work between the Workers
• Worker or Slave
  • A process on a node that processes data and is connected to the Master
• Query
  • A job submitted from the Client to the Master. The query consists of a selector and a chain
• Selector
  • A class containing the analysis code
  • In ALICE we use the Analysis Framework, and the Selector is usually derived from AliAnalysisTaskSE
• Chain
  • A list of files (trees) to process

How does AAF analysis work
• In ALICE, we use the Analysis Framework to write PROOF-enabled programs
• The analysis task is written as a class derived from AliAnalysisTaskSE
• The data to be analysed is normally staged on AAF (if not, you can ask for it to be staged or stage it yourself). The dataset name is then specified in the steering macro
• If you work with PROOF Lite, you can put the files containing the data into a chain and specify the TChain object in the steering macro
• If you need libraries not contained in AliROOT, pack them into PAR (PROOF Archive) packages, upload them to AAF and enable them in the steering macro

The structure of the PROOF analysis task
The (ALICE-specific) analysis task is defined via a class derived from AliAnalysisTaskSE.
• Constructor: once on the Client
• UserCreateOutputObjects(): once on each Worker
• ConnectInputData(): for each tree
• UserExec(): for each event
• Terminate(): once on the Client
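The call pattern listed on this slide can be mimicked with a small driver in plain standard C++. This is a sketch only: ToyTask, CountingTask, CallLog and RunQuery are hypothetical names, not part of AliROOT, and a single task object is shared between the simulated Workers purely to count the calls (on a real cluster each Worker has its own copy of the task).

```cpp
#include <cassert>
#include <vector>

// Counts how often each hook is called.
struct CallLog {
    int createOutput = 0;
    int connectInput = 0;
    int exec = 0;
    int terminate = 0;
};

// Hypothetical minimal task interface, modelled on the method names above.
class ToyTask {
public:
    virtual ~ToyTask() = default;
    virtual void UserCreateOutputObjects() = 0;
    virtual void ConnectInputData() = 0;
    virtual void UserExec(int event) = 0;
    virtual void Terminate() = 0;
};

class CountingTask : public ToyTask {
public:
    explicit CountingTask(CallLog& log) : fLog(log) {}
    void UserCreateOutputObjects() override { ++fLog.createOutput; }
    void ConnectInputData() override { ++fLog.connectInput; }
    void UserExec(int /*event*/) override { ++fLog.exec; }
    void Terminate() override { ++fLog.terminate; }
private:
    CallLog& fLog;
};

using Tree = std::vector<int>;          // the events of one tree
using WorkerInput = std::vector<Tree>;  // the trees given to one Worker

// Toy driver reproducing the call order of one query.
void RunQuery(ToyTask& task, const std::vector<WorkerInput>& workers) {
    for (const WorkerInput& trees : workers) {
        task.UserCreateOutputObjects();        // once on each Worker
        for (const Tree& tree : trees) {
            task.ConnectInputData();           // for each tree
            for (int event : tree)
                task.UserExec(event);          // for each event
        }
    }
    task.Terminate();                          // once on the Client
}
```

With two simulated Workers holding three trees and six events in total, the counters would read 2, 3, 6 and 1 respectively.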

Trees
• The tree (an object of the ROOT class TTree) is a container for data storage
• Consists of several branches
• Can be stored in one or several files
• Stored contiguously
• Can be switched off during data reading (hence the speed-up)
• Compressed
• Content visualisation with helper functions: Draw(), Scan()
[Figure: a tree of points with branches x, y and z; the values of each branch are stored together in the file.]

Chains
• The chain (an object of the ROOT class TChain) is a collection of files containing trees (TTree objects)
• The visualisation functions Draw() and Scan() can be used as with trees (they iterate over all elements of the chain)
• The data to be analysed is normally put in a tree or chain for local analysis. For analysis on a PROOF cluster one uses datasets.
[Figure: a chain made of Tree 1 (File 1), Tree 2 (File 2), Tree 3 (File 3), Tree 4 (File 4) and Tree 5 (File 5).]
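How a chain lets several trees be read as one long sequence of entries can be sketched in plain standard C++. ToyTree and ToyChain are hypothetical names for this illustration and are not the TChain implementation; the sketch only shows the idea of mapping a global entry number to a tree and a local entry within it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// One tree of the chain: the file it lives in and its number of entries.
struct ToyTree {
    std::string file;
    std::size_t nEntries;
};

// Toy chain (NOT TChain): an ordered list of trees seen as one entry range.
class ToyChain {
public:
    void Add(const std::string& file, std::size_t nEntries) {
        fTrees.push_back(ToyTree{file, nEntries});
        fTotal += nEntries;
    }
    std::size_t GetEntries() const { return fTotal; }

    // Map a global entry number to (tree index, entry number inside that tree).
    std::pair<std::size_t, std::size_t> Locate(std::size_t global) const {
        assert(global < fTotal);
        std::size_t tree = 0;
        while (global >= fTrees[tree].nEntries) {
            global -= fTrees[tree].nEntries;
            ++tree;
        }
        return {tree, global};
    }

    const std::string& FileOf(std::size_t tree) const { return fTrees[tree].file; }

private:
    std::vector<ToyTree> fTrees;
    std::size_t fTotal = 0;
};
```

A loop over all global entries then visits every tree in turn, which is what "iterating over all elements of the chain" amounts to in this simplified picture.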

Datasets
• The dataset is a named list of files (containing trees in the case of ALICE data), including metadata about the files' locations.
• Staged to AAF by cluster administrators or users.
• If staged by an administrator, the names start with /alice/data or /alice/sim, e.g.:
  • /alice/data/LHC10d_000126285_p2
  • /alice/sim/LHC10e13_118507
• If staged by a user, the names start with <user_group>/<user_grid_login_name>, e.g.:
  • /PWG4/esicking/LHC10e20_130840_AOD060
  • /VZERO/cheynis/run136104_pass1
• Users who do not belong to any PWG or detector group have <user_group>=default, e.g.:
  • /default/poghos/LHC11a_000146746_pass3_with_SDD
• Can be listed with TProof::ShowDataSets() or via http://aaf.cern.ch -> Favourite links -> AAF datasets
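The naming convention above can be checked mechanically. The following plain standard C++ sketch splits a dataset name on "/" and reports who staged it; ClassifyDataset, DatasetOrigin and StagedBy are hypothetical helpers written for this illustration, not part of ROOT or of the AAF tools, and the rule set is only what the examples above suggest.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class StagedBy { Administrator, User, Unknown };

struct DatasetOrigin {
    StagedBy stagedBy;
    std::string group;  // <user_group>, empty unless staged by a user
    std::string owner;  // <user_grid_login_name>, empty unless staged by a user
};

// Classify a dataset name according to the convention described above.
DatasetOrigin ClassifyDataset(const std::string& name) {
    // Split the name into its non-empty '/'-separated components.
    std::vector<std::string> parts;
    std::string current;
    for (char c : name) {
        if (c == '/') {
            if (!current.empty()) parts.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    if (!current.empty()) parts.push_back(current);

    // /alice/data/... or /alice/sim/...: staged by an administrator.
    if (parts.size() >= 3 && parts[0] == "alice" &&
        (parts[1] == "data" || parts[1] == "sim"))
        return {StagedBy::Administrator, "", ""};

    // /<user_group>/<user_grid_login_name>/...: staged by a user.
    if (parts.size() >= 3)
        return {StagedBy::User, parts[0], parts[1]};

    return {StagedBy::Unknown, "", ""};
}
```

For instance, /PWG4/esicking/LHC10e20_130840_AOD060 would be reported as staged by a user with group PWG4 and owner esicking.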

Workflow summary
[Figure: Input. A chain of Tree 1 (File 1), Tree 2 (File 2), Tree 3 (File 3), Tree 4 (File 3) and Tree 5 (File 4) is distributed over three proof processes, each running the analysis (AliAnalysisTask).]

Workflow summary
[Figure: Output. Each proof process running the analysis (AliAnalysisTask) produces an output; the outputs are combined into a merged output.]

AliROOT usage options
• As the main analysis software package, AliROOT should be loaded into the memory of the Workers on AAF before processing the data. This is done via the method TProof::EnablePackage(), e.g.:
  • gProof->EnablePackage("VO_ALICE@AliRoot::v5-03-21-AN")
• TProof::EnablePackage() accepts one of the pre-defined string constants as second parameter:
  • "default": loads the basic analysis libraries (libVMC, libTree, libPhysics, libMatrix, libMinuit, libXMLParser, libGui, libSTEERBase, libESD, libAOD, libANALYSIS, libOADB, libANALYSISalice), e.g.:
    • gProof->EnablePackage("VO_ALICE@AliRoot::v5-03-21-AN", "default")
  • "ALIROOT": same as "default", loads the libraries defined in $ALICE_ROOT/macros/loadlibs.C
    • gProof->EnablePackage("VO_ALICE@AliRoot::v5-03-21-AN", "ALIROOT")
  • "REC": suited for reconstruction, loads the libraries defined in $ALICE_ROOT/macros/loadlibsrec.C
    • gProof->EnablePackage("VO_ALICE@AliRoot::v5-03-21-AN", "REC")
  • "SIM": suited for simulation, loads the libraries defined in $ALICE_ROOT/macros/loadlibssim.C
    • gProof->EnablePackage("VO_ALICE@AliRoot::v5-03-21-AN", "SIM")
• The list of available packages can be displayed with TProof::ShowPackages(), e.g.:
  • gProof->ShowPackages()

More information
• AAF user documentation
  • http://aaf.cern.ch/node/89
• PROOF documentation
  • http://root.cern.ch/drupal/content/proof
• Analysis framework documentation
  • http://aliweb.cern.ch/Offline/Activities/AnalysisFramework/index.html

Part 2: Hands-on exercises
Set up your credentials
• $> mkdir .globus
• Put your certificate and private key there
• Download the tutorial files from the agenda and unpack them

Exercise 0: Connecting to CAF, listing analysis packages and data
• $> root -l
• root> gEnv->SetValue("XSec.GSI.DelegProxy", "2")
• root> TProof::Open("ahairape@alice-caf.cern.ch")
• root> gProof->ShowPackages()
• root> gProof->ShowDataSets()

Exercise 1: ESD analysis on real data on CAF
• $> cd ex1
• Inspect the files (steering macro and analysis task)
• Run the analysis
  • root -l ex1.cxx

Exercise 2: ESD analysis on MC data on CAF
• $> cd ex2
• Inspect the files (steering macro and analysis task)
• Run the analysis
  • root -l ex2.cxx

Exercise 3: Combining exercises 1 and 2
• $> cd ex3
• Inspect the files (steering macro and analysis task)
• Run the analysis
  • root -l ex3.cxx

Exercise 4: AOD analysis on real data on CAF
• $> cd ex4
• Inspect the files (steering macro and analysis task)
• Run the analysis
  • root -l ex4.cxx

Exercise 5: Staging datasets
• Reference: http://aaf.cern.ch/node/224
• mkdir ex5 && cd ex5
• wget http://afdsmgrd.googlecode.com/svn/tags/v1.0.6/macros/CreateDataSetFromAliEn.C
• Task: Edit the file CreateDataSetFromAliEn.C to stage a dataset containing at most 10 files.
  • Use ESD pass2 data for run #188359 (alien path: /alice/data/2012/LHC12g/000188359/ESDs/pass2)
  • Stage root_archive.zip files, use "AliESDs.root" as anchor
  • Specify "/esdTree" for the tree name
  • Name the dataset "testDS"
• Test your modifications with "root -l CreateDataSetFromAliEn.C" to make sure the dataset satisfies the conditions above.
• Once it is OK, enable actual staging with the "commit" option
• Monitor the staging progress with gProof->ShowDataSets()
• Solution available on the agenda page

Exercise 6: Analysing staged dataset
• $> cd ex1
• Task: Modify the ex1.cxx file to analyse the dataset you have staged for Exercise 5.