TMVA Toolkit for Multivariate Data Analysis with ROOT

Helge Voss, MPI-K Heidelberg, on behalf of: Andreas Höcker, Fredrik Tegenfeldt, Joerg Stelzer, and contributors: A. Christov, S. Henrot-Versillé, M. Jachowski, A. Krasznahorkay Jr., Y. Mahalalel, X. Prudent, P. Speckmayer, M. Wolter, A. Zemla

TMVA supplies an environment to easily:
- apply different sophisticated data selection algorithms
- have them all trained, tested and evaluated
- find the best one for your selection problem

http://tmva.sourceforge.net/
arXiv: physics/0703039
ACAT 2007, Nikhef, 23-27 April 2007

Motivation/Outline

ROOT is the analysis framework used by most (HEP) physicists. The idea: rather than just implementing new MVA techniques and making them somehow available in ROOT (i.e. like TMultiLayerPerceptron does), provide one common platform/interface for all MVA classifiers:
- easy to use, and easy to compare different MVA classifiers
- train/test on the same data sample and evaluate consistently

Outline:
- introduction
- the MVA classifiers available in TMVA
- demonstration with toy examples
- summary

Multivariate Event Classification

All multivariate classifiers condense (correlated) multi-variable input information into a single scalar output variable: a mapping R^n -> R with y(background) -> 0 and y(signal) -> 1. This leaves one variable to base your decision on.
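
Every classifier in the toolkit reduces to such a mapping. As a minimal sketch of this common structure (not TMVA code; the function name Response and the weights are purely illustrative):

#include <vector>

// a "trained" classifier is just a function y: R^n -> R;
// here a fixed linear combination stands in for the trained model
double Response(const std::vector<double>& x)
{
   const std::vector<double> w = {0.7, -0.2, 0.1};   // illustrative weights
   double y = 0.0;
   for (size_t k = 0; k < x.size(); ++k)
      y += w[k] * x[k];
   return y;
}

// the event selection is then a single cut on the scalar output
bool IsSignal(const std::vector<double>& x, double cut)
{
   return Response(x) > cut;
}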

What is in TMVA

TMVA currently includes:
- rectangular cut optimisation
- projective and multi-dimensional likelihood estimators
- Fisher discriminant and H-Matrix (χ² estimator)
- artificial neural networks (3 different implementations)
- boosted/bagged decision trees
- rule fitting
- support vector machines

All classifiers are highly customizable. Common pre-processing of the input is provided: de-correlation and principal component analysis, with support for arbitrary pre-selections and individual event weights. The TMVA package provides training, testing and evaluation of the classifiers; each classifier provides a ranking of its input variables; classifiers produce weight files that are read by a Reader class for MVA application. TMVA is integrated in ROOT (since release 5.11/03) and very easy to use!

Preprocessing the Input Variables: Decorrelation

Commonly realised for all methods in TMVA (centrally in the DataSet class): removal of linear correlations by rotating the input variables, either
- using the square root of the correlation matrix, or
- using Principal Component Analysis.

[Figure: scatter plots of the original, SQRT-decorrelated and PCA-decorrelated variables]

Note that this "de-correlation" is only complete if the input variables are Gaussian and the correlations are linear. In practice the gain from de-correlation is often rather modest, or even harmful.
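
To make the square-root variant concrete, here is a minimal sketch (not TMVA's internal code, which lives in the DataSet class): build C^{-1/2} from the eigen-decomposition of a covariance matrix C and apply it to an input vector; the function name Decorrelate is illustrative.

#include "TMatrixD.h"
#include "TMatrixDSym.h"
#include "TMatrixDSymEigen.h"
#include "TVectorD.h"
#include "TMath.h"

// x' = C^{-1/2} x removes the linear correlations encoded in C
TVectorD Decorrelate(const TMatrixDSym& cov, const TVectorD& x)
{
   TMatrixDSymEigen eigen(cov);
   TMatrixD v = eigen.GetEigenVectors();   // columns are the eigenvectors
   TVectorD d = eigen.GetEigenValues();

   // D^{-1/2} (a freshly constructed TMatrixD is zero-initialised)
   TMatrixD dInvSqrt(d.GetNrows(), d.GetNrows());
   for (Int_t i = 0; i < d.GetNrows(); ++i)
      dInvSqrt(i, i) = 1.0 / TMath::Sqrt(d(i));

   // C^{-1/2} = V D^{-1/2} V^T
   TMatrixD cInvSqrt = v * dInvSqrt * TMatrixD(TMatrixD::kTransposed, v);
   return cInvSqrt * x;                    // decorrelated variables
}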

Cut Optimisation

Simplest method: cut in a rectangular volume, scanning the signal efficiency in [0, 1] and maximising the background rejection; from this scan, the optimal working point in terms of S and B event counts can be derived. The technical problem is how to perform the optimisation. TMVA uses random sampling, Simulated Annealing or a Genetic Algorithm; for a speed improvement in the volume search, the training events are sorted in Binary Search Trees. The optimisation can be done in the normal or in the de-correlated variable space.
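
For illustration, the three optimisation strategies can be booked on an existing Factory roughly as follows (a sketch only; the option strings are indicative and should be checked against the Users Guide):

// book rectangular cut optimisation with different optimisers
factory->BookMethod(TMVA::Types::kCuts, "CutsMC", "!V:FitMethod=MC");  // random sampling
factory->BookMethod(TMVA::Types::kCuts, "CutsGA", "!V:FitMethod=GA");  // Genetic Algorithm
factory->BookMethod(TMVA::Types::kCuts, "CutsSA", "!V:FitMethod=SA");  // Simulated Annealing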

Projective Likelihood Estimator (PDE Approach)

Combine the probabilities from the different variables for an event to be signal- or background-like. This is optimal if there are no correlations and the PDFs are correct (known); since that is usually not true, different methods have been developed. The likelihood ratio for event i is built from the reference PDFs p of the discriminating variables for each species (signal, background):

y_L(i) = L_S(i) / (L_S(i) + L_B(i)),   with   L_{S/B}(i) = ∏_k p_{S/B,k}(x_k(i))

Technical problem: how to implement the reference PDFs. Three ways:
- counting: automatic and unbiased, but suboptimal
- function fitting: difficult to automate
- parametric fitting (splines, kernel estimators): easy to automate, but can create artefacts

TMVA uses splines of order 0-5 and kernel estimators.
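
The combination itself is simple; a minimal sketch (not TMVA code) of the likelihood ratio above, with the reference PDFs passed in as plain callables:

#include <functional>
#include <vector>

using PDF = std::function<double(double)>;   // 1-D reference PDF p(x)

double LikelihoodRatio(const std::vector<double>& x,
                       const std::vector<PDF>& sigPdf,
                       const std::vector<PDF>& bkgPdf)
{
   double ls = 1.0, lb = 1.0;
   for (size_t k = 0; k < x.size(); ++k) {
      ls *= sigPdf[k](x[k]);   // product of signal PDFs
      lb *= bkgPdf[k](x[k]);   // product of background PDFs
   }
   return ls / (ls + lb);      // -> 1 for signal-like, 0 for background-like events
}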

Multidimensional Likelihood Estimator

Generalisation of the 1-D PDE approach to Nvar dimensions. In theory this is the optimal method, if the "true N-dimensional PDF" were known; the practical challenge is to derive that PDF from the training sample.

TMVA implementation: range search (PDERS). Count the number of signal and background events in the "vicinity" of a test event, using a volume of fixed or adaptive size (the latter yields kNN-type classifiers). Volumes can be rectangular or spherical, and multi-dimensional kernels (Gaussian, triangular, ...) can be used to weight events within a volume. The range search is sped up by sorting the training events in binary trees [Carli-Koblitz, NIM A 501, 576 (2003)].
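
A minimal sketch of the fixed-size range count (a linear scan for clarity; PDERS itself uses the binary-tree search mentioned above, and the names here are illustrative):

#include <cmath>
#include <vector>

struct Event { std::vector<double> x; bool isSignal; };

// signal fraction among training events inside a box of half-width h
double RangeSearchResponse(const std::vector<Event>& training,
                           const std::vector<double>& test, double h)
{
   double nS = 0.0, nB = 0.0;
   for (const Event& ev : training) {
      bool inside = true;
      for (size_t k = 0; k < test.size(); ++k)
         if (std::fabs(ev.x[k] - test[k]) > h) { inside = false; break; }
      if (inside) (ev.isSignal ? nS : nB) += 1.0;
   }
   return (nS + nB > 0.0) ? nS / (nS + nB) : 0.5;   // 0.5 if the volume is empty
}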

Fisher Discriminant (and H-Matrix)

Well-known, simple and elegant classifier: determine a linear variable transformation in which linear correlations are removed and the mean values of signal and background are "pushed" as far apart as possible. The computation of the Fisher response is then very simple: a linear combination of the event variables with the "Fisher coefficients",

y(i) = F_0 + Σ_k F_k x_k(i).
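
A minimal sketch of the classic Fisher prescription (not the TMVA implementation): the coefficients are F = W^{-1}(mu_S - mu_B), up to normalisation, with W the within-class covariance matrix.

#include "TMatrixD.h"
#include "TMatrixDSym.h"
#include "TVectorD.h"

// Fisher axis F = W^{-1} (mu_S - mu_B)
TVectorD FisherCoefficients(const TMatrixDSym& within,   // W, e.g. C_S + C_B
                            const TVectorD& meanSig,
                            const TVectorD& meanBkg)
{
   TMatrixD wInv(within);               // copy into a general matrix
   wInv.Invert();                       // W^{-1}
   TVectorD diff = meanSig - meanBkg;   // separation of the class means
   return wInv * diff;
}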

Artificial Neural Network (ANN)

Get a non-linear classifier response by feeding linear combinations of the input variables into nodes with a non-linear "activation" function; the nodes (or neurons) are arranged in layers. TMVA provides feed-forward Multilayer Perceptrons (3 different implementations): one input layer with the Nvar discriminating input variables, k hidden layers, and one output layer with 2 output classes (signal and background). Training: adjust the weights using known events such that signal and background are best separated.
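
A minimal sketch of one feed-forward pass (none of TMVA's three implementations; bias terms are omitted and the sigmoid is one common choice of activation function):

#include <cmath>
#include <vector>

double Sigmoid(double a) { return 1.0 / (1.0 + std::exp(-a)); }

// weights[j][k]: input k -> hidden node j; wOut[j]: hidden node j -> output
double MlpResponse(const std::vector<double>& x,
                   const std::vector<std::vector<double>>& weights,
                   const std::vector<double>& wOut)
{
   double out = 0.0;
   for (size_t j = 0; j < weights.size(); ++j) {
      double a = 0.0;
      for (size_t k = 0; k < x.size(); ++k)
         a += weights[j][k] * x[k];     // linear combination of the inputs
      out += wOut[j] * Sigmoid(a);      // non-linear activation at the node
   }
   return Sigmoid(out);                 // response in (0,1)
}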

Decision Trees

Sequential application of "cuts" splits the data into nodes, and the final nodes (leaves) classify an event as signal or background.

Training, i.e. growing a decision tree:
- start with the root node
- split the training sample according to a cut on the best variable at this node
- splitting criterion: e.g. maximum gain in the "Gini index", purity * (1 - purity)
- continue splitting until the minimum number of events or the maximum purity is reached
- classify each leaf node according to the majority of its events (or assign it a weight); unknown test events are classified accordingly

Bottom-up pruning removes statistically insignificant nodes and so avoids overtraining. [Figure: the same decision tree before and after pruning]
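
The splitting criterion is easy to state in code; a minimal sketch of the Gini index and the gain of a candidate split (illustrative, not the TMVA implementation):

// Gini index of a node: purity * (1 - purity), event-weighted
double Gini(double nSig, double nBkg)
{
   const double n = nSig + nBkg;
   if (n <= 0.0) return 0.0;
   const double p = nSig / n;   // node purity
   return p * (1.0 - p);        // 0 for pure nodes, maximal at p = 0.5
}

// decrease of the Gini index when a parent node (sP, bP) is split
// into left (sL, bL) and right (sR, bR) daughter nodes
double GiniGain(double sP, double bP, double sL, double bL,
                double sR, double bR)
{
   const double n = sP + bP;
   return Gini(sP, bP) - ((sL + bL) / n) * Gini(sL, bL)
                       - ((sR + bR) / n) * Gini(sR, bR);
}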

Boosted Decision Trees

Decision trees have been well known for a long time but are hardly used in HEP (although they are very similar to "simple cuts"). Disadvantage: instability, i.e. small changes in the training sample can give large changes in the tree structure. Boosted Decision Trees (1996) combine several decision trees into a forest: the classifier output is the (weighted) majority vote of the individual trees, which are derived from the same training sample with different event weights.
- e.g. AdaBoost: wrongly classified training events are given a larger weight
- bagging (re-sampling with replacement): random weights

Remark: bagging/boosting creates a basis of classifiers; the final classifier is a linear combination of the base classifiers.
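
A minimal sketch of the AdaBoost reweighting described above (not TMVA code; one common convention, in which a tree with weighted error err enters the vote with weight ln((1-err)/err)):

#include <cmath>
#include <numeric>
#include <vector>

// one boosting step: boost the misclassified events, renormalise,
// and record the vote weight of the tree that was just trained
void AdaBoostStep(std::vector<double>& w,                 // event weights
                  const std::vector<bool>& misclassified, // result of the last tree
                  std::vector<double>& treeWeights)
{
   const double sum = std::accumulate(w.begin(), w.end(), 0.0);
   double err = 0.0;
   for (size_t i = 0; i < w.size(); ++i)
      if (misclassified[i]) err += w[i];
   err /= sum;                              // weighted misclassification rate

   const double alpha = (1.0 - err) / err;  // boost factor, assumes err < 0.5
   for (size_t i = 0; i < w.size(); ++i)
      if (misclassified[i]) w[i] *= alpha;  // boost the hard events

   const double newSum = std::accumulate(w.begin(), w.end(), 0.0);
   for (double& wi : w) wi *= sum / newSum; // keep the total weight fixed
   treeWeights.push_back(std::log(alpha));  // vote weight of this tree
}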

Rule Fitting (Predictive Learning via Rule Ensembles)

Following RuleFit from Friedman-Popescu [Friedman-Popescu, Tech. Rep., Statistics Dept., Stanford U., 2003]. The classifier is a linear combination of simple base classifiers, called rules, which here are sequences of cuts: a rule r_m evaluates to 1 if all its cuts are satisfied and to 0 otherwise. The RuleFit classifier is a normalised sum of rules plus a linear Fisher term in the discriminating event variables,

y(x) = a_0 + Σ_m a_m r_m(x) + Σ_k b_k x_k.

The procedure is:
1. create the rule ensemble from a set of decision trees
2. fit the coefficients using "gradient directed regularization" (Friedman et al.)

Support Vector Machines

Find the hyperplane that best separates signal from background. Best separation means the maximum distance between the hyperplane and the closest events (the support vectors); the decision boundary is linear. For non-linear cases, transform the variables into a higher-dimensional feature space in which a linear boundary (a hyperplane) can separate the data. The transformation is done implicitly using kernel functions, which effectively introduce a metric for the distance measures that "mimics" the transformation; one chooses the kernel and fits the hyperplane. Available kernels: Gaussian, polynomial, sigmoid. [Figure: linear margin in the (x1, x2) plane, and a circular boundary becoming linear in the transformed space]
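
The kernel is the only place where the implicit transformation enters; a minimal sketch of the Gaussian kernel, K(a,b) = exp(-|a-b|^2 / (2 sigma^2)):

#include <cmath>
#include <vector>

double GaussianKernel(const std::vector<double>& a,
                      const std::vector<double>& b, double sigma)
{
   double d2 = 0.0;
   for (size_t k = 0; k < a.size(); ++k)
      d2 += (a[k] - b[k]) * (a[k] - b[k]);       // squared distance |a-b|^2
   return std::exp(-d2 / (2.0 * sigma * sigma));
}

// the SVM response is a kernel-weighted sum over the support vectors,
// sum_i alpha_i y_i K(x_i, x) + b, so only K has to be evaluated explicitly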

A Complete Example Analysis

void TMVAnalysis()
{
   TFile* outputFile = TFile::Open("TMVA.root", "RECREATE");

   // create the Factory
   TMVA::Factory *factory = new TMVA::Factory("MVAnalysis", outputFile, "!V");

   // give training/test trees
   TFile *input = TFile::Open("tmva_example.root");
   TTree *signal     = (TTree*)input->Get("TreeS");
   TTree *background = (TTree*)input->Get("TreeB");
   factory->AddSignalTree    (signal,     1.);
   factory->AddBackgroundTree(background, 1.);

   // tell it which variables to use; these may be expressions of tree
   // branches, i.e. "var1+var2" is not directly available in the tree
   factory->AddVariable("var1+var2", 'F');
   factory->AddVariable("var1-var2", 'F');
   factory->AddVariable("var3", 'F');
   factory->AddVariable("var4", 'F');

   factory->PrepareTrainingAndTestTree("",
      "NSigTrain=3000:NBkgTrain=3000:SplitMode=Random:!V");

   // select the MVA methods
   factory->BookMethod(TMVA::Types::kLikelihood, "Likelihood",
      "!V:!TransformOutput:Spline=2:NSmooth=5:NAvEvtPerBin=50");
   factory->BookMethod(TMVA::Types::kMLP, "MLP",
      "!V:NCycles=200:HiddenLayers=N+1,N:TestRate=5");

   // train, test and evaluate all booked methods
   factory->TrainAllMethods();
   factory->TestAllMethods();
   factory->EvaluateAllMethods();

   outputFile->Close();
   delete factory;
}
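
Assuming the macro is saved as TMVAnalysis.C (the file name is an assumption), it can be run from the ROOT prompt with root -l TMVAnalysis.C. The training step writes per-method weight files into the weights/ directory, which the Reader below picks up, and fills TMVA.root with the evaluation histograms.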

Example Application

void TMVApplication()
{
   // create the Reader
   TMVA::Reader *reader = new TMVA::Reader("!Color");

   // tell it about the variables (same definitions as in the training)
   Float_t var1, var2, var3, var4;
   reader->AddVariable("var1+var2", &var1);
   reader->AddVariable("var1-var2", &var2);
   reader->AddVariable("var3", &var3);
   reader->AddVariable("var4", &var4);

   // book the selected MVA method from its weight file
   reader->BookMVA("MLP method", "weights/MVAnalysis_MLP.weights.txt");

   TFile *input = TFile::Open("tmva_example.root");
   TTree* theTree = (TTree*)input->Get("TreeS");

   // set the tree branches; "var1+var2" and "var1-var2" are not directly
   // available in the tree, so they are computed from userVar1 and userVar2
   Float_t userVar1, userVar2;
   theTree->SetBranchAddress("var1", &userVar1);
   theTree->SetBranchAddress("var2", &userVar2);
   theTree->SetBranchAddress("var3", &var3);
   theTree->SetBranchAddress("var4", &var4);

   // event loop: calculate the MVA response
   for (Long64_t ievt = 3000; ievt < theTree->GetEntries(); ievt++) {
      theTree->GetEntry(ievt);
      var1 = userVar1 + userVar2;
      var2 = userVar1 - userVar2;
      cout << reader->EvaluateMVA("MLP method") << endl;
   }

   delete reader;
}
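
Note that the event loop starts at entry 3000, which in this toy setup presumably skips the events requested for training in the Factory example (NSigTrain=3000), so the classifier is evaluated only on events it has not been trained on.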

A Purely Academic Toy Example

Use a data set with 4 linearly correlated, Gaussian distributed variables. Variable ranking from TMVA:

Rank : Variable : Separation
   1 : var3     : 3.834e+02
   2 : var2     : 3.062e+02
   3 : var1     : 1.097e+02
   4 : var0     : 5.818e+01

Validating the Classifier Training

The classifier training is validated via the TMVA GUI: projective likelihood PDFs, MLP training, BDTs, ... [Figure: GUI validation plots] Average number of decision tree nodes before/after pruning: 4193 / 968.

The Classifier Output

TMVA output distributions for Likelihood, PDERS, Fisher, Neural Network, Boosted Decision Trees and Rule Fitting. [Figure: the classifier output distributions; for Fisher the correlations are removed, while other outputs show structure due to the correlations]

The Evaluation Output

TMVA output distributions for Fisher, Likelihood, BDT and MLP. For this case the Fisher discriminant provides the theoretically 'best' possible method and is the same as the de-correlated Likelihood; Cuts and Likelihood without de-correlation are inferior. (Diagonal note on the slide: all realistic use cases are much more difficult than this one.)

Evaluation Output (taken from TMVA printout)

Evaluation results ranked by best signal efficiency and purity (area); higher rows are better classifiers. Columns give the signal efficiency at background efficiency B, with the statistical error in parentheses:

Methods       @B=0.01    @B=0.10    @B=0.30    Area  | Separation  Significance
Fisher        0.268(03)  0.653(03)  0.873(02)  0.882 | 0.444       1.189
MLP           0.266(03)  0.656(03)  0.873(02)  0.882 | 0.444       1.260
LikelihoodD   0.259(03)  0.649(03)  0.871(02)  0.880 | 0.441       1.251
PDERS         0.223(03)  0.628(03)  0.861(02)  0.870 | 0.417       1.192
RuleFit       0.196(03)  0.607(03)  0.845(02)  0.859 | 0.390       1.092
HMatrix       0.058(01)  0.622(03)  0.868(02)  0.855 | 0.410       1.093
BDT           0.154(02)  0.594(04)  0.838(03)  0.852 | 0.380       1.099
CutsGA        0.109(02)  1.000(00)  0.717(03)  0.784 | 0.000
Likelihood    0.086(02)  0.387(03)  0.677(03)  0.757 | 0.199       0.682

Testing efficiency compared to training efficiency (overtraining check); values in parentheses are from the training sample:

Methods       @B=0.01        @B=0.10        @B=0.30
Fisher        0.268 (0.275)  0.653 (0.658)  0.873 (0.873)
MLP           0.266 (0.278)  0.656 (0.658)  0.873 (0.873)
LikelihoodD   0.259 (0.273)  0.649 (0.657)  0.871 (0.872)
PDERS         0.223 (0.389)  0.628 (0.691)  0.861 (0.881)
RuleFit       0.196 (0.198)  0.607 (0.616)  0.845 (0.848)
HMatrix       0.058 (0.060)  0.622 (0.623)  0.868 (0.868)
BDT           0.154 (0.268)  0.594 (0.736)  0.838 (0.911)
CutsGA        0.109 (0.123)  1.000 (0.424)  0.717 (0.715)
Likelihood    0.086 (0.092)  0.387 (0.379)  0.677 (0.677)

A large difference between the test and training efficiencies (e.g. PDERS, BDT) indicates overtraining.

More Toys: Circular Correlations

Illustrate the behaviour of linear and non-linear classifiers with circular correlations (same for signal and background). [Figure: scatter plots of the circularly correlated toy variables]

Illustration: Events Weighted by MVA Response

Example: how do the classifiers deal with the correlation patterns? [Figure: events weighted by classifier response] Linear classifiers: Fisher, Likelihood, de-correlated Likelihood. Non-linear classifiers: Decision Trees, PDERS.

Final Classifier Performance

Background rejection versus signal efficiency curves for the circular example. [Figure: ROC curves]

More Toys: "Schachbrett" (Chess Board)

[Figure: event distribution of the chess-board toy; events weighted by SVM response] Performance achieved without parameter adjustments: PDERS and BDT are best "out of the box". After some parameter tuning, SVM and ANN (MLP) also perform close to the theoretical maximum.

TMVA Users Guide

We (finally) have a Users Guide! Available from tmva.sf.net: TMVA Users Guide, 78 pp., incl. code examples; arXiv: physics/0703039.

Summary

TMVA unifies highly customizable and well-performing multivariate classification algorithms in a single user-friendly framework. This ensures objective classifier comparisons and simplifies their use. TMVA is available from tmva.sf.net and in ROOT (since 5.11/03). A typical TMVA analysis requires user interaction with a Factory (for classifier training) and a Reader (for classifier application); a set of ROOT macros displays the evaluation results. We will continue to improve flexibility and to add new classifiers:
- Bayesian classifiers
- "committee method": combination of different MVA techniques
- C-code output for trained classifiers (for selected methods)

More Toys: Linear, Cross and Circular Correlations

Illustrate the behaviour of linear and non-linear classifiers. [Figure: toy distributions with linear correlations (same for signal and background), cross correlations (opposite for signal and background) and circular correlations (same for signal and background)]

Illustration: Events Weighted by MVA Response

How well do the classifiers resolve the various correlation patterns? [Figure: events weighted by classifier response for linear correlations (same for signal and background), cross correlations (opposite for signal and background) and circular correlations (same for signal and background)]

Final Classifier Performance

Background rejection versus signal efficiency curves for the linear, cross and circular examples. [Figure: ROC curves]

Stability with Respect to Irrelevant Variables

Toy example with 2 discriminating and 4 non-discriminating variables. [Figure: classifier performance when using all variables versus using only the two discriminant variables]

Using TMVA in Training and Application

Training and application can be driven by ROOT scripts, C++ executables or Python scripts (via PyROOT), or any other high-level language that interfaces with ROOT.

Introduction: Event Classification

Different techniques use different ways of trying to exploit (all) the features; compare them and choose. Rectangular cuts? A linear boundary? A non-linear one? [Figure: the three kinds of decision boundary separating S from B in the (x1, x2) plane] How should the decision boundary be placed? Let the machine learn it from training events.