RooFit/RooStats Tutorial, CAT Meeting, June 2009

RooFit/RooStats Tutorial, CAT Meeting, June 2009
Presented by: Max Baak
Thanks to Wouter Verkerke and Kyle Cranmer for examples!

Structure of the RooFit/RooStats tutorial
A tutorial in two sessions.
• Part one (Monday, 10h30):
  – Introduction to RooFit
  – Entry-level exercises
  – Aimed at beginners
• Part two (Friday, 10h00):
  – Introduction to RooStats (statistics extension to RooFit)
  – (Selection of) advanced and new features of RooFit
  – Also useful for experienced users

RooFit: your toolkit for data modeling
What is RooFit?
• A powerful toolkit for modeling and fitting the expected distribution(s) of events in a physics analysis
  – Very easy to set up large-scale fits in a structured, transparent fashion
• Primarily targeted at high-energy physicists using ROOT
  – But even used in the financial world
• Originally developed for the BaBar collaboration by Wouter Verkerke and David Kirkby, back in 2000
  – Wouter is the main developer
• Included with ROOT since v5.xx
  – Core code is very mature and stable
  – Continuous development, addition of more powerful features
• Standard in CMS!

Documentation
Main sources of documentation:
• http://root.cern.ch/drupal/content/users-guide
  – See for RooFit documentation (150+ pages)
• $ROOTSYS/tutorials/roofit/
  – See for example macros
• http://root.cern.ch/root/Reference.html
  – See for (latest) class descriptions. RooFit classes start with "Roo".
  – RooFit code itself is structured and well documented!
• http://root.cern.ch/roottalk/roottalk09/
  – Browse through RootTalk
• Bug Wouter Verkerke directly

Implementation: add-on package to ROOT
Shared library: libRooFit.so
[Figure: RooFit functionality (data modeling, toy MC data generation, model visualization, data/model fitting) and the ROOT infrastructure it builds on (MINUIT, C++ command line interface & macros, data management & histogramming, I/O support, graphics interface)]

RooFit purpose: data modeling for physics analysis
[Figure: distribution of observables x]
• Define a data model: probability density function F(x; p, q)
  – Physical parameters of interest p
  – Other parameters q to describe detector effects (resolution, efficiency, …)
  – Normalized over the allowed range of the observables x with respect to the parameters p and q
• Fit the model to data: determination of p, q

Data modeling: desired functionality
Building/adjusting models:
✓ Easy to write basic PDFs (normalization)
✓ Easy to compose complex models (modular design)
✓ Reuse of existing functions
✓ Flexibility: no arbitrary implementation-related restrictions
Using models (analysis cycle):
✓ Fitting: binned/unbinned (extended) maximum-likelihood fits, χ² fits
✓ Toy MC generation: generate MC datasets from any model
✓ Visualization: slice/project model & data in any possible way
✓ Speed: should be as fast as or faster than a hand-coded model

Data modeling: OO representation
• Mathematical objects are represented as C++ objects:
    Mathematical concept    RooFit class
    variable                RooRealVar
    function                RooAbsReal
    PDF                     RooAbsPdf
    space point             RooArgSet
    integral                RooRealIntegral
    list of space points    RooAbsData

Model building: (re)using standard components
• RooFit provides a collection of compiled standard PDF classes:
  – Physics inspired: ARGUS, Crystal Ball, Breit-Wigner, Voigtian, B/D decay, …
  – Non-parametric: histogram, KEYS
  – Basic: Gaussian, Exponential, Polynomial, …
[Figure: example shapes from RooBMixDecay, RooPolynomial, RooHistPdf, RooArgusBG, RooGaussian]
PDF normalization
• By default RooFit uses numeric integration to achieve normalization
• Classes can optionally provide (partial) analytical integrals
• Final normalization can be a hybrid numeric/analytic form
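A rough standalone illustration of numeric normalization follows. It is plain C++, not RooFit code, and `simpson` and `normalize` are hypothetical helpers. An unnormalized shape is divided by its integral over the allowed range, computed here with composite Simpson's rule. RooFit's integrators are more elaborate and use analytical integrals where a class provides them; this sketch only shows the principle.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// Hypothetical helpers for illustration only; not part of RooFit.

// Composite Simpson's rule on [lo, hi] with n (even) subintervals.
double simpson(const std::function<double(double)>& f, double lo, double hi, int n = 1000) {
    assert(n > 0 && n % 2 == 0);
    const double h = (hi - lo) / n;
    double sum = f(lo) + f(hi);
    for (int i = 1; i < n; ++i) {
        sum += (i % 2 == 1 ? 4.0 : 2.0) * f(lo + i * h);
    }
    return sum * h / 3.0;
}

// Wrap an unnormalized, non-negative shape into a PDF normalized on [lo, hi].
std::function<double(double)> normalize(std::function<double(double)> shape, double lo, double hi) {
    const double norm = simpson(shape, lo, hi);
    assert(norm > 0.0);
    return [shape, norm](double x) { return shape(x) / norm; };
}
```

For example, `normalize([](double x) { return 1 + std::sin(0.5 * x); }, -10, 10)` gives a unit-normalized version of a user expression. This is conceptually what happens when a RooGenericPdf expression "doesn't need to be normalized".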

Model building: (re)using standard components
• Most physics models can be composed from 'basic' shapes
[Figure: RooBMixDecay, RooPolynomial, RooHistPdf, RooArgusBG and RooGaussian shapes added (+) into a RooAddPdf]

Model building: (re)using standard components
• Most physics models can be composed from 'basic' shapes
[Figure: RooBMixDecay, RooPolynomial, RooHistPdf, RooArgusBG and RooGaussian shapes multiplied (*) into a RooProdPdf]

Model building: (re)using standard components
• Building blocks are flexible
  – Function variables can be functions themselves
  – Just plug in anything you like
  – Universally supported by core code (PDF classes don't need to implement special handling)
• Example: g(x; m, s) with m(y; a0, a1) becomes g(x, y; a0, a1, s):
    RooPolyVar m("m", "m", y, RooArgList(a0, a1)) ;
    RooGaussian g("g", "gauss", x, m, s) ;

Model building: expression-based components
• RooFormulaVar
  – Interpreted real-valued function
  – Based on ROOT TFormula class
  – Ideal for modifying the parameterization of existing compiled PDFs, e.g.
    RooFormulaVar w("w", "1-2*D", D) ;
    RooBMixDecay(t, tau, w, …)
• RooGenericPdf
  – Interpreted PDF
  – Based on ROOT TFormula class
  – User expression doesn't need to be normalized
  – Maximum flexibility
    RooGenericPdf f("f", "1+sin(0.5*x)+abs(exp(0.1*x)*cos(-1*x))", x) ;

Using models: fitting options
• Fitting interface is flexible and powerful; many options supported
Data type:
✓ Binned  ✓ Unbinned  ✓ Weighted unbinned
Goodness-of-fit measure:
✓ -log(likelihood)  ✓ Extended -log(L)  ✓ χ²  ✓ User defined
✓ (add custom/penalty terms to any of these)
Interface:
✓ One-line: RooAbsPdf::fitTo(…)
✓ Interactive: RooMinuit class
Output:
✓ Modifies parameter objects of the PDF
✓ Saves a snapshot of initial/final parameters, correlation matrix, fit status, etc.
Sample interactive MINUIT session:
    RooNLLVar nll("nll", "nll", pdf, data) ;
    RooMinuit m(nll) ;
    m.hesse() ;          // access any of MINUIT's minimization methods
    x.setConstant() ;    // change and fix parameter values, using the native
    y.setVal(5) ;        // RooFit interface, during the fit session
    m.migrad() ;
    m.minos() ;
    RooFitResult* r = m.save() ;

Using models: fitting speed & optimizations
• RooFit delivers per-fit tailored optimization without user overhead!
• Benefit of function optimization is traditionally a trade-off between
  – Execution speed (especially in fitting)
  – Flexibility/maintainability of analysis user code
• Optimizations usually hard-code assumptions…
• Evaluation of -log(L) in fits lends itself well to optimizations
  – Constant fit parameters often lead to higher-level constant PDF components
  – PDF normalization integrals have identical values for all data points
  – Repetitive nature of the calculation is ideally suited for parallelization
• RooFit automates analysis and implementation of optimization
  – Modular OO structure of PDF expressions facilitates automated introspection
    · Find and pre-calculate highest-level constant terms in composite PDFs
    · Apply caching and lazy evaluation for PDF normalization integrals
    · Optional automatic parallelization of the fit on multi-CPU hosts
  – Optimization concepts are applied consistently and completely to all PDFs
  – Speedup of a factor 3-10 typical in realistic complex fits

Using models: plotting
• RooPlot: view of one or more datasets/PDFs projected on the same dimension
    RooPlot* frame = mes.frame() ;                  // create the view on mes
    data->plotOn(frame) ;                           // project the data on the mes view
    pdf->plotOn(frame) ;                            // project the PDF on the mes view
    pdf->plotOn(frame, Components("bkg")) ;         // project the bkg. PDF component
    frame->Draw() ;                                 // draw the view on a canvas
• Axis labels are auto-generated

Using models: overview
• All RooFit models provide universal and complete fitting and toy Monte Carlo generating functionality
  – Model complexity only limited by available memory and CPU power
    · Models with >16000 components, >1000 fixed parameters and >80 floating parameters have been used (published physics result)
  – Very easy to use: most operations are one-liners
    Generating:  data = gauss.generate(x, 1000)    (RooAbsPdf → RooDataSet)
    Fitting:     gauss.fitTo(data)                 (RooAbsPdf fitted to RooAbsData)

Advanced features: task automation
• Support for routine task automation, e.g. a goodness-of-fit study:
  input model → generate toy MC → fit model → repeat N times → accumulate fit statistics
    // Instantiate MC study manager
    RooMCStudy mgr(inputModel) ;
    // Generate and fit 100 samples of 1000 events
    mgr.generateAndFit(100, 1000) ;
    // Plot distribution of sigma parameter
    mgr.plotParam(sigma)->Draw() ;
• Distributions of
  – parameter values
  – parameter errors
  – parameter pulls
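The loop that RooMCStudy automates can be sketched in standalone C++ for the simplest possible case. The names are hypothetical, and this is not RooMCStudy's implementation. Toys are drawn from a Gaussian of known width, and the "fit" of the mean is done in closed form (sample mean, error sigma/sqrt(n)). The pulls (fitted - true)/error are then summarized. For an unbiased fit with correct errors, the pull distribution should have mean ≈ 0 and width ≈ 1.

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

// Hypothetical illustration of a toy-MC pull study; not RooMCStudy code.
struct PullSummary { double mean; double width; };

// Generate nToys samples of nEvents from N(trueMean, sigma), "fit" the mean of
// each sample (sample mean, error sigma/sqrt(nEvents)) and summarize the pulls.
PullSummary runToyStudy(int nToys, int nEvents, double trueMean, double sigma, unsigned seed) {
    assert(nToys > 1 && nEvents > 0 && sigma > 0.0);
    std::mt19937 rng(seed);
    std::normal_distribution<double> gauss(trueMean, sigma);
    std::vector<double> pulls;
    pulls.reserve(nToys);
    for (int t = 0; t < nToys; ++t) {
        double sum = 0.0;
        for (int i = 0; i < nEvents; ++i) sum += gauss(rng);
        const double fitted = sum / nEvents;              // ML estimate of the mean
        const double error = sigma / std::sqrt(nEvents);  // its standard error
        pulls.push_back((fitted - trueMean) / error);
    }
    double m = 0.0;
    for (double p : pulls) m += p;
    m /= nToys;
    double var = 0.0;
    for (double p : pulls) var += (p - m) * (p - m);
    return {m, std::sqrt(var / (nToys - 1))};
}
```

A pull mean far from 0 would indicate a biased estimator. A pull width far from 1 would indicate misestimated errors.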

RooStats
What is RooStats?
• Set of statistical tools on top of RooFit (& ROOT)
• Joint, open project between LHC experiments and ROOT
• Code is developing quickly
Goals
• Enable the combination of results of multiple measurements/experiments, including systematic uncertainties
  – Standard in CMS!
• Various tools to determine sensitivity and limits
• Techniques ranging from Bayesian to fully frequentist

RooStats documentation
• http://twiki.cern.ch/twiki/bin/view/RooStats/
• Mailing list: roostats-development@cern.ch

Combination of measurements: an example
• The example shows opening (fake) ATLAS and CMS measurements and performing a combined fit to a common parameter with a profile likelihood (thanks to Kyle Cranmer)

Appetizer for the first part of the tutorial
Featuring:
• The basic RooFit toolkit
• Convolutions of functions
• Calculating the p-value of your model
• Modeling the top mass spectrum
• A combined fit to signal and control samples
• Unbinned efficiency curve fit
• And much more!

RooFit users tutorial: the basics
• Probability density functions & likelihoods
• The basics of OO data modeling
• The essential ingredients: PDFs, datasets, functions

Outline of the hands-on part
1. Guide you through the fundamentals of RooFit
2. Look at some sample composite data models
   – Still quite simple, all 1-dimensional
3. Try to do at least one 'advanced topic', preferably more
   – Tutorial 8: calculating the p-value of your analysis. p-value = how often an equivalent data sample with no signal mimics the signal you observe
   – Tutorial 9: fit to a top mass distribution
   – Tutorial 10: simultaneous fit to signal and control samples
4. Copy roofit_tutorial.tar.gz from ~mbaak/public/
   – Untar roofit_tutorial.tar in your favorite directory on lxplus
   – Contents of the tutorial setup:
     tutorial/setup.sh                    Source this setup script first!
     tutorial/docs/roofit_tutorial.ppt    This presentation
     tutorial/macros                      Macros to be used in this tutorial
• Open http://root.cern.ch/root/html/ClassIndex.html in your favorite browser

Loading RooFit into ROOT
• > source setup.sh (in the tutorial/ directory)
• Make sure libRooFit.so is in $ROOTSYS/lib
• Start ROOT
• In the ROOT command line, load the RooFit library:
    gSystem->Load("libRooFit") ;
  – Normally, this happens automatically.

Creating a variable: class RooRealVar
• Creating a variable object:
    RooRealVar mass("mass", "m(e+e-)", 0, 1000) ;
  (C++ name, then name, title and allowed range)
  – Every RooFit object must have a unique name!

Creating a probability density function
Try these commands in an interactive ROOT session.
• First create the variables you need (name, title, [initial value,] allowed range):
    RooRealVar x("x", "x observable", -10, 10) ;
    RooRealVar mean("mean", "mean", 0.0, -10, 10) ;
    RooRealVar width("width", "width", 3.0, 0.1, 10.) ;
• Then create a function object:
    RooGaussian gauss("gauss", "Gaussian", x, mean, width) ;
  – Give variables as arguments to link variables to a function
Continue typing commands till slide 34…

Making a plot of a function
• First create an empty plot:
    RooPlot* frame = x.frame() ;
  – A frame is a plot associated with a RooFit variable; the plot range is taken from the limits of x
• Draw the empty plot on a ROOT canvas:
    frame->Draw() ;

Making a plot of a function (continued)
• Draw the (probability density) function in the frame:
    gauss.plotOn(frame) ;
• Update the frame in the ROOT canvas:
    frame->Draw() ;
  – Axis label is taken from the gauss title; the curve has unit normalization

Interacting with objects
• Changing and inspecting variables:
    width.getVal() ;
    (const Double_t)3.00
    width = 1.0 ;
    width.getVal() ;
    (const Double_t)1.00
• Draw another copy of gauss (macro/tut0.C):
    gauss.plotOn(frame) ;
    frame->Draw() ;

Inspecting composite objects
• Inspecting the structure of gauss:
    gauss.printCompactTree() ;
    0x10b95fc0 RooGaussian::gauss (gauss) [Auto]
      0x10b90c78 RooRealVar::x (x)
      0x10b916f8 RooRealVar::mean (mean)
      0x10b85f08 RooRealVar::width (width)
• Inspecting the contents of frame:
    frame->Print("v") ;
    RooPlot::frame(10ba6830): "A RooPlot of "x""
      Plotting RooRealVar::x: "x"
      Plot contains 2 object(s)
      (Options="L") RooCurve::curve_gaussProjected: "Projection of gauss"

Data
• Unbinned data is represented by a RooDataSet object
• Class RooDataSet is the RooFit interface to the ROOT class TTree
  – RooDataSet associates a RooRealVar with a column of a TTree
  – Association by matching the TTree branch name with the RooRealVar name
[Figure: a TTree with columns x and y, imported into a RooDataSet holding RooRealVar x and RooRealVar y
    row    x      y
    1      0.57   4.86
    2      5.72   6.83
    3      2.13   0.21
    4      10.5  -35.5
    …     -4.3   -8.8 ]

Creating a dataset from a TTree
• First open the file with the TTree (macros/tut1.root):
    TFile f("tut1.root") ;
    f.ls() ;
    TFile**     tut1.root
     TFile*     tut1.root
      KEY: TTree  xtree;1
    xtree->Print() ;
• Create a RooDataSet from the tree:
    RooDataSet data("data", "data", xtree, x) ;
  – Arguments: the imported TTree and the RooFit variable(s) in the dataset

Drawing a dataset on a frame
• Create a new plot frame, draw the RooDataSet on the frame, draw the frame:
    RooPlot* frame2 = x.frame() ;
    data.plotOn(frame2) ;
    frame2->Draw() ;
  – Note the Poisson error bars

Overlaying a PDF curve on a dataset
• Add the PDF curve to the frame:
    gauss.plotOn(frame2) ;
    frame2->Draw() ;
  – The unit-normalized PDF is automatically scaled to the dataset
  – But the shape is not right! Let's fit the curve to the data

Fitting a PDF to an unbinned dataset
• Fit gauss to data:
    gauss.fitTo(data) ;
• Behind the scenes:
  1. RooFit constructs the likelihood from the PDF and the dataset
  2. RooFit passes the negative log-likelihood to MINUIT to minimize
  3. RooFit extracts the result from MINUIT and stores it in the RooRealVar objects that represent the fit parameters
• Draw the result:
    gauss.plotOn(frame2) ;
    frame2->Draw() ;
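Steps 1 and 2 of "behind the scenes" can be sketched for a Gaussian in standalone C++. The function names are hypothetical, and this is not RooFit code. The sketch builds -log L as a sum over events. For this PDF the minimum is known in closed form (sample mean and RMS), which is where MINUIT would converge numerically. As a simplifying assumption, the Gaussian is normalized on the whole real line, whereas RooFit normalizes over the variable's range.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical illustration; not RooFit's RooNLLVar.
// -log L for a Gaussian PDF (normalized on the real line), summed over all events.
double gaussNLL(const std::vector<double>& data, double mean, double sigma) {
    assert(sigma > 0.0);
    const double logNorm = std::log(sigma * std::sqrt(2.0 * std::acos(-1.0)));
    double nll = 0.0;
    for (double x : data) {
        const double z = (x - mean) / sigma;
        nll += 0.5 * z * z + logNorm;
    }
    return nll;
}

struct GaussFit { double mean; double sigma; };

// Closed-form maximum-likelihood estimates, i.e. the point that minimizes gaussNLL.
GaussFit gaussMLE(const std::vector<double>& data) {
    assert(!data.empty());
    double mean = 0.0;
    for (double x : data) mean += x;
    mean /= data.size();
    double ss = 0.0;
    for (double x : data) ss += (x - mean) * (x - mean);
    return {mean, std::sqrt(ss / data.size())};
}
```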

Looking at the fit results
• Look again at the PDF variables:
    sigma.Print() ;
    RooRealVar::sigma: 1.9376 +/- 0.043331 (-0.042646, 0.044033) L(-10 - 10)
    mean.Print() ;
    RooRealVar::mean: -0.0843265 +/- 0.061273 (-0.061210, 0.061361) L(-10 - 10)
  – Format: adjusted value +/- symmetric error (from HESSE) (asymmetric error, from MINOS, not shown by default)
  – Results from MINUIT are back-propagated to the variables
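The symmetric error quoted from HESSE comes from the curvature of -log L at the minimum. For a single parameter, error ≈ 1/sqrt(d²(-log L)/dμ²). The standalone sketch below uses hypothetical helper names and is not MINUIT code. It estimates that curvature with a central finite difference for the mean of a Gaussian with fixed width, where the exact answer is sigma/sqrt(n).

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <vector>

// Hypothetical helpers for illustration; not MINUIT/HESSE itself.

// Parabolic error from the numerical second derivative of a 1D -log L at its minimum.
double parabolicError(const std::function<double(double)>& nll, double xMin, double step = 1e-3) {
    const double d2 = (nll(xMin + step) - 2.0 * nll(xMin) + nll(xMin - step)) / (step * step);
    assert(d2 > 0.0);  // curvature must be positive at a minimum
    return 1.0 / std::sqrt(d2);
}

// -log L of a Gaussian with fixed sigma, as a function of the mean (constant terms dropped).
double nllOfMean(const std::vector<double>& data, double mean, double sigma) {
    double nll = 0.0;
    for (double x : data) nll += 0.5 * (x - mean) * (x - mean) / (sigma * sigma);
    return nll;
}
```

MINOS instead scans where -log L rises by 0.5 above its minimum on each side, which is why it can return asymmetric errors.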

Putting it all together
• A self-contained example to construct a model, fit it, and plot it on top of the data (macro/tut1.C):
    void fit(TTree* dataTree)
    {
      // Define model
      RooRealVar x("x", "x", -10, 10) ;
      RooRealVar sigma("sigma", "sigma", 2, 0.1, 10) ;
      RooRealVar mean("mean", "mean", -10, 10) ;
      RooGaussian gauss("gauss", "gauss", x, mean, sigma) ;
      // Import data
      RooDataSet data("data", "data", dataTree, x) ;
      // Fit data
      gauss.fitTo(data) ;
      // Make plot
      RooPlot* frame = x.frame() ;
      data.plotOn(frame) ;
      gauss.plotOn(frame) ;
      frame->Draw() ;
    }
See next slide for instructions.

Putting it all together (continued)
• Run the self-contained example, which constructs a model, fits it, and plots it on top of the dataset (macro/tut1.C):
    root [0] TFile f("tut1.root")
    root [1] .L tut1.C
    root [2] fit(xtree)
• From here on you can modify the macros directly yourself.
• In macro/tut1.C, uncomment the two lines below "// Make plot" and see what happens:
    gauss.fitTo(data, Minos()) ;
    gauss.fitTo(data, Hesse()) ;   // default
    // (See RooMinuit.cxx for all possible fit options)
• Edit the macro to switch between the HESSE and MINOS error calculations.

Building composite PDFs
• RooFit has a collection of many basic PDFs:
    RooArgusBG       ARGUS background shape
    RooBifurGauss    Bifurcated Gaussian
    RooBreitWigner   Breit-Wigner shape
    RooCBShape       Crystal Ball function
    RooChebychev     Chebychev polynomial
    RooDecay         Simple decay function
    RooExponential   Exponential function
    RooGaussian      Gaussian function
    RooKeysPdf       Non-parametric data description
    RooPolynomial    Generic polynomial PDF
    RooVoigtian      Breit-Wigner ⊗ Gaussian (convolution)
• HTML class documentation in: http://root.cern.ch/root/html/ROOFIT_ROOFIT_Index.html

Building realistic models
• You can combine any number of the preceding PDFs to build more realistic models (macro/tut2.C):
    RooRealVar x("x", "x", -10, 10) ;
    // Construct background model
    RooRealVar alpha("alpha", "alpha", -0.3, -3, 0) ;
    RooExponential bkg("bkg", "bkg", x, alpha) ;
    // Construct signal model
    RooRealVar mean("mean", "mean", 3, -10, 10) ;
    RooRealVar sigma("sigma", "sigma", 1, 0.1, 10) ;
    RooGaussian sig("sig", "sig", x, mean, sigma) ;
    // Construct signal+background model
    RooRealVar sigFrac("sigFrac", "signal fraction", 0.1, 0, 1) ;
    RooAddPdf model("model", "model", RooArgList(sig, bkg), sigFrac) ;
    // Plot model
    RooPlot* frame = x.frame() ;
    model.plotOn(frame, Components(bkg), LineStyle(kDashed)) ;
    frame->Draw() ;

Building realistic models

Sampling 'toy' Monte Carlo events from a model
• Just like you can fit models, you can also sample 'toy' Monte Carlo events from models:
    RooDataSet* mcdata = model.generate(x, 1000) ;
    RooPlot* frame2 = x.frame() ;
    mcdata->plotOn(frame2) ;
    model.plotOn(frame2) ;
    frame2->Draw() ;
• Try this yourself…
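One standard way to sample toy events from an arbitrary 1D PDF is accept/reject (von Neumann) sampling, sketched below in standalone C++. It assumes you know an upper bound `fMax` of the (possibly unnormalized) PDF on the range. The names are hypothetical, and this is not claimed to be RooFit's exact generator.

```cpp
#include <cassert>
#include <functional>
#include <random>
#include <vector>

// Hypothetical illustration of accept/reject sampling; not RooFit's generator.
// Draw nEvents values of x in [lo, hi) distributed according to pdf(x),
// given an upper bound fMax >= pdf(x) on the whole range.
std::vector<double> acceptReject(const std::function<double(double)>& pdf, double lo, double hi,
                                 double fMax, int nEvents, std::mt19937& rng) {
    assert(hi > lo && fMax > 0.0 && nEvents >= 0);
    std::uniform_real_distribution<double> ux(lo, hi), uy(0.0, fMax);
    std::vector<double> events;
    events.reserve(nEvents);
    while (static_cast<int>(events.size()) < nEvents) {
        const double x = ux(rng);
        // Keep x with probability pdf(x) / fMax.
        if (uy(rng) < pdf(x)) events.push_back(x);
    }
    return events;
}
```

The efficiency is the ratio of the PDF's area to the box area (hi - lo)·fMax. A loose bound therefore wastes trials but does not bias the sample. A bound that is too low does bias it.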

RooAddPdf can add any number of models (macros/tut3.C):
    RooRealVar x("x", "x", 0, 10) ;
    // Construct background model
    RooRealVar alpha("alpha", "alpha", -0.7, -3, 0) ;
    RooExponential bkg1("bkg1", "bkg1", x, alpha) ;
    // Construct additional background model
    RooRealVar bkgmean("bkgmean", "bkgmean", 7, -10, 10) ;
    RooRealVar bkgsigma("bkgsigma", "bkgsigma", 2, 0.1, 10) ;
    RooGaussian bkg2("bkg2", "bkg2", x, bkgmean, bkgsigma) ;
    // Construct signal model
    RooRealVar mean("mean", "mean", 3, -10, 10) ;
    RooRealVar width("width", "width", 0.5, 0.1, 10) ;
    RooBreitWigner sig("sig", "sig", x, mean, width) ;
    // Construct signal + 2x background model
    RooRealVar bkg1Frac("bkg1Frac", "bkg1 fraction", 0.2, 0, 1) ;
    RooRealVar sigFrac("sigFrac", "signal fraction", 0.5, 0, 1) ;
    RooAddPdf model("model", "model", RooArgList(sig, bkg1, bkg2), RooArgList(sigFrac, bkg1Frac)) ;
    RooPlot* frame = x.frame() ;
    model.plotOn(frame, Components(RooArgSet(bkg1, bkg2)), LineStyle(kDashed)) ;
    frame->Draw() ;
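With N components and N-1 fractions (here sigFrac and bkg1Frac), the sum has the form f1·P1 + f2·P2 + (1 - f1 - f2)·P3. The last coefficient is fixed so that the total stays normalized, which is RooAddPdf's default (non-recursive) mode. The standalone sketch below evaluates such a sum from already-normalized component PDFs. `addPdf` is a hypothetical name, not RooFit code.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical illustration of a RooAddPdf-style sum; not RooFit code.
using Pdf = std::function<double(double)>;

// Sum of N normalized PDFs with N-1 fractions; the last coefficient is 1 - sum(fractions).
double addPdf(const std::vector<Pdf>& pdfs, const std::vector<double>& fracs, double x) {
    assert(!pdfs.empty() && fracs.size() + 1 == pdfs.size());
    double value = 0.0, used = 0.0;
    for (std::size_t i = 0; i < fracs.size(); ++i) {
        value += fracs[i] * pdfs[i](x);
        used += fracs[i];
    }
    return value + (1.0 - used) * pdfs.back()(x);
}
```

Because every coefficient is derived from the fractions, the sum integrates to 1 whenever the components do. The fractions themselves only need to add up to at most 1.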

RooAddPdf can add any number of models
• Try adding another signal term

Extended likelihood fits
• Regular likelihood fits only fit for the shape
  – The number of coefficients in RooAddPdf is always one less than the number of components
• Can also do an extended likelihood fit
  – Fit for both the shape and the observed number of events
  – Accomplished by adding an 'extended likelihood term' to the regular log-likelihood
• The extended term is automatically constructed in RooAddPdf if it is given an equal number of coefficients & PDFs

Extended likelihood fits and RooAddPdf
• How to construct an extended PDF with RooAddPdf (macros/tut4.C):
    // Construct extended signal + 2x background model
    RooRealVar nbkg1("nbkg1", "number of bkg1 events", 300, 0, 1000) ;
    RooRealVar nbkg2("nbkg2", "number of bkg2 events", 200, 0, 1000) ;
    RooRealVar nsig("nsig", "number of signal events", 500, 0, 1000) ;
    RooAddPdf emodel("emodel", "emodel", RooArgList(sig, bkg1, bkg2), RooArgList(nsig, nbkg1, nbkg2)) ;
[Figure: previous model with parameters sigFrac, bkg1Frac; adding the extended term gives sigFrac, bkg1Frac, ntotal; new representation: nsig, nbkg1, nbkg2]
• Fitting with the extended model:
    emodel.fitTo(data, "e") ;   // include extended term in fit
• Look at the sum, the expected errors, and the correlations between the fitted event numbers
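With yields N_i as coefficients, the extended negative log-likelihood is, up to a constant, -log L = Σ_i N_i - Σ_events log(Σ_i N_i P_i(x)). The standalone sketch below evaluates it. `extendedNLL` is a hypothetical name, not RooFit code. For a single component, the minimum over N lies at the observed number of events, which is the "fit for the number of events" part.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical illustration of an extended -log L; not RooFit code.
using Pdf = std::function<double(double)>;

// Extended -log L with the constant log(n!) dropped:
//   sum_i N_i - sum_events log( sum_i N_i * P_i(x) ).
double extendedNLL(const std::vector<Pdf>& pdfs, const std::vector<double>& yields,
                   const std::vector<double>& data) {
    assert(!pdfs.empty() && pdfs.size() == yields.size());
    double nTotal = 0.0;
    for (double n : yields) nTotal += n;
    double nll = nTotal;
    for (double x : data) {
        double density = 0.0;
        for (std::size_t i = 0; i < pdfs.size(); ++i) density += yields[i] * pdfs[i](x);
        assert(density > 0.0);
        nll -= std::log(density);
    }
    return nll;
}
```

For one component, -log L = N - n·log N + const, which is minimized at N = n. With several components, the yields' sum is driven toward n while their split is set by the shapes.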

Switching gears
• The hands-on exercises so far were designed to introduce you to the basic model-building syntax
• The real power of RooFit is in using those models to explore your analysis in an efficient way
• No time in this short session to cover this properly, so the next slides just give you a flavor of what is possible:
  1. Multidimensional models, selecting by likelihood ratio
  2. Demo of 'task automation', as mentioned in the last slide of the introduction

Multi-dimensional PDFs
• RooFit handles multi-dimensional PDFs as easily as 1D PDFs
  – Just use class RooProdPdf to multiply 1D PDFs
• Case example: selecting B+ → D0 K+
  – Three discriminating variables: mES, ΔE, m(D0)
  – Signal model = product of three 1D PDFs; background model = product of three 1D PDFs
• Run the example model, fit, and plots in: macros/tut5.C

Selecting by likelihood ratio
• A plain projection of a multi-dimensional PDF and dataset often doesn't do justice to the analyzing power of the PDF
  – You don't see the selecting power of the PDF in the dimensions that are projected out
  – Plain projection of mES from the previous exercise; result from the 3D fit: Nsig = 91 ± 10 (close to sqrt(N))
  – Possible solution: don't plot all events, but show only events passing a cut on the signal/background likelihood ratio constructed from the PDF dimensions that are not shown in the plot
• Example: macros/tut6.C
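The selection can be sketched in standalone C++. The event structure, the shapes, and all numerical parameters are hypothetical and purely illustrative. For each event, the signal and background PDFs are evaluated only in the dimensions that are not plotted (ΔE and m(D0)). The sketch forms the signal probability R = f·S / (f·S + (1 - f)·B) and keeps the mES value of events above a cut for plotting.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical illustration; event layout, shapes and numbers are made up.
struct Event { double mES, deltaE, mD0; };

double gauss(double x, double mean, double sigma) {
    const double z = (x - mean) / sigma;
    return std::exp(-0.5 * z * z) / (sigma * std::sqrt(2.0 * std::acos(-1.0)));
}

// Signal probability using only the dimensions NOT shown in the mES plot.
// Signal: Gaussians in deltaE and mD0; background: flat over assumed
// ranges of 0.4 GeV (deltaE) and 0.2 GeV (mD0).
double signalRatio(const Event& e, double sigFrac) {
    assert(sigFrac > 0.0 && sigFrac < 1.0);
    const double s = gauss(e.deltaE, 0.0, 0.02) * gauss(e.mD0, 1.865, 0.01);
    const double b = (1.0 / 0.4) * (1.0 / 0.2);
    return sigFrac * s / (sigFrac * s + (1.0 - sigFrac) * b);
}

// Keep the mES values of events passing the likelihood-ratio cut.
std::vector<double> selectForPlot(const std::vector<Event>& events, double sigFrac, double cut) {
    std::vector<double> mES;
    for (const Event& e : events)
        if (signalRatio(e, sigFrac) > cut) mES.push_back(e.mES);
    return mES;
}
```

Because the cut ignores mES, the signal peak in the selected mES distribution is not sculpted by the selection itself.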

Next topic: how stable is your fit?
• When looking at a low-statistics fit, you'll want to check explicitly:
  – Is your fit stable and unbiased?
• Check by running through a large set of toy MC samples
  – Fit each sample, accumulate fit statistics and make pull distributions
• Technical procedure:
  1. Generate a toy Monte Carlo sample with the desired number of events
  2. Fit for signal in that sample
  3. Record the number of fitted signal events
  4. Repeat steps 1-3 often
  5. Plot the distributions of Nsig, σ(Nsig), pull(Nsig)
• RooFit can do all this for you with 2 lines of code!
  – Try out the example in macros/tut7.C
  – Experiment with lowering the number of signal events

How often does background mimic your signal?
• Useful quantity in determining the importance of your signal: the p-value
  – p-value: how often a data sample of comparable statistics with no signal mimics the signal yield you observe
  – Tells you how probable it is that your peak is the result of a statistical fluctuation of the background
• Procedure very similar to the previous exercise:
  – First generate fake 'data' and fit it to determine the 'data signal yield'
  1. Generate a toy Monte Carlo sample with 0 signal events
  2. Fit for signal in that sample
  3. Record the number of fitted signal events
  4. Repeat steps 1-3 often
  5. See what fraction of fits result in a signal yield exceeding your 'observed data yield'
• Try out the example in macros/tut8.C
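The toy procedure can be sketched in standalone C++ with one simplification swapped in. Instead of fitting for Nsig in each background-only toy, it uses a counting experiment: it draws the number of background events from a Poisson distribution and counts how often that number reaches the observed one. `toyPValue` is a hypothetical name. The p-value estimate is the fraction of such toys.

```cpp
#include <cassert>
#include <random>

// Hypothetical illustration using a counting experiment instead of a fit.
// Fraction of background-only toys (Poisson mean nBkg) with at least nObserved
// events: a toy-MC estimate of the p-value.
double toyPValue(double nBkg, int nObserved, int nToys, unsigned seed) {
    assert(nBkg > 0.0 && nToys > 0);
    std::mt19937 rng(seed);
    std::poisson_distribution<int> bkgOnly(nBkg);
    int nExceed = 0;
    for (int t = 0; t < nToys; ++t)
        if (bkgOnly(rng) >= nObserved) ++nExceed;
    return static_cast<double>(nExceed) / nToys;
}
```

For small p-values the number of toys must be large: with N toys the smallest nonzero estimate is 1/N. Its statistical uncertainty is roughly sqrt(p(1 - p)/N).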

Top mass fit
• Set up your own top mass fit!
• Fit the top quark mass distribution in macros/tut9.C
• For the top signal (around 160 GeV/c²), use a Gaussian
• For the background, try out:
  – Chebychev polynomial (RooChebychev)
  – Polynomial (RooPolynomial)
• Minimum number of background terms needed? Which background description works better? Why? Look at the correlation matrix.

Simultaneous fit to signal and control sample(s)
• Often useful to split the data sample into various categories in a fit
  – Signal region / control sample(s), number of good jets, b-tag / b-veto, fiducial volumes, etc.
  – Categories may be overlapping
• Categories are assigned using 'RooCategory' objects
• RooFit: easy to make a simultaneous fit to various categories
  – Uses the full statistical power of the entire sample; correlations of fit parameters are automatically propagated! Very powerful technique.
• Try out the example in macros/tut10.C
  – Simultaneous fit to the signal region and a bkg control sample, using a RooCategory
• Add a third category & sample that contains a control Gaussian shape with the same width (but a different mean) as needed in the signal region. How does the simultaneous fit improve?
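The gain from a simultaneous fit comes from the joint -log L being the sum of the per-category terms with shared parameters. The standalone sketch below uses hypothetical names and is not RooSimultaneous. It takes two Gaussian samples with different means and a common width. For that case the joint maximum-likelihood width has the closed form sqrt((SS1 + SS2)/(n1 + n2)), pooling the squared deviations of both samples.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical illustration of a simultaneous fit; not RooSimultaneous.
struct Moments { double mean; double sumSq; };

// Sample mean and sum of squared deviations from it.
Moments moments(const std::vector<double>& data) {
    assert(!data.empty());
    double mean = 0.0;
    for (double x : data) mean += x;
    mean /= data.size();
    double ss = 0.0;
    for (double x : data) ss += (x - mean) * (x - mean);
    return {mean, ss};
}

// -log L of one category: Gaussian with its own mean and the shared width
// (constant log(sqrt(2*pi)) dropped).
double categoryNLL(const std::vector<double>& data, double mean, double sigma) {
    double nll = 0.0;
    for (double x : data) {
        const double z = (x - mean) / sigma;
        nll += 0.5 * z * z + std::log(sigma);
    }
    return nll;
}

// Joint -log L: sum over categories, sigma shared between them.
double jointNLL(const std::vector<double>& signal, double meanSig,
                const std::vector<double>& control, double meanCtl, double sigma) {
    return categoryNLL(signal, meanSig, sigma) + categoryNLL(control, meanCtl, sigma);
}

// Closed-form joint ML estimate of the shared width.
double pooledSigma(const std::vector<double>& a, const std::vector<double>& b) {
    return std::sqrt((moments(a).sumSq + moments(b).sumSq) / (a.size() + b.size()));
}
```

The shared width is effectively measured with n1 + n2 events instead of n1, which is where the improvement in the signal-region fit comes from.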

Convolution of PDFs
• RooFit can do both analytical and numerical convolutions
• Various analytical convolutions are provided
  – E.g. exponential and Gaussian: see class RooDecay
• Numerical convolutions are done with fast Fourier transforms
  – Requires the FFTW library
  – Often as fast as analytical convolutions!
• Try out the example: macros/tut11.C
• Replace the Landau with a Breit-Wigner function. Add a second, wider exponential. Do the new fit to a toy sample.
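The standalone sketch below computes a numerical convolution directly on a grid: (f ⊗ g)(t) = ∫ f(t') g(t - t') dt'. Here an exponential decay is smeared by a Gaussian resolution. The names are hypothetical. The sketch uses a direct O(N²) sum for clarity, whereas RooFit's FFT-based convolution (via FFTW) computes the same quantity much faster.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical illustration of a numerical convolution (direct sum, not FFT).

// Decay PDF exp(-t/tau)/tau for t >= 0.
double decay(double t, double tau) { return t < 0.0 ? 0.0 : std::exp(-t / tau) / tau; }

// Gaussian resolution function with zero bias.
double resolution(double dt, double sigma) {
    return std::exp(-0.5 * dt * dt / (sigma * sigma)) / (sigma * std::sqrt(2.0 * std::acos(-1.0)));
}

// (decay ⊗ resolution)(t) on a uniform grid of n points starting at t0 with spacing h.
std::vector<double> convolveOnGrid(double tau, double sigma, double t0, double h, int n) {
    assert(n > 0 && h > 0.0);
    std::vector<double> result(n, 0.0);
    for (int i = 0; i < n; ++i) {
        const double t = t0 + i * h;
        for (int j = 0; j < n; ++j) {
            const double tp = t0 + j * h;  // integration variable t'
            result[i] += decay(tp, tau) * resolution(t - tp, sigma) * h;
        }
    }
    return result;
}
```

The result extends to negative t because of the resolution smearing. Its mean stays close to tau, since the Gaussian has zero bias.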

Unbinned efficiency curve fit
• Statistical errors are often not properly accounted for when performing a binned efficiency curve fit
  – Naive binomial errors, sqrt(eff(1-eff)/N), go to zero when eff=0 or eff=1, even though the true uncertainty does not
• Proper implementation: an unbinned efficiency curve fit, possible in RooFit
• For an unbinned efficiency fit, see macros/tut12.C
• Use a RooMCStudy to prove that the pull distributions of the fit parameters are as expected (see also tutorial 8).
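In an unbinned efficiency fit, each event contributes eff(x) if it passed the selection and 1 - eff(x) if it failed. The objective is therefore -log L = -Σ_pass log eff(x) - Σ_fail log(1 - eff(x)). The standalone sketch below evaluates this for any efficiency function. `efficiencyNLL` is a hypothetical name, not RooFit code. For a constant efficiency, the minimum lies at the observed fraction k/n.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical illustration of an unbinned efficiency likelihood; not RooFit code.
// Each event is a value x plus a flag saying whether it passed the selection.
double efficiencyNLL(const std::function<double(double)>& eff,
                     const std::vector<double>& x, const std::vector<bool>& passed) {
    assert(x.size() == passed.size());
    double nll = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double e = eff(x[i]);
        assert(e > 0.0 && e < 1.0);
        nll -= passed[i] ? std::log(e) : std::log(1.0 - e);
    }
    return nll;
}
```

No binning is involved, so the uncertainty on the fitted curve parameters follows from the likelihood itself rather than from per-bin binomial errors.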

Outline of hands-on part 2
1. A few advanced RooFit examples
2. Several RooStats examples
• Copy roofit_tutorial.tar.gz from ~mbaak/public/
  – Untar roofit_tutorial.tar in your favorite directory on lxplus
  – Contents of the tutorial setup:
    tutorial/setup.sh                    Source this setup script first!
    tutorial/docs/roofit_tutorial.ppt    This presentation
    tutorial/macros2                     Macros to be used in the second part of the tutorial
• Open http://root.cern.ch/root/html/ClassIndex.html in your favorite browser

ROOT news
• ROOT v5.24 will come out next Wednesday
• It contains RooFit v3.00
• New RooStats functionality & examples
• Example of cool new RooFit functionality: choose between different fit minimizers
  – Such as Minuit2, GSLMultiMin
    pdf->fitTo(data, Minimizer("GSLMultiMin", "conjugatefr"), ...) ;

This RooFit/RooStats tutorial session
Featuring:
• Making your own PDF
• Adaptive kernel PDFs
• Morphing between datasets
• Working with workspaces
• Combination of measurements
• Profile likelihood scans
• Fitting of negative weights
• sPlots
• Hypothesis testing

Leftover: simultaneous fit to several samples
• Often useful to split the data sample into various categories in a fit
  – Signal region / control sample(s), number of good jets, b-tag / b-veto, fiducial volumes, etc.
  – Categories may be overlapping
• Categories are assigned using 'RooCategory' objects
• RooFit: easy to make a simultaneous fit to various categories
  – Uses the full statistical power of the entire sample; correlations of fit parameters are automatically propagated! Very powerful technique.
• Try out the example in macros/tut10.C
  – Simultaneous fit to the signal region and a bkg control sample, using a RooCategory
• Add a third category & sample that contains a control Gaussian shape with the same width (but a different mean) as needed in the signal region. How does the simultaneous fit improve?

Making your own PDF/function
• RooFit contains 'factories' that make it very easy for you to create a new PDF or function
• Run the following macro and take a look at its contents: macros2/rf104_classfactory.C
• Use RooClassFactory::makePdfInstance to make your own Breit-Wigner function:
  – 1./((x-m)*(x-m) + 0.25*w*w)
  – The proper normalization is automatically done by RooFit…
  – Note the corresponding .cxx and .h files that are produced!
• Use your Breit-Wigner function to generate and fit a Z spectrum
  – MZ = 91.2 GeV, ΓZ = 2.5 GeV
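For this shape, the automatic normalization has a closed form. The integral of 1/((x - m)² + w²/4) over [lo, hi] is (2/w)[atan(2(hi - m)/w) - atan(2(lo - m)/w)], which tends to 2π/w on the whole real line. The standalone sketch below uses it to normalize the shape. The names are hypothetical, and this is not the class RooClassFactory generates (RooFit may integrate numerically unless an analytical integral is provided).

```cpp
#include <cassert>
#include <cmath>

// Hypothetical illustration; not the RooClassFactory-generated class.

// Unnormalized Breit-Wigner shape from the exercise.
double bwShape(double x, double m, double w) { return 1.0 / ((x - m) * (x - m) + 0.25 * w * w); }

// Analytic integral of bwShape over [lo, hi].
double bwIntegral(double lo, double hi, double m, double w) {
    assert(w > 0.0 && hi > lo);
    return (2.0 / w) * (std::atan(2.0 * (hi - m) / w) - std::atan(2.0 * (lo - m) / w));
}

// Breit-Wigner PDF normalized on [lo, hi].
double bwPdf(double x, double lo, double hi, double m, double w) {
    return bwShape(x, m, w) / bwIntegral(lo, hi, m, w);
}
```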

A Few Cool Examples You Should Really See
• Unfortunately we do not have time to go through all features of RooFit …
• What follows is a selection of powerful examples. Please go through the macros to see what they do. Ask any related questions you may have.

More RooFit Examples
• Taking derivatives and integrals of pdfs/functions: macros2/rf111_derivatives.C
• Morphing between pdfs – RooLinearMorph: macros2/rf705_linearmorph.C
• Parallel fitting and plotting: macros2/rf603_multicpu.C
– For comparison, run the same macro with only one CPU core.
• Adaptive kernel estimation. The following pdfs allow you to model any dataset: just plug your dataset into the pdf.
– RooKeysPdf (1-dimensional), RooNDKeysPdf (n-dimensional)
– Great for modeling control samples or difficult correlations!
– Great for generating realistic toy MC samples from data/full MC!
– Example: macros2/rf707_kernelestimation.C
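A kernel estimate places a small, normalized bump on every event and sums them. The minimal standard-library C++ sketch below is a fixed-bandwidth Gaussian kernel density estimate, using Silverman's rule-of-thumb bandwidth. This is a deliberate simplification: RooKeysPdf uses an adaptive bandwidth that varies with the local event density. The class name here is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

// Fixed-bandwidth Gaussian kernel density estimate (a simpler relative of
// the adaptive kernel estimate used by RooKeysPdf).
class GaussianKde {
 public:
  explicit GaussianKde(std::vector<double> data) : data_(std::move(data)) {
    assert(data_.size() >= 2);
    // Sample mean and standard deviation of the input events.
    double mean = 0.0;
    for (double x : data_) mean += x;
    mean /= static_cast<double>(data_.size());
    double var = 0.0;
    for (double x : data_) var += (x - mean) * (x - mean);
    const double sd = std::sqrt(var / static_cast<double>(data_.size() - 1));
    assert(sd > 0.0);
    // Silverman's rule of thumb: h = 1.06 * sd * n^(-1/5).
    h_ = 1.06 * sd * std::pow(static_cast<double>(data_.size()), -0.2);
  }

  // Density at x: average of unit-area Gaussian kernels of width h
  // centred on each event.
  double operator()(double x) const {
    const double norm = 1.0 / (std::sqrt(2.0 * std::acos(-1.0)) * h_);
    double s = 0.0;
    for (double xi : data_) {
      const double u = (x - xi) / h_;
      s += norm * std::exp(-0.5 * u * u);
    }
    return s / static_cast<double>(data_.size());
  }

  double bandwidth() const { return h_; }

 private:
  std::vector<double> data_;
  double h_ = 0.0;
};
```

Because every kernel has unit area, the estimate is automatically normalized. A fixed bandwidth oversmooths narrow peaks and undersmooths sparse tails, which is what an adaptive scheme tries to fix.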

Morphing with Keys pdfs
• The macro macros2/morph_keys.C loads two Higgs datasets, one for m(H) = 130 GeV and one for m(H) = 170 GeV.
Using the previous example in rf705_linearmorph.C, plot the approximated Higgs mass distributions for m(H) = 140, 150 and 160 GeV.
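As far as we understand it, RooLinearMorph interpolates the cumulative distributions of the two input pdfs rather than their values. A peak therefore moves smoothly, instead of one peak fading while the other grows. The standard-library C++ sketch below applies the same quantile interpolation to two equal-size event samples. It illustrates the idea under that assumption and is not RooFit's implementation; the function names are hypothetical. For the exercise, the interpolation parameter for a target mass is alpha = (m - 130)/(170 - 130).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Horizontal (quantile) morphing between two samples: the p-th quantile of
// the morphed distribution is the linear interpolation of the p-th quantiles
// of the two inputs. For equal-size samples this pairs the sorted events.
std::vector<double> quantileMorph(std::vector<double> a, std::vector<double> b,
                                  double alpha) {
  assert(a.size() == b.size() && !a.empty());
  assert(alpha >= 0.0 && alpha <= 1.0);
  std::sort(a.begin(), a.end());
  std::sort(b.begin(), b.end());
  std::vector<double> out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i)
    out[i] = (1.0 - alpha) * a[i] + alpha * b[i];
  return out;
}

// Interpolation parameter for a target mass between two reference masses.
double morphAlpha(double m, double mLow, double mHigh) {
  assert(mHigh > mLow && m >= mLow && m <= mHigh);
  return (m - mLow) / (mHigh - mLow);
}
```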

Conditional pdfs
• A conditional pdf describes x, given the observable y.
– Pdf(x | y), e.g. a mass resolution function, given the mass error.
• For an example of a conditional pdf, see macros2/rf303_conditional.C. Here the mean of a Gaussian for observable x depends on observable y.
• When plotting the distribution of x, one needs to project over the distribution of y.
– Note for the plotting: model.plotOn(xframe, ProjWData(*data)), where data points to the dataset holding the observed y values (variable name illustrative)
• More detailed examples, showing decay distributions with a Gaussian resolution function that uses per-event errors: macros2/rf306_condpereventerrors.C, macros2/rf307_fullpereventerrors.C
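Projecting a conditional pdf over data means averaging f(x | y) over the y values observed in the dataset, instead of integrating over a model for y. The standard-library C++ sketch below is for illustration only and is not RooFit code; its function names are hypothetical. It assumes a Gaussian in x whose mean depends linearly on y, as a0 + a1*y.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Gaussian density.
double gauss(double x, double mu, double sigma) {
  const double u = (x - mu) / sigma;
  return std::exp(-0.5 * u * u) / (std::sqrt(2.0 * std::acos(-1.0)) * sigma);
}

// Conditional pdf f(x | y): Gaussian in x with a mean that depends linearly
// on y (an illustrative choice).
double condPdf(double x, double y, double a0, double a1, double sigma) {
  return gauss(x, a0 + a1 * y, sigma);
}

// Projection of f(x | y) onto x: average over the y values found in data,
// the same idea as plotting with a data-weighted projection.
double projectOverData(double x, const std::vector<double>& ys,
                       double a0, double a1, double sigma) {
  assert(!ys.empty());
  double s = 0.0;
  for (double y : ys) s += condPdf(x, y, a0, a1, sigma);
  return s / static_cast<double>(ys.size());
}
```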

RooStats: Workspaces
• RooFit allows you to store an entire analysis in a ‘workspace’ object, which can be saved to a ROOT file.
– This includes: pdfs, observables, functions, datasets.
• Try out macros2/rf502_wspacewrite.C. This stores the file rf502_workspace.root.
• Study the macro to see how to add an object to a workspace.
• You can then read back the workspace in a new session.
• Try out macros2/rf502_wspaceread.C to read the workspace and pick up where you left off! Study the macro to see how easily this is done.
For the next exercise, write out the workspace again, changing the initial values of all fit parameters except the ‘mean’ parameter (e.g. sigma, bkgfrac, etc.), and reduce the number of signal events.

RooStats: Combination of measurements
• Ask your neighbor for the workspace file (‘measurement’) he/she has just created.
• Run macros2/rf502_wspacewrite2.C. This creates a second workspace, rf502_workspace2.root, which contains a second measurement.
• Now pretend these are two Higgs measurements! ;-)
• To calculate the average Higgs mass, run the script macros2/combination.C (see next slide for the result).
• Study this script: the combined fit is a full, proper profile likelihood fit! (Both measurements are completely refit!)
• What’s the 95% confidence region of ‘mean’?
• Rule for combining measurements: parameters with identical names are assumed to be the same parameter.
Exercise: Add a third measurement to the combination.
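Take the simplest case: each measurement's likelihood is Gaussian in the shared parameter ‘mean’, and no other parameters are shared. The profile likelihood combination then reduces to the familiar inverse-variance weighted average. The 95% confidence region follows from -2 ΔlnL = 3.84, i.e. ±1.96σ. The standard-library C++ sketch below uses hypothetical names and is not what combination.C does internally. It can serve as a cross-check of the combined fit in that limit.

```cpp
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

struct Measurement {
  double value;
  double sigma;  // Gaussian uncertainty of this measurement
};

struct Combined {
  double value;
  double sigma;
};

// For independent Gaussian likelihoods in one shared parameter, maximizing
// the product of likelihoods gives the inverse-variance weighted average.
Combined combine(const std::vector<Measurement>& ms) {
  assert(!ms.empty());
  double sumW = 0.0, sumWX = 0.0;
  for (const auto& m : ms) {
    assert(m.sigma > 0.0);
    const double w = 1.0 / (m.sigma * m.sigma);
    sumW += w;
    sumWX += w * m.value;
  }
  return {sumWX / sumW, 1.0 / std::sqrt(sumW)};
}

// Two-sided 95% CL interval for a Gaussian likelihood:
// -2 Delta lnL = 3.84, i.e. value +/- 1.96 sigma.
std::pair<double, double> interval95(const Combined& c) {
  const double k = 1.959964;
  return {c.value - k * c.sigma, c.value + k * c.sigma};
}
```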

RooStats: Profile likelihood scan
“Workspaces are the future of digital publishing.”

RooStats: Weighted events and samples
• Typical use cases of sample or event weights:
– Combination of MC samples with different luminosities
– MC@NLO events: positive and negative event weights
• When using event weights in an unbinned maximum likelihood fit:
– The minimum found is correct
– The associated errors are incorrect, unless calculated properly
• E.g. when using negative event weights, statistical errors are typically underestimated.
• RooFit can do the proper error calculation!
• Try: macros2/topmassfit.C
• See next slide …

RooStats: Weighted events and samples
• Continue with the macro macros2/topmassfit.C
Turn off the usage of event weights in the fit and in the plot (see next slide for instructions). How do the statistical errors change? Can you explain the change in behaviour?

RooStats: How to use (event) weights in RooFit
// set the weight observable
data->setWeightVar(weightvar) ;
// default option: errors from the original HESSE error matrix
// errors are "as expected on data", but do not reflect the correct
// MC statistics
model.fitTo(*data, SumW2Error(kFALSE)) ;
// sum-of-weights-squared corrected HESSE error matrix
// errors correspond to the true MC statistics
model.fitTo(*data, SumW2Error(kTRUE)) ;
// plot weighted events
data->plotOn(frame, DataError(RooAbsData::SumW2)) ;
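For a one-parameter toy model, the two error options can be written out by hand. Assume a Gaussian of known width σ, fitted to weighted events for its mean. The uncorrected HESSE error is then σ/√(Σw). The sum-of-weights-squared correction is V' = V C⁻¹ V, with V from the weighted fit and C from a fit with squared weights. Here it reduces to σ√(Σw²)/Σw. The standard-library C++ sketch below uses hypothetical names and is not RooFit code. It shows why negative weights make the uncorrected error too small.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Weighted ML estimate of a Gaussian mean with known width: the weighted
// log-likelihood sum_i w_i ln G(x_i | mu, sigma) peaks at the weighted average.
double weightedMean(const std::vector<double>& x, const std::vector<double>& w) {
  assert(x.size() == w.size() && !x.empty());
  double sw = 0.0, swx = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    sw += w[i];
    swx += w[i] * x[i];
  }
  assert(sw > 0.0);
  return swx / sw;
}

// Uncorrected error from the inverse Hessian of the weighted -lnL:
// sigma / sqrt(sum w), i.e. the sum of weights is treated as an event count.
double naiveError(const std::vector<double>& w, double sigma) {
  double sw = 0.0;
  for (double wi : w) sw += wi;
  assert(sw > 0.0);
  return sigma / std::sqrt(sw);
}

// Sum-of-weights-squared corrected error, V' = V C^-1 V, which for this
// one-parameter model reduces to sigma * sqrt(sum w^2) / sum w.
double sumW2Error(const std::vector<double>& w, double sigma) {
  double sw = 0.0, sw2 = 0.0;
  for (double wi : w) {
    sw += wi;
    sw2 += wi * wi;
  }
  assert(sw > 0.0);
  return sigma * std::sqrt(sw2) / sw;
}
```

With unit weights both errors agree. With weights {1, 1, 1, -1}, the uncorrected error is σ/√2 while the corrected one is σ, so the uncorrected error is underestimated.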

RooStats: sPlots
• sPlots is a technique to unfold the distributions of two event species, e.g. signal and background events, when making a plot.
– It’s not a supersymmetric plot ;-)
• In this macro, the distribution of interest is the electron isolation, for Z->ee vs QCD: macros2/rs301_splot.C
• To make sPlots of the isolation, a ‘control’ discriminating variable is needed to unfold the signal and bkg distributions.
– In this example, it is provided by a mass fit.
• Based on the control variable, an sWeight is assigned to each event, which is used to draw the plots.
Replace the isolation observable & pdf by another observable you are interested in, for example the trigger efficiency category & pdf from tut12.
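Consider two species with fixed shapes f_s and f_b of the control variable, and fitted yields N_s and N_b. The signal sWeight of an event is sW_s = (V_ss f_s + V_sb f_b) / (N_s f_s + N_b f_b). Here V is the inverse of the matrix M_jk = Σ_events f_j f_k / (N_s f_s + N_b f_b)². The standard-library C++ sketch below assumes known shapes. It uses a simple EM iteration as a stand-in for the RooFit mass fit, and all names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

struct EventPdfs {
  double fs;  // normalized signal pdf of the control variable at this event
  double fb;  // normalized background pdf of the control variable at this event
};

// Extended-ML yields for fixed shapes via EM iteration. After the first
// step the total yield equals the number of events.
std::array<double, 2> fitYields(const std::vector<EventPdfs>& ev, int iters) {
  assert(!ev.empty() && iters > 0);
  double ns = 0.5 * static_cast<double>(ev.size());
  double nb = ns;
  for (int it = 0; it < iters; ++it) {
    double newNs = 0.0, newNb = 0.0;
    for (const auto& e : ev) {
      const double d = ns * e.fs + nb * e.fb;
      newNs += ns * e.fs / d;
      newNb += nb * e.fb / d;
    }
    ns = newNs;
    nb = newNb;
  }
  return {ns, nb};
}

// Signal sWeights: sW_s = (V_ss fs + V_sb fb) / (Ns fs + Nb fb), where V is
// the inverse of M_jk = sum_e f_j f_k / (Ns fs + Nb fb)^2.
std::vector<double> signalSWeights(const std::vector<EventPdfs>& ev,
                                   double ns, double nb) {
  double mss = 0.0, msb = 0.0, mbb = 0.0;
  for (const auto& e : ev) {
    const double d = ns * e.fs + nb * e.fb;
    mss += e.fs * e.fs / (d * d);
    msb += e.fs * e.fb / (d * d);
    mbb += e.fb * e.fb / (d * d);
  }
  const double det = mss * mbb - msb * msb;
  assert(std::isfinite(det) && det > 0.0);
  const double vss = mbb / det;   // first row of the 2x2 inverse of M
  const double vsb = -msb / det;
  std::vector<double> w;
  w.reserve(ev.size());
  for (const auto& e : ev)
    w.push_back((vss * e.fs + vsb * e.fb) / (ns * e.fs + nb * e.fb));
  return w;
}
```

At the maximum-likelihood yields the signal sWeights sum to N_s. Background-like events typically receive negative signal sWeights.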

RooStats: Profile likelihood hypothesis test
• Profile-likelihood test calculator: macros2/rs102_hypotestwithshapes.C
– RooStats::ProfileLikelihoodCalculator
• The ProfileLikelihoodCalculator makes a profile likelihood scan in the fraction of signal events (‘mu’).
– See function: DoHypothesisTest()
• Using a Gaussian interpretation (Wilks’ theorem), the log-likelihood ratio at zero signal is converted into a p-value (equivalently, a significance).
Try to make a profile likelihood scan of ‘mu’ to test the Gaussian interpretation (see also macros2/combination.C), and calculate the significance yourself. Do this in the function MakePlots().
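The Wilks conversion can be tried on the simplest possible model: a counting experiment with n observed events and a known background b. This model has no nuisance parameters to profile, so the scan is an ordinary likelihood scan in the signal yield s. The test statistic q0 = 2[NLL(s = 0) - NLL_min] gives Z = √q0. For this model that has the closed form √(2[n ln(n/b) - (n - b)]). The standard-library C++ sketch below uses hypothetical names. The simplified model is an assumption, not the shape fit in rs102_hypotestwithshapes.C.

```cpp
#include <cassert>
#include <cmath>

// Poisson negative log-likelihood (constant terms dropped) for observing n
// events when the expected count is s + b, with b known.
double nll(double s, double n, double b) {
  return (s + b) - n * std::log(s + b);
}

// Likelihood scan in s on a grid over [0, sMax]; returns the minimum NLL.
double scanMinNll(double n, double b, double sMax, int steps) {
  assert(steps > 0 && sMax > 0.0 && b > 0.0);
  double best = nll(0.0, n, b);
  for (int i = 1; i <= steps; ++i)
    best = std::fmin(best, nll(sMax * i / steps, n, b));
  return best;
}

// Wilks: q0 = 2 [NLL(s = 0) - NLL_min] is asymptotically chi2(1) distributed,
// so the one-sided significance is Z = sqrt(q0).
double significanceFromScan(double n, double b, double sMax, int steps) {
  const double q0 = 2.0 * (nll(0.0, n, b) - scanMinNll(n, b, sMax, steps));
  return std::sqrt(std::fmax(q0, 0.0));
}

// Closed form for the same counting experiment (valid for n >= b).
double significanceClosedForm(double n, double b) {
  assert(n >= b && b > 0.0);
  return std::sqrt(2.0 * (n * std::log(n / b) - (n - b)));
}

// One-sided p-value corresponding to a significance Z.
double pValue(double z) { return 0.5 * std::erfc(z / std::sqrt(2.0)); }
```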

RooStats: HybridCalculator
• HybridCalculator (a RooStats::HypoTestCalculator): macros2/rs201_hybridcalculator.C
– A hybrid Frequentist and Bayesian tool. It integrates over nuisance (bkg) parameters by sampling them from a prior in toy MC experiments.
• The macro uses a (Gaussian) Bayesian prior for the number of bkg events, but is Frequentist (i.e. uses toy MC) to get the -2 lnQ distributions from S+B and B-only samples.
• See macros2/rf604_constraints.C to add a Gaussian bkg constraint directly to the likelihood.
Apply the ProfileLikelihoodCalculator and compare its signal significance with that of the HybridCalculator.
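The hybrid recipe can be mimicked in a few lines for a one-bin counting experiment. In each toy, draw the background mean from its prior (here Gaussian, truncated at zero), then draw a Poisson count. The p-value is the fraction of toys that are at least as signal-like as the observation. For a single bin, ranking toys by the observed count is used here in place of -2 lnQ. The standard-library C++ sketch below assumes these simplifications; its names are hypothetical, and it is not the HybridCalculator implementation.

```cpp
#include <cassert>
#include <random>

// Hybrid p-value for a counting experiment: in each toy the background mean
// is drawn from a Gaussian prior truncated at zero, then a Poisson count is
// drawn; the p-value is the fraction of toys with a count >= nObs.
double hybridPValue(int nObs, double bMean, double bSigma, int nToys,
                    unsigned seed) {
  assert(nToys > 0 && bMean > 0.0 && bSigma > 0.0);
  std::mt19937_64 rng(seed);
  std::normal_distribution<double> prior(bMean, bSigma);
  int nAtLeast = 0;
  for (int t = 0; t < nToys; ++t) {
    double b = prior(rng);
    while (b <= 0.0) b = prior(rng);  // truncate the prior at zero
    std::poisson_distribution<int> pois(b);
    if (pois(rng) >= nObs) ++nAtLeast;
  }
  return static_cast<double>(nAtLeast) / nToys;
}
```

A wider background prior spreads the toy counts, so the same observed excess yields a larger p-value, i.e. a lower significance.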

Further reading
• There are more (advanced) RooFit features and examples worth demonstrating than one can fit into two brief tutorial sessions.
• I have tried to show a snapshot of the most popular possibilities. You are encouraged to take a look at:
– The RooFit documentation (docs/RooFit_Users_Manual_2.91-33.pdf)
– The examples in the directory examples/roofit/
• … to experience the full power of RooFit and RooStats!
I hope you’ve enjoyed the tutorials and will continue to use RooFit and RooStats in the future!