RooFit & RooStats: tools for data modeling and statistical analysis in ROOT

Wouter Verkerke (NIKHEF)

Overview of this talk
• Talk overview
  – Recently added RooFit features
  – The RooStats project
• Current release cycle
  – A major new RooFit development cycle has started in ROOT development release 5.17 (RooFit 2.23)
  – Stable version to be delivered in ROOT production release 5.18/00 (RooFit v2.30), release date Dec 12. The deadline for last code is tomorrow.

New features – Core engineering
• Core engineering
  – Complete rewrite of the algorithms that optimize likelihood calculations
  – Recent versions of classes like RooAddPdf and RooProdPdf make extensive use of caching of composite function objects that represent partial results for given integration/normalization configurations. Creation of cache objects is usually deferred until first use, and multiple configurations are handled simultaneously
  – The old optimization code was not equipped to handle optimization and client/server link reconnection of cached objects well
  – New support class RooObjCacheManager transparently takes care of all caching and optimization logic for cached function objects
  – Many specialized hooks and support functions that worked around limitations of the old code have now disappeared. The code is much cleaner and more maintainable for the future
  – Can in principle do more optimizations than before, but improved robustness in handling certain conditions adds some overhead. Speed is expected to be within ~5% of the original RooFit, with fluctuations depending on the application
  – The next version of RooFit will have significant speedups of (complex) plot projections, as the new optimization engine can also be applied to plot projections (works in principle, but not enabled yet)

New features – RooMsgService
• All RooFit messaging is now routed through the new RooMsgService interface
• The new service has an interface that allows detailed control over which messages are printed. Can filter on
  – Message severity (DEBUG, INFO, WARNING, ERROR, FATAL)
  – Message topic (Plotting, Integration, Generation, …)
  – Originating object class (RooGaussian etc…)
  – Originating object name ("MySignalPdf" etc…)
  – Tags applied to object (arg->setLabel("DebugMeLabel"))
• Control through RooMsgService::instance()
  – Default configuration

    root [0] RooMsgService::instance().Print("v")
    All Message streams
    [0] MinLevel = WARNING Topic = Any
    [1] MinLevel = INFO Topic = Generation Minization Plotting Fitting \
                                Caching Optimization

New features – RooMsgService
• Add new streams as you like, e.g.

    RooMsgService::instance().addStream(kINFO, Topic(kIntegration), ObjectName("MyPdf"))

• A lot of new INFO-level messages have been added on the topics of Integration and Generation
  – They explain how RooFit arrives at its decision to perform integration, generation etc…
• Note
  – Adding streams with object-specific messages may affect performance. Mostly intended for your debugging convenience
  – Adding any stream with DEBUG-level messages, even if not object-specific, affects performance significantly. Again, these exist for your debugging convenience.
• Disabling all message streams will make RooFit completely silent (in case you care…)
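The stream-based filtering described on these two slides can be pictured with a small stand-alone sketch. This is not RooMsgService code: the types `Level`, `Stream` and `MessageFilter` and the function `makeDefaultFilter` are hypothetical names chosen for illustration, and only severity and topic are modelled (no object class, name or tag filters). A message is printed when at least one active stream accepts it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical severity levels, ordered from least to most severe.
enum class Level { Debug, Info, Warning, Error, Fatal };

// One message stream: accepts messages at or above minLevel whose topic is
// listed. An empty topic list means "any topic".
struct Stream {
    Level minLevel;
    std::vector<std::string> topics;
    bool active;

    bool accepts(Level level, const std::string& topic) const {
        if (!active || level < minLevel) return false;
        if (topics.empty()) return true;
        for (const auto& t : topics)
            if (t == topic) return true;
        return false;
    }
};

// A message is printed if at least one stream accepts it.
struct MessageFilter {
    std::vector<Stream> streams;

    void setActive(std::size_t index, bool on) {
        assert(index < streams.size());
        streams[index].active = on;
    }

    bool shouldPrint(Level level, const std::string& topic) const {
        for (const auto& s : streams)
            if (s.accepts(level, topic)) return true;
        return false;
    }
};

// Two streams in the spirit of the default configuration listed above:
// stream 0 takes WARNING and up on any topic, stream 1 takes INFO and up
// on a fixed set of topics.
MessageFilter makeDefaultFilter() {
    MessageFilter f;
    f.streams.push_back({Level::Warning, {}, true});
    f.streams.push_back({Level::Info,
                         {"Generation", "Minimization", "Plotting",
                          "Fitting", "Caching", "Optimization"},
                         true});
    return f;
}
```

With this layout, adding a stream only widens what is printed, and switching every stream off silences all output, which matches the behaviour the slides describe.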

New features – GraphViz support
• You can draw graphs of RooFit object trees of arbitrary complexity using the open-source GraphViz tools for graph visualization

    ROOT> pdf->graphVizTree("pdf.dot")
    UNIX> dot -Tps -o pdf.ps pdf.dot    (directed graph algorithm)
    UNIX> fdp -Tps -o pdf.ps pdf.dot    (spring model algorithm)

[Figure: the same object tree laid out with 'dot' and with 'fdp']

New features – RooClassFactory
• Code factory for RooFit classes, writes skeleton class for RooAbsPdf, RooAbsReal
  – Example that writes function ready to be compiled

    RooClassFactory::makeFunction("RooDilution",              // class name
                                  "w,w_p0,w_p1",              // names of variables
                                  "1-2*(w_p0+(1-w_p1)*w)") ;  // function expression
    .L RooDilution.cxx+                                       // load class

• Can also immediately instantiate code

    RooAbsReal* f = RooClassFactory::defineFunction("f", "D(1-2w)", RooArgSet(D, w))

  – Returns pointer to dedicated compiled function object
  – Fast replacement of RooFormulaVar
• Many more options
  – Can also specify optional analytical integrals in extra argument
  – Can also create functions with RooCategory arguments

New features – Modular extension of RooMCStudy
• New version of RooMCStudy has hooks to insert a chain of modules into the study. Modules can intervene before and after each generation and fit step to customize behavior
• Two standard modules provided:
  – RooDLLSignificanceMCSModule calculates the significance with the delta(-log(L)) method in a given parameter. The result is added to the RooDataSet with output
  – RooRandomizeParamMCSModule randomizes the generation value of a given parameter before each generation (uniform or Gaussian)
  – Abstract base class for modules allows you to write your own
• Example use

    RooDLLSignificanceMCSModule sigModule(*nsig, 0) ;
    RooRandomizeParamMCSModule randModule ;
    randModule.sampleSumUniform(param, loVal, HighVal) ;

    RooMCStudy mcs(*model, *mjjj) ;
    mcs.addModule(sigModule) ;
    mcs.addModule(randModule) ;
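As a reminder of what a delta(-log(L)) significance amounts to, here is a minimal stand-alone sketch. It assumes the usual asymptotic approximation for one parameter of interest, Z = sqrt(2 · Δ(-log L)), where Δ is the difference between the fit with the parameter fixed to its null value and the fit with it floating. The function name `deltaLogLSignificance` is hypothetical; the slides do not show the formula the module uses internally.

```cpp
#include <cassert>
#include <cmath>

// Significance in Gaussian standard deviations from two minimized -log(L)
// values: nllNull with the parameter fixed to its null value (e.g. nsig = 0),
// nllBest with the parameter floating.
// Assumption: asymptotic approximation, one parameter of interest.
double deltaLogLSignificance(double nllNull, double nllBest) {
    assert(std::isfinite(nllNull) && std::isfinite(nllBest));
    double deltaNll = nllNull - nllBest;
    // Numerical noise can make the difference slightly negative; clamp to zero.
    if (deltaNll < 0.0) deltaNll = 0.0;
    return std::sqrt(2.0 * deltaNll);
}
```

For example, a difference of 12.5 units in -log(L) corresponds to 5 standard deviations under this approximation.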

New operator PDFs – Numeric convolution through FFT
• New generic convolution operator PDF RooFFTConvPdf that can numerically convolve any two p.d.f.s using FFT techniques
  – Uses the (free) FFTW3 Fourier transform engine (www.fftw.org)
  – Must build ROOT with --enable-fftw
• Example code

    RooRealVar x("x", "x", -10, 20) ;
    x.setBins(1000) ;  // Binning controls FFT sampling density. Use at least 1000 for good precision
    RooGaussian gx("gx", "gx", x, mx, sx) ;
    RooLandau lx("lx", "lx", x, ml, sl) ;
    RooFFTConvPdf gxlx("gxlx", "gx (X) lx", x, gx, lx) ;

• Amazing speed and precision: ~100x faster than RooNumConvPdf, few numerical stability issues
  – Unbinned ML fit of Bmix (x) Gauss to 20000 events with dm, tau, D floating = 30 seconds (about the same as the analytical calculation)
  – Performance will drop if per-event errors are used, as the FFT calculation precalculates the p.d.f. in one operation for all observable values. Efficient when the p.d.f. is evaluated at many points for one set of parameters, not efficient when the p.d.f. is only evaluated once.
• Future versions will support >1 convolution as well
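The technique behind an FFT-based convolution is the convolution theorem: sample both functions on the same grid, transform, multiply bin by bin, and transform back. The sketch below shows that idea with a small textbook radix-2 FFT from the standard library only. It is not how RooFFTConvPdf is implemented (that class uses FFTW3, per the slide); the names `fft` and `fftConvolve` are illustrative, the grid size must be a power of two, and the result is a circular convolution, so wrap-around at the range edges is not handled here.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cvec = std::vector<std::complex<double>>;

// In-place recursive radix-2 FFT; a.size() must be a power of two.
// invert = true computes the unnormalized inverse transform.
void fft(cvec& a, bool invert) {
    const std::size_t n = a.size();
    if (n <= 1) return;
    assert(n % 2 == 0);
    cvec even(n / 2), odd(n / 2);
    for (std::size_t i = 0; i < n / 2; ++i) {
        even[i] = a[2 * i];
        odd[i] = a[2 * i + 1];
    }
    fft(even, invert);
    fft(odd, invert);
    const double pi = std::acos(-1.0);
    const double sign = invert ? 1.0 : -1.0;
    for (std::size_t k = 0; k < n / 2; ++k) {
        const double angle = sign * 2.0 * pi * static_cast<double>(k) / static_cast<double>(n);
        const std::complex<double> w = std::polar(1.0, angle);
        a[k] = even[k] + w * odd[k];
        a[k + n / 2] = even[k] - w * odd[k];
    }
}

// Circular convolution of two equally sized sampled functions:
// transform both, multiply bin by bin, transform back, normalize by n.
std::vector<double> fftConvolve(const std::vector<double>& f,
                                const std::vector<double>& g) {
    assert(f.size() == g.size());
    const std::size_t n = f.size();
    cvec fa(f.begin(), f.end()), ga(g.begin(), g.end());
    fft(fa, false);
    fft(ga, false);
    for (std::size_t i = 0; i < n; ++i) fa[i] *= ga[i];
    fft(fa, true);
    std::vector<double> out(n);
    for (std::size_t i = 0; i < n; ++i) out[i] = fa[i].real() / static_cast<double>(n);
    return out;
}
```

Because all grid points are produced in one pass, the cost is amortized when the result is read at many observable values for one set of parameters, which is the efficiency argument made on the slide.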

New pdfs – Generic n-dim KEYS p.d.f.
• Designed as replacement of Roo2DKeysPdf
  – Written by Max Baak for ATLAS Higgs analysis
• NB: Several bugs were discovered in Roo2DKeysPdf
  – Works in any number of dimensions
  – Takes correlations of input data into account in shape of kernel
  – Implementation has optimizations for speed (work best at higher dimensions)
  – Analytical integration and analytical partial integrals
[Figure: projection with partial analytical integral]

Other miscellaneous new features
• New version of class RooProduct (product of any number of RooAbsReal objects)
  – Support for factorizing (analytical) integration of product expressions, analogous to RooProdPdf
  – Provided by Gerhard Raven
• New class RooProfileLL that represents the profile likelihood for a given likelihood
  – Example: given a p.d.f. F with parameters p1, p2, p3
  – Construction of likelihood (= function of p1, p2, p3)

    RooNLLVar nll("nll", "nll", F, *d) ;

  – Construction of profile likelihood in p1 (= likelihood minimized w.r.t. all parameters except p1)

    RooProfileLL pnll1("pnll", "profile ll", nll, p1) ;

  – Expensive function (MINUIT is called for every evaluation)
  – Plotting / scanning of the profile likelihood will give the correct error estimate on p1
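The definition of a profile likelihood can be made concrete with a two-parameter toy. In this sketch a plain grid scan over the nuisance parameter stands in for the MINUIT minimization mentioned on the slide; the function name `profileNll` and its arguments are hypothetical and not part of RooFit.

```cpp
#include <cassert>
#include <functional>
#include <limits>

// Profile of a two-parameter -log(L): for a fixed value of the parameter of
// interest p1, return the minimum over the nuisance parameter p2.
// Assumption: the minimum lies inside [p2Lo, p2Hi]; the grid scan is a
// stand-in for a real minimizer.
double profileNll(const std::function<double(double, double)>& nll,
                  double p1, double p2Lo, double p2Hi, int nSteps) {
    assert(nSteps > 0 && p2Hi > p2Lo);
    double best = std::numeric_limits<double>::infinity();
    for (int i = 0; i <= nSteps; ++i) {
        const double p2 = p2Lo + (p2Hi - p2Lo) * i / nSteps;
        const double value = nll(p1, p2);
        if (value < best) best = value;
    }
    return best;
}
```

Every evaluation at a new p1 repeats the inner minimization, which is why the slide calls the profile likelihood an expensive function.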

New concept – RooWorkspace
• One of the main missing features in RooFit is a tool to organize complex projects
  – A container for composite p.d.f. objects, multiple datasets
• New class RooWorkspace provides basic infrastructure for complex project management
  – Container class for p.d.f.s, datasets, functions etc…
  – Controlled interface: cannot insert duplicates with the same name
  – Automatic reconnects: if a p.d.f. f(x,p) is inserted and a RooRealVar x already exists in the workspace, the inserted copy is automatically connected to the x in the workspace
  – Tools for conflict resolution on insertion: can rename nodes on the fly upon insertion

    RooWorkspace::import(pdf, RenameConflictNodes("_v2")) ;

  – Tools for variable renaming on insertion

    RooWorkspace::import(pdf, RenameVariable("x", "y")) ;
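The "controlled interface" idea (no duplicate names, optional renaming on conflict) can be sketched as a name-keyed container. This toy is not the RooWorkspace implementation: `Node`, `Workspace`, `importNode` and `find` are hypothetical names, the rename rule is simply "append the suffix to the incoming copy", and the automatic reconnection of shared variables is not modelled.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a named object (p.d.f., variable, dataset).
struct Node {
    std::string name;
    std::string description;
};

class Workspace {
public:
    // Insert a copy of node. Returns false and leaves the workspace unchanged
    // if the name is already taken and no conflict suffix is given. With a
    // suffix, the incoming copy is renamed to name + suffix before insertion.
    bool importNode(Node node, const std::string& conflictSuffix = "") {
        assert(!node.name.empty());
        if (nodes_.count(node.name) != 0) {
            if (conflictSuffix.empty()) return false;
            node.name += conflictSuffix;
            if (nodes_.count(node.name) != 0) return false;
        }
        nodes_[node.name] = node;
        return true;
    }

    // Accessor by name: pointer to the stored copy, or nullptr if absent.
    const Node* find(const std::string& name) const {
        auto it = nodes_.find(name);
        return it == nodes_.end() ? nullptr : &it->second;
    }

private:
    std::map<std::string, Node> nodes_;
};
```

Storing copies rather than references is what lets the container be written out as one self-contained unit.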

New concept – RooWorkspace
• New RooWorkspace can be persisted entirely
  – Allows you to save p.d.f.s in addition to data
• Important new concept
  – Sharing data between individual physicists, working groups, or experiments is relatively easy
  – ROOT TTrees and THx histograms are an almost universal standard
  – Sharing functions (likelihood / probability density) is generally much more difficult due to the lack of a common language
  – RooFit makes sharing (probability density) functions very easy: functions can be persisted in ROOT files (NEW)
• Many potential benefits
  – Easy sharing of results, ideas
  – Simplifies cross checks, debugging and result combinations
  – Combined fits for CP parameters easily executed by combining likelihoods from multiple workspaces

Persistence of models
• Elementary use case

    RooAbsPdf& g ;   // any p.d.f. you made
    RooAbsData& d ;  // any data you made

    // Create the workspace container object
    RooWorkspace w("w", "my workspace") ;
    w.import(g) ;    // import p.d.f.
    w.import(d) ;    // import data

    // Use standard ROOT I/O to store the workspace
    TFile f("myresult.root", "RECREATE") ;
    w.Write() ;
    f.Close() ;

• Both data and p.d.f. are now stored in the file!
• Works for p.d.f.s of arbitrary complexity, e.g. complicated fit with multiple side bands, full Higgs combination

A look at the workspace
• What is in the workspace?

    w.Print() ;

    RooWorkspace(w) my workspace contents

    variables
    ---------
    (x,m,s)

    p.d.f.s
    -------
    RooGaussian::g[ x=x mean=m sigma=s ] = 0

    datasets
    --------
    RooDataSet::d(x)

• Typed accessors to conveniently retrieve contents

    RooRealVar* x = w.var("x") ;
    RooAbsPdf*  g = w.pdf("g") ;
    RooAbsData* d = w.data("d") ;

Using & adapting persisted p.d.f.s
• Using both data & p.d.f. from file

    TFile f("myresult.root") ;
    RooWorkspace* w = (RooWorkspace*) f.Get("w") ;

    // Make plot of data and p.d.f.
    RooPlot* xframe = w->var("x")->frame() ;
    w->data("d")->plotOn(xframe) ;
    w->pdf("g")->plotOn(xframe) ;

    // Fit p.d.f. to other data outside the workspace:
    // p.d.f.s in a workspace work with any data
    w->pdf("g")->fitTo(*myData) ;

    // Alternatively import the data in the workspace:
    // naming conflicts or mismatches are easily
    // resolved by importing all objects in the workspace
    w->import(*myData, RenameVariable("y", "x")) ;

A more complex example
• Combining toy 'ATLAS' and 'CMS' results from persisted workspaces

    // Read ATLAS workspace
    TFile* fAtlas = new TFile("atlas.root") ;
    RooWorkspace* atlas = (RooWorkspace*) fAtlas->Get("atlas") ;

    // Read CMS workspace
    TFile* fCms = new TFile("cms.root") ;
    RooWorkspace* cms = (RooWorkspace*) fCms->Get("cms") ;

    // Construct combined LH
    RooAddition nllCombi("nllCombi", "nll CMS&ATLAS",
                         RooArgSet(*cms->function("nll"), *atlas->function("nll"))) ;

    // Construct profile LH in mHiggs
    RooProfileLL pllCombi("pllCombi", "pll", nllCombi, *atlas->var("mHiggs")) ;

    // Plot ATLAS, CMS, combined profile LH
    RooPlot* mframe = atlas->var("mHiggs")->frame(-3.5, -2.5) ;
    atlas->function("nll")->plotOn(mframe) ;
    cms->function("nll")->plotOn(mframe, LineStyle(kDashed)) ;
    pllCombi.plotOn(mframe, LineColor(kRed)) ;
    mframe->Draw() ;  // result on next slide

• NB: You can publish your actual likelihood in digital form this way
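Why adding the two -log(L) functions gives the combination: for independent measurements the likelihoods multiply, so their negative logarithms add. The stand-alone toy below uses two Gaussian measurements of one parameter m, for which the combined minimum is the inverse-variance weighted mean. All names (`gaussNll`, `combinedNll`, `scanMinimum`) and the Gaussian shape are assumptions for illustration; the workspaces in the example above can hold arbitrary likelihoods.

```cpp
#include <cassert>
#include <cmath>

// -log(L) of one Gaussian measurement of m (constant terms dropped).
double gaussNll(double m, double measured, double sigma) {
    assert(sigma > 0.0);
    const double pull = (m - measured) / sigma;
    return 0.5 * pull * pull;
}

// Combined -log(L) of two independent measurements: the sum of the two.
double combinedNll(double m, double m1, double s1, double m2, double s2) {
    return gaussNll(m, m1, s1) + gaussNll(m, m2, s2);
}

// Scan m on a grid in [lo, hi] and return the grid point with the smallest
// combined -log(L).
double scanMinimum(double lo, double hi, int nSteps,
                   double m1, double s1, double m2, double s2) {
    assert(nSteps > 0 && hi > lo);
    double bestM = lo;
    double bestNll = combinedNll(lo, m1, s1, m2, s2);
    for (int i = 1; i <= nSteps; ++i) {
        const double m = lo + (hi - lo) * i / nSteps;
        const double v = combinedNll(m, m1, s1, m2, s2);
        if (v < bestNll) {
            bestNll = v;
            bestM = m;
        }
    }
    return bestM;
}
```

With two equally precise toy measurements at -3.2 and -2.8, the scan settles halfway between them, at about -3.0.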

ROOT, RooFit & RooStats
• RooFit is an extension to ROOT – (almost) no overlap with existing functionality
[Diagram: software layers, top to bottom]
• RooStats – statistical analysis: Neyman construction, Bayesian posterior, profile likelihood
• RooFit – data modeling: toy MC data generation, model visualization, data/model fitting
• ROOT – MINUIT, C++ command line interface & macros, data management & histogramming, I/O support, graphics interface

The RooStats project – common statistics tools for LHC
• Initiative by Rene & Kyle to organize a suite of common tools in ROOT
  – Propose to build tools on top of RooFit, following a survey of existing software and the user community
  – Idea is to have a few core developers maintaining the framework and a mechanism for users/collaborations to contribute concrete tools
  – Necessary groundwork in RooFit for support of RooStats mostly done
• What should be in there?
  – There are a few major classes of statistical techniques:
  – Likelihood: all inference from likelihood curves
  – Bayesian: use prior on parameter to compute P(theory|data)
  – Frequentist: restricted to statements of P(data|theory)
• Even within one of these classes, there are several ways to approach the same problem
  – Aim to collect them all in one set of consistent tools

Designing the framework
• Kyle & I met in early 2007 to discuss how to implement a few statistical concepts on top of RooFit
  – Want the class structure to map onto statistical concepts
  – Successfully worked out a few of the methods
• The first examples were
  – Bayesian Posterior
  – Profile likelihood ratio
  – Acceptance Region
  – Ordering Rule
  – Neyman Construction
  – Confidence Interval
• Many concepts already have an appropriate class in RooFit
  – New RooWorkspace class is a key component of the interface

RooStats progress
• Kyle has done several successful pilot studies to test the feasibility of the concept
  – E.g. multi-channel Higgs sensitivity study
• Now starting with construction of concrete tools
  – First candidate is a real-world Tevatron example with input from Tom Jun
  – Aiming for first functional release in the course of 5.18 (spring 2008)

RooFit Developments & Future plans – Overview
• Quite a bit of new code developed in 2007, with more to come in 2008. Will cover this later
• Manpower
  – Interest in and use of RooFit in ATLAS, CMS, LHCb is increasing
  – I continue to develop and support RooFit at the ~10-20% level (which has been the support level for the past 5 years)
  – I intend to continue at this level for the foreseeable future
• Access to code, bundling with ROOT
  – Development copy of RooFit moved from SourceForge to the ROOT Subversion repository. Simplifies updates to ROOT
  – ROOT Subversion allows me to easily make development branches
  – Intend to make use of more ROOT/CERN facilities for support
  – File your bug reports in the ROOT Savannah tracker
  – Ask your questions on the ROOT forums