CMS RooStats Higgs Combination Package
Giovanni Petrucciani (UCSD)
17.05.2011

The combination package
• A package that builds an executable that can be run to compute limits and significances:
    combine input -M method [options]
• The input can be a text “datacard” or an arbitrary RooStats workspace saved in a ROOT file.
• Internally, it has two main components:
  – text2workspace: a Python program that reads datacards and creates a RooStats model
  – The part that deals with the statistical methods

Datacards
• Started from a simple format for counting experiments: a text table with observed events and expected yields in each channel.
• Progressively enhanced:
  – Support for more pdfs for systematic uncertainties (e.g. Gammas, asymmetric errors, ...)
  – Use of shapes instead of just counting:
    • Binned shapes: plain ROOT histograms
    • Arbitrary shapes (RooAbsPdf)
    • RooDataHist and RooDataSet (or just TTrees)

Simple datacard example

    # one channel, we observe 0 events
    bin          1
    observation  0
    ------
    # expected events for signal and backgrounds
    bin      1      1
    process  ggh4G  Bckg
    process  0      1
    rate     4.76   1.47
    ------
    deltaS  lnN  1.20  -     20% unc. on signal
    deltaB  lnN  -     1.50  50% unc. on bkg.

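As a rough illustration of what the lnN rows above mean: a log-normal uncertainty with value kappa is commonly modelled by multiplying the expected rate by kappa raised to the power theta, where theta is a nuisance parameter with a unit Gaussian constraint. The sketch below assumes that convention. The function name `scaled_rate` is hypothetical and is not taken from the package.

```python
def scaled_rate(rate, kappa, theta):
    """Expected yield after shifting a log-normal nuisance by theta sigmas.

    Assumed convention: theta = 0 gives the nominal rate,
    theta = +1 multiplies it by kappa, theta = -1 divides it by kappa.
    """
    return rate * kappa ** theta


# Numbers from the simple datacard: signal 4.76 with kappa 1.20,
# background 1.47 with kappa 1.50.
signal_up = scaled_rate(4.76, 1.20, 1.0)         # signal shifted by +1 sigma
background_down = scaled_rate(1.47, 1.50, -1.0)  # background shifted by -1 sigma
```

With this convention the yield can never become negative, which is one reason log-normals are a common choice for multiplicative uncertainties.
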
Complex counting experiment

    bin          e_tau  mu_tau  e_mu
    observation  517    540     101
    ---------------------------
    bin      e_tau  mu_tau  e_mu
    process  higgs ZTT QCD higgs ZTT other
    process  0 1 2
    rate     0.34 190 327   0.57 329 259   0.15 88 14
    ---------------------------
    lumi   lnN  1.11
    tauid  lnN  1.23
    ZtoLL  lnN  1.04
    effic  lnN  1.04 1.04
    QCDel  lnN  1.20
    QCDmu  lnN  1.10
    other  lnN  1.1

• Multiple channels
• Treatment of individual backgrounds
• Correlated effects of systematics
• Systematics have names, to allow combination of datacards

From Counting to Shapes
• For counting experiments, the likelihood in each channel is built from the sum over all the contributing physics processes:
    N(exp) = sum( N(exp, proc), … )
    pdf = Poisson( N(obs) | N(exp) )
    full pdf = PROD( pdf(channel 1), pdf(channel 2), … )
• The extension to shapes is trivial:
    pdf = RooAddPdf( N(exp, proc) * shape(proc), … )
    full pdf = RooSimultaneous( pdf1, pdf2, … )

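The counting-experiment formulas above can be sketched in plain Python, without RooFit. This is only an illustration of the product-of-Poissons structure, not the package's implementation. It works with the logarithm of the likelihood to avoid overflow for large counts, and the names `log_poisson` and `log_likelihood` are hypothetical.

```python
import math


def log_poisson(n_obs, n_exp):
    """Log of Poisson(N(obs) | N(exp)); requires n_exp > 0."""
    return n_obs * math.log(n_exp) - n_exp - math.lgamma(n_obs + 1)


def log_likelihood(observed, expected_by_process):
    """Sum over channels of log Poisson(N(obs) | sum over processes of N(exp, proc)).

    Summing the logs is equivalent to taking the product of the per-channel pdfs.
    """
    total = 0.0
    for channel, n_obs in observed.items():
        n_exp = sum(expected_by_process[channel].values())
        total += log_poisson(n_obs, n_exp)
    return total


# One channel with 0 observed events, signal 4.76 and background 1.47 expected.
observed = {"1": 0}
expected = {"1": {"ggh4G": 4.76, "Bckg": 1.47}}
value = log_likelihood(observed, expected)
```

For zero observed events the Poisson probability reduces to exp(-N(exp)), so the log-likelihood here is simply minus the total expected yield.
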
Shape Uncertainties (1)
Vertical morphing:
• Uses linear or quadratic interpolation between 3 templates.
• Gaussian constraint on the interpolation parameter.
• Works on any pdf: histogram, keys, parametric.
• Gives wrong results if the deformation is large (e.g. for a Gaussian, if the uncertainty on the mean is comparable with the peak width).

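A minimal sketch of the linear variant of vertical morphing, assuming binned templates stored as plain lists and an interpolation parameter theta where 0 is nominal and ±1 are the shifted templates. It is not the package's VerticalInterpPdf; the function name and the clipping of negative bins are assumptions made for illustration.

```python
def vertical_morph(nominal, down, up, theta):
    """Piecewise-linear, bin-by-bin interpolation between three templates.

    theta = 0 returns the nominal template, theta = +1 the 'up' template,
    theta = -1 the 'down' template. Negative bin contents, which can occur
    when extrapolating beyond |theta| = 1, are clipped to zero (an assumption).
    """
    target = up if theta >= 0 else down
    weight = abs(theta)
    morphed = [n + weight * (t - n) for n, t in zip(nominal, target)]
    return [max(content, 0.0) for content in morphed]


nominal = [1.0, 2.0, 1.0]
down = [2.0, 2.0, 0.0]
up = [0.0, 2.0, 2.0]
half_down = vertical_morph(nominal, down, up, -0.5)
```

Because each bin is interpolated independently, a shift of a narrow peak is rendered as a mixture of two peaks rather than a moved peak, which is the failure mode noted in the last bullet.
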
Shape Uncertainties (2)
Simple parametric uncertainties:
• A shape parameter is used as a nuisance, with a constraint (Gaussian or bifurcated Gaussian).
• Relies on the user to select a physics-oriented parametrization in which the nuisances are uncorrelated.

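For readers unfamiliar with the bifurcated Gaussian, the sketch below shows the penalty such a constraint adds to the negative log-likelihood: a Gaussian with different widths below and above the central value. Normalisation constants are dropped, and the function name is hypothetical.

```python
def bifurcated_gaussian_penalty(value, mean, sigma_low, sigma_high):
    """Negative log of a bifurcated Gaussian constraint, up to a constant.

    sigma_low is used when value is below the mean, sigma_high otherwise.
    Setting sigma_low == sigma_high gives an ordinary Gaussian constraint.
    """
    sigma = sigma_low if value < mean else sigma_high
    pull = (value - mean) / sigma
    return 0.5 * pull * pull


# Asymmetric uncertainty: -1 / +2 around a central value of 0.
below = bifurcated_gaussian_penalty(-2.0, 0.0, 1.0, 2.0)
above = bifurcated_gaussian_penalty(2.0, 0.0, 1.0, 2.0)
```

The same displacement costs more on the side with the smaller width, which is how an asymmetric uncertainty on a shape parameter enters the fit.
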
Shape Uncertainties (3)
Parametric interpolation:
• Templates are created, e.g. from different MCs.
• The same parametrization is used to fit all templates, saving two additional RooArgSets of parameters for each systematic.
• Shapes are obtained by linearly interpolating the parameters between the RooArgSets.
(still under development, not yet available)

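The idea of interpolating parameters rather than bin contents can be sketched as follows, using dictionaries in place of RooArgSets. Since the slide describes this feature as unfinished, the code is purely illustrative: the function name, the theta convention (0 nominal, ±1 shifted) and the example parameter values are all assumptions.

```python
def interpolate_parameters(nominal, down, up, theta):
    """Linearly interpolate each fitted parameter between three parameter sets.

    nominal, down and up map parameter names to fitted values; all three
    must contain the same names.
    """
    target = up if theta >= 0 else down
    weight = abs(theta)
    return {
        name: value + weight * (target[name] - value)
        for name, value in nominal.items()
    }


# Hypothetical Gaussian peak fitted to three templates.
nominal = {"mean": 120.0, "sigma": 2.0}
down = {"mean": 118.0, "sigma": 1.5}
up = {"mean": 122.0, "sigma": 2.5}
halfway_up = interpolate_parameters(nominal, down, up, 0.5)
```

Here a shift in the mean moves the peak itself instead of blending two displaced peaks, so large deformations are handled more gracefully than in bin-by-bin morphing.
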
Combined Datasets
• If all channels are counting experiments, make a RooDataSet with one column per channel.
• If all channels have all inputs as TH1s (data and templates), convert them into RooDataHists, all on the same dummy variable. Histograms for channels that have fewer bins get padded with empty bins.
• If at least one channel has a RooDataSet dataset, try to make a combined RooDataSet.
• Otherwise, try to make a combined RooDataHist.

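The padding step in the second bullet can be pictured with plain lists standing in for TH1 bin contents. This is a sketch of the idea only; the function name is hypothetical and the real conversion to RooDataHist is not shown.

```python
def pad_to_common_binning(histograms):
    """Pad every channel's bin contents with empty bins up to the longest one.

    histograms maps channel names to lists of bin contents; after padding,
    all channels can share one dummy variable with the same number of bins.
    """
    n_bins = max(len(bins) for bins in histograms.values())
    return {
        name: list(bins) + [0.0] * (n_bins - len(bins))
        for name, bins in histograms.items()
    }


channels = {"e_tau": [5.0, 7.0, 3.0, 1.0], "e_mu": [2.0, 4.0]}
padded = pad_to_common_binning(channels)
```

The added bins hold zero data and zero expectation, so they should not affect the likelihood of the shorter channel.
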
Statistical tools proper
What we do on top of RooStats:
• Configure RooStats components through command-line arguments and sensible defaults.
• Extend the functionality of some tools, or provide workarounds for bugs whose fix is not yet in CMSSW’s build of RooStats.
• Provide other common infrastructure, e.g.
  – Running toys to get expected limits
  – Saving results to ROOT files

Zoo of statistical methods
• Asymptotic limits and significances from profiled likelihood
• Bayesian limits
• Various flavours of hybrid frequentist-Bayesian limits (“CLs”, ...) and significances
• Feldman-Cousins bands
• Computing both observed limits and expected limits with 68%/95% bands by running toys

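One common way to turn toy results into an expected limit with bands is to take quantiles of the distribution of limits obtained from the toys. The sketch below assumes that approach, with the median as the expected limit and the central 68% and 95% intervals as the bands. It is not taken from the package, and the function names are hypothetical.

```python
def quantile(sorted_values, q):
    """Quantile of an already sorted list, with linear interpolation."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def expected_bands(toy_limits):
    """Median expected limit and central 68% / 95% bands from toy limits."""
    ordered = sorted(toy_limits)
    return {
        "median": quantile(ordered, 0.5),
        "band68": (quantile(ordered, 0.16), quantile(ordered, 0.84)),
        "band95": (quantile(ordered, 0.025), quantile(ordered, 0.975)),
    }


# Stand-in for limits computed on 101 background-only toys.
bands = expected_bands(list(range(100, -1, -1)))
```

In practice the number of toys determines how well the edges of the 95% band are known, since only a few toys fall in each tail.
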
Other tools
Other pieces:
• Tools to combine datacards
• Tools to run the combination across the Grid using CRAB (the CMS grid wrapper)
• Plotting tools: basic functionality exists, but something better is in the works

What’s Higgs-specific?
• Datacard format: generic “excess of signal on top of some background” problem.
• The statistical tools are even more generic than this.
• Main limitations:
  – Most things work only when setting a limit on a single parameter of interest
  – The parameter should behave like a cross section (positive definite; zero = no signal)
• The limitations are driven by the lack of need for something even more general; they can be overcome if needed.

CMS dependencies
• text2workspace:
  – AsymPow for asymmetric log-normals
  – VerticalInterpPdf for vertical morphing
• Combination tool:
  – Hypothesis test inversion with the new HybridCalculator
  – Optimized TestStatistics classes
  – Some helper functions, e.g. to factorize pdfs

External dependencies
• text2workspace:
  – Python libraries (re, optparse) + PyROOT
• Combination tool:
  – C++ standard library (io, stl, auto_ptr, exceptions)
  – boost::program_options, used to parse command-line arguments and configure methods
  – boost::filesystem, to do an “rm -r” of the temporary directory
  – Some system-level stuff (e.g. fork, tempnam)
