Bias-Free Estimation in Multicomponent Maximum Likelihood Fits with Component-Dependent Templates
Pierluigi Catastini (INFN Pisa and Siena University)
Giovanni Punzi (SNS and INFN Pisa)
PHYSTAT 05

Problem
• Suppose we have a sample of particles generated by a certain physics process and observed in our experiment.
• Suppose we know that the sample is a mixture of different particle types, for example Pions, Protons and Kaons, but the proportion of each particle type is completely unknown.
• Of course, our experiment is also equipped with some kind of Particle IDentification (PID) device, providing the measurement of some quantity related to the particle type.
• We want to measure the fractions of each particle type: $f_\pi$, $f_P$, $f_K$.

A “Real Life” Problem* …
• Measuring particle type fractions is common in particle physics, e.g. understanding which particles are produced during the fragmentation of B mesons (flavor tagging), or separating different particle decays...
• PID information is usually provided by the energy loss of charged particles in gas (dE/dx), a Time of Flight measurement, Cherenkov light…
• The solution is obtained by performing an unbinned Maximum Likelihood Fit.
But remember… The mean of the PID observable strongly depends on particle momentum (which is an additional observable, known event-by-event): Component Dependent Templates!
[Figure: curves labelled Electrons, Muons, Protons, Kaons, Pions]
* At least for a Physicist…

Please write the Likelihood!
• Unfortunately, the Likelihood is not simply:
$L(f) = \prod_i \sum_j f_j \, P(\mathrm{pid}_i \mid \mathrm{type}_j, \mathrm{Mom}_i)$ (WRONG!)
• Using the above, you may get strongly biased results if the additional observables have different distributions for the different components [1].
• The reason for the failure is, quoting from [1]:
✓ “Whenever the templates used in a multi-component fit depend on additional observables, one should always use the correct, complete Likelihood expression, including the explicit distributions of all observables for all classes of events”
• In our problem, this means that we need to include the momentum distributions of each particle type (they are almost always different in practice).
[1] physics/0401045 (G. Punzi, PHYSTAT 03)

Writing the Likelihood…
• The Particle IDentification information is represented by a certain observable called pid; we then write the likelihood as:
$$L(f_\pi, f_P, f_K) = \prod_i \Big( f_\pi \, P(\mathrm{pid}_i, \mathrm{Mom}_i \mid \pi) + f_P \, P(\mathrm{pid}_i, \mathrm{Mom}_i \mid P) + (1 - f_\pi - f_P) \, P(\mathrm{pid}_i, \mathrm{Mom}_i \mid K) \Big) = \prod_i \Big( \sum_j f_j \, P(\mathrm{pid}_i \mid \mathrm{Mom}_i, \mathrm{type}_j) \, P(\mathrm{Mom}_i \mid \mathrm{type}_j) \Big)$$
Given:
✓ $f_\pi + f_P + f_K = 1$
✓ $j$ = Pion, Proton, Kaon
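
The complete likelihood can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the PID response `pid_expected` (taken here as 1/β), its Gaussian resolution of 0.1, and all function names are hypothetical and do not come from the slides. The momentum densities are passed in as callables, one per particle type.

```python
import math

# Particle masses in GeV/c^2 (rounded).
MASS = {"pion": 0.1396, "proton": 0.9383, "kaon": 0.4937}

def gauss(x, mu, sigma):
    """Normal density N(mu, sigma) evaluated at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def pid_expected(mom, mass):
    """ASSUMPTION: hypothetical PID response, here 1/beta of the particle."""
    return math.sqrt(1.0 + (mass / mom) ** 2)

def pid_pdf(pid, mom, ptype, resolution=0.1):
    """P(pid | Mom, type): Gaussian resolution (assumed width) around the expected value."""
    return gauss(pid, pid_expected(mom, MASS[ptype]), resolution)

def neg_log_likelihood(fractions, events, mom_pdf):
    """-log L for events = [(pid, mom), ...].

    fractions: dict type -> f_j (should sum to 1)
    mom_pdf:   dict type -> callable giving P(Mom | type)
    """
    nll = 0.0
    for pid, mom in events:
        # sum_j f_j * P(pid | Mom, type_j) * P(Mom | type_j)
        density = sum(f * pid_pdf(pid, mom, t) * mom_pdf[t](mom)
                      for t, f in fractions.items())
        nll -= math.log(density)
    return nll
```

Passing the same constant function for every type in `mom_pdf` reduces this to the incomplete expression that ignores the momentum distributions.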

A toy study of the “Real Life” Problem
• We generate a sample with known particle type composition as follows:
✓ The PID variable is distributed according to a typical resolution function (i.e. the template used in the fit), defined in terms of $\mathrm{PID}_{mes} - \mathrm{PID}_{exp}(\mathrm{mom})$: $P(\mathrm{pid}_i \mid \mathrm{Mom}_i, \mathrm{type}_j)$
✓ Momenta are distributed according to a Gaussian $N(\mu, \sigma)$: $P(\mathrm{Mom}_i \mid \mathrm{type}_j)$
$\mu_\pi = 1.00$ GeV/c, $\mu_P = 1.25$ GeV/c, $\mu_K = 1.50$ GeV/c
$\sigma_\pi = 0.25$ GeV/c, $\sigma_K = 0.25$ GeV/c
$f_\pi = 0.50$, $f_P = 0.15$, $f_K = 0.35$
[Figure: momentum distributions; x-axis: Momentum (GeV/c)]
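
A toy sample of this kind could be generated as in the sketch below. The fractions and momentum means are the ones quoted on the slide. Everything else is an assumption made for illustration: the PID response (1/β with a Gaussian resolution of 0.1), a width of 0.25 GeV/c for all three types including protons, and the function name `generate_toy`.

```python
import math
import random

TRUE_FRACTIONS = {"pion": 0.50, "proton": 0.15, "kaon": 0.35}
MOM_MEAN = {"pion": 1.00, "proton": 1.25, "kaon": 1.50}     # GeV/c
MOM_SIGMA = 0.25                                            # GeV/c, ASSUMED equal for all types
MASS = {"pion": 0.1396, "proton": 0.9383, "kaon": 0.4937}   # GeV/c^2
PID_RESOLUTION = 0.1                                        # ASSUMED, hypothetical value

def generate_toy(n_events, seed=1):
    """Return a list of (pid, mom, true_type) tuples."""
    rng = random.Random(seed)
    types = list(TRUE_FRACTIONS)
    weights = [TRUE_FRACTIONS[t] for t in types]
    events = []
    while len(events) < n_events:
        ptype = rng.choices(types, weights)[0]
        mom = rng.gauss(MOM_MEAN[ptype], MOM_SIGMA)
        if mom <= 0.0:
            continue  # discard the very rare unphysical momentum
        # ASSUMPTION: PID observable is 1/beta smeared by a Gaussian resolution
        pid_exp = math.sqrt(1.0 + (MASS[ptype] / mom) ** 2)
        events.append((rng.gauss(pid_exp, PID_RESOLUTION), mom, ptype))
    return events
```

The true type is kept in each tuple only so that the generated composition can be checked; a fit would use the first two entries.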

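Result of the Fits
[Figure: fit results with the complete Likelihood. Panels: Pions (OK!), Protons (OK!)]
If we did not take the momentum distributions into account…
[Figure: fit results without the momentum distributions. Panels: Pions, Protons (Bias!)]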
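
The comparison between the two fits can be tried out with the self-contained sketch below. It is not the authors' code: the PID response (1/β, Gaussian resolution 0.1) and the common momentum width are assumptions, and the fractions are estimated with a simple EM iteration for mixture weights, a technique swapped in here to avoid external minimizers. With the component densities held fixed, this iteration increases the same likelihood at each step.

```python
import math
import random

TYPES = ("pion", "proton", "kaon")
TRUE_F = {"pion": 0.50, "proton": 0.15, "kaon": 0.35}
MOM_MEAN = {"pion": 1.00, "proton": 1.25, "kaon": 1.50}     # GeV/c
MOM_SIGMA = 0.25                                            # GeV/c, ASSUMED equal for all types
MASS = {"pion": 0.1396, "proton": 0.9383, "kaon": 0.4937}   # GeV/c^2
PID_RES = 0.1                                               # ASSUMED resolution

def gauss(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def pid_exp(mom, ptype):
    """ASSUMPTION: hypothetical PID response, 1/beta."""
    return math.sqrt(1.0 + (MASS[ptype] / mom) ** 2)

def generate(n, seed=7):
    rng = random.Random(seed)
    events = []
    while len(events) < n:
        t = rng.choices(TYPES, [TRUE_F[k] for k in TYPES])[0]
        mom = rng.gauss(MOM_MEAN[t], MOM_SIGMA)
        if mom <= 0.0:
            continue  # discard the very rare unphysical momentum
        events.append((rng.gauss(pid_exp(mom, t), PID_RES), mom))
    return events

def fit_fractions(densities, n_iter=200):
    """EM iteration for the weights; densities[i][j] = density of event i under type j."""
    n_types = len(densities[0])
    f = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        sums = [0.0] * n_types
        for row in densities:
            weighted = [fj * d for fj, d in zip(f, row)]
            total = sum(weighted)
            for j, w in enumerate(weighted):
                sums[j] += w / total      # posterior probability of type j
        f = [s / len(densities) for s in sums]
    return f

events = generate(1500)
# Incomplete likelihood: PID template only.
naive = [[gauss(pid, pid_exp(mom, t), PID_RES) for t in TYPES] for pid, mom in events]
# Complete likelihood: PID template times the momentum density of each type.
complete = [[d * gauss(mom, MOM_MEAN[t], MOM_SIGMA) for d, t in zip(row, TYPES)]
            for row, (_, mom) in zip(naive, events)]
f_naive = fit_fractions(naive)
f_complete = fit_fractions(complete)
for name, fit in (("naive", f_naive), ("complete", f_complete)):
    print(name, {t: round(v, 3) for t, v in zip(TYPES, fit)})
```

How large the shift of the naive fit turns out to be depends on the assumed PID response and resolution, so the printed numbers should not be read as a reproduction of the plots on the slide.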

Often in “Real Life”…
• Writing the complete likelihood with the distributions of all observables is almost straightforward, provided, of course, that you can easily obtain a parameterization of those distributions…
• Often we have poor information about those distributions (barely acceptable, after very hard work!); sometimes they may even be completely unknown.
• If, for example, the goal of the particle type fit performed in the previous slides is to estimate the fractions of particles produced during heavy quark fragmentation… Great! We have no idea about the functional form of each particle type’s momentum distribution.
How can we write the correct Likelihood?

A solution
• No functional form is known to parameterize the missing $P(\mathrm{Mom} \mid \mathrm{type})$.
• Use a general functional form: a Series Expansion
• $P(\mathrm{Mom} \mid \mathrm{type}_j) = \sum_m a_{mj} F_m(\mathrm{Mom})$, with $a_{mj}$ free parameters of the fit
• We decide to use Orthogonal Polynomials, among them:
✓ Legendre Polynomials $P_i$, on $[-1, 1]$
✓ First type Chebyshev Polynomials $T_i$, on $[-1, 1]$
✓ Second type Chebyshev Polynomials $U_i$, on $[-1, 1]$
✓ Laguerre Polynomials $L_i$, on $[0, +\infty)$
✓ Hermite Polynomials $H_i$, on $(-\infty, +\infty)$
Used from 0th to 6th term.
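
As an illustration of the series expansion, the sketch below builds a momentum density from Legendre polynomials. The parameterization is an assumption, not taken from the slides: momentum is mapped linearly from a chosen range `[lo, hi]` onto $[-1, 1]$, and the zeroth coefficient is fixed by normalization, since only $P_0$ has a non-zero integral over $[-1, 1]$. Seven terms ($P_0$ to $P_6$) then leave six free coefficients per particle type.

```python
def legendre_values(x, n_terms):
    """Return [P_0(x), ..., P_{n_terms-1}(x)] using Bonnet's recurrence."""
    values = [1.0, x][:n_terms]
    for n in range(1, n_terms - 1):
        # (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x)
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values

def series_pdf(mom, free_coeffs, lo, hi):
    """Density on [lo, hi]: (1 + sum_{m>=1} a_m P_m(x)) / (hi - lo).

    free_coeffs holds a_1, a_2, ...; the P_0 term is fixed so that the
    density integrates to 1 over [lo, hi] (ASSUMED parameterization).
    """
    x = 2.0 * (mom - lo) / (hi - lo) - 1.0          # map [lo, hi] -> [-1, 1]
    basis = legendre_values(x, len(free_coeffs) + 1)
    shape = 1.0 + sum(a * p for a, p in zip(free_coeffs, basis[1:]))
    return shape / (hi - lo)
```

A truncated polynomial series is not guaranteed to stay positive, so a real fit would need to guard against negative values of `series_pdf` for some coefficient choices.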

Our toy
• Replacing the exact distribution $N(\mu, \sigma)$ with $\sum_m a_{mj} F_m(\mathrm{Mom})$ for each particle type, we fitted our toy sample again:
[Figure: fit results. Panels: Pions (OK!), Protons (OK!)]
• The Bias is again corrected!

Some Comments
• Of course, we are happy: although we didn’t know $P(\mathrm{Mom} \mid \mathrm{type})$ a priori, we have been able to avoid the bias.
• Please notice that the resolution on the parameters is not degraded much!
• Just 7 terms of the series expansion were used! Not so many.
Projections of $P(\mathrm{Mom} \mid \mathrm{type}_j) = \sum_m a_{mj} F_m(\mathrm{Mom})$:
[Figure: panels Pions, Protons, Kaons; x-axis: Momentum (GeV/c)]

Another Complication
• Suppose our PID information is obtained from the measurement of the Time Of Flight (TOF).
• The expression of the expected TOF is a function of 2 observables: $\mathrm{TOF}_{exp} = \frac{\mathrm{arclength}}{c} \sqrt{1 + m_j^2 / \mathrm{Mom}^2}$
• It means that our pdf is (after having verified that the correlation between arclength and momentum is almost zero):
$P(\mathrm{Mom}, \mathrm{Arc}, \mathrm{pid} \mid \mathrm{type}_j) = P(\mathrm{pid} \mid \mathrm{Mom}, \mathrm{Arc}, \mathrm{type}_j) \cdot P(\mathrm{Mom} \mid \mathrm{type}_j) \cdot P(\mathrm{Arc} \mid \mathrm{type}_j)$
where $P(\mathrm{Mom} \mid \mathrm{type}_j)$ and $P(\mathrm{Arc} \mid \mathrm{type}_j)$ are both unknown!
• We want to apply the same series expansion technique to both momentum and arclength!
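
The expected time of flight is simple to evaluate numerically. The sketch below assumes arclength in cm, momentum in GeV/c, mass in GeV/c² and time in ns; the units and the function name are choices made for this example, not taken from the slides.

```python
import math

C_LIGHT = 29.9792458  # speed of light in cm/ns

def tof_expected(arclength_cm, mom, mass):
    """Expected time of flight in ns: (arclength / c) * sqrt(1 + m^2 / Mom^2)."""
    return arclength_cm / C_LIGHT * math.sqrt(1.0 + (mass / mom) ** 2)

# At the same momentum and arclength, a heavier particle arrives later.
t_pion = tof_expected(150.0, 1.0, 0.1396)
t_kaon = tof_expected(150.0, 1.0, 0.4937)
```

The difference `t_kaon - t_pion` shrinks as the momentum grows, which is why the template depends on both observables event by event.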

Back to our toy
• Fractions, pid and momentum variables are generated as before
• Arclength is distributed according to a Gaussian $N(\mu, \sigma)$
✓ $\pi = K = P$: same distribution for all particle types, but in principle you don’t know that!
$$L(f_j, a_{mj}, b_{lj}) = \prod_i \Big( \sum_j f_j \, P(\mathrm{pid}_i \mid \mathrm{Mom}_i, \mathrm{Arc}_i, \mathrm{type}_j) \, \sum_m a_{mj} F_m(\mathrm{Mom}_i) \, \sum_l b_{lj} F_l(\mathrm{Arc}_i) \Big)$$
• Again we used 7 terms for the momentum series expansion
• We used 3 terms for the arclength series expansion
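
The per-event term of this likelihood, with one series per unknown distribution, might be coded as follows. This is a sketch under assumptions: Legendre polynomials are used for both expansions, each series has its zeroth coefficient fixed by normalization on a chosen range, and `pid_pdf` is a user-supplied callable. None of these names come from the slides.

```python
def legendre_values(x, n_terms):
    """Return [P_0(x), ..., P_{n_terms-1}(x)] using Bonnet's recurrence."""
    values = [1.0, x][:n_terms]
    for n in range(1, n_terms - 1):
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values

def series_pdf(value, free_coeffs, lo, hi):
    """Normalized Legendre series on [lo, hi]; free_coeffs are the terms above P_0."""
    x = 2.0 * (value - lo) / (hi - lo) - 1.0
    basis = legendre_values(x, len(free_coeffs) + 1)
    return (1.0 + sum(c * p for c, p in zip(free_coeffs, basis[1:]))) / (hi - lo)

def event_density(pid, mom, arc, fractions, pid_pdf,
                  mom_coeffs, arc_coeffs, mom_range, arc_range):
    """sum_j f_j * P(pid | Mom, Arc, type_j) * series(Mom; a_j) * series(Arc; b_j)."""
    total = 0.0
    for ptype, f in fractions.items():
        total += (f * pid_pdf(pid, mom, arc, ptype)
                  * series_pdf(mom, mom_coeffs[ptype], *mom_range)
                  * series_pdf(arc, arc_coeffs[ptype], *arc_range))
    return total
```

Under this parameterization, 7 momentum terms and 3 arclength terms give 6 + 2 free coefficients per particle type, so three types plus two independent fractions would amount to 26 fit parameters.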

Results
[Figure: fit results. Panels: Pions (OK!), Protons (OK!)]
Projections of $P(\mathrm{Arc} \mid \mathrm{type}_j) = \sum_l b_{lj} F_l(\mathrm{Arc})$:
[Figure: panels Pions, Protons, Kaons]

Conclusions
• We faced a common problem in particle physics where an incomplete Likelihood expression causes a detectable bias. We have cured it!
• The proposed problem has the additional complication of missing information about the distribution of an observable!
• We solved the problem, removing the bias in the fit results, by including a series expansion as a parameterization of the unknown distributions (with coefficients as free parameters determined by the fit).
• We even faced the case where two observables have unknown distributions. Again, we used two different series expansions to parameterize those distributions and avoid the bias.