Probability and Statistics
Many of the processes involved with the detection of particles are statistical in nature:
   the number of ion pairs created when a proton goes through 1 cm of gas
   the energy lost by an electron going through 1 mm of lead
The understanding and interpretation of all experimental data depend on statistical and probabilistic concepts:
“The result of the experiment was inconclusive so we had to use statistics”
   How do we extract the best value of a quantity from a set of measurements?
   How do we decide if our experiment is consistent/inconsistent with a given theory?
   How do we decide if our experiment is internally consistent?
   How do we decide if our experiment is consistent with other experiments?
Definition of probability: Let's define probability by example. Suppose we have N trials and a specified event occurs r times. For example, the trial could be rolling a die and the event could be rolling a 6. We define the probability (P) of an event (E) occurring as: P(E) = r/N as N → ∞
Examples:
   coin toss: P(heads) = 0.5
   six-sided die: P(6) = 1/6 (P(1) = P(2) = P(3) = P(4) = P(5) = P(6) for an “honest” die)
Remember: P(heads) should approach 0.5 the more times you toss the coin. Obviously, for a single coin toss we can never get P(heads) = 0.5!
880.A20 Winter 2002 Richard Kass

Probability and Statistics
By definition probability is a non-negative real number bounded by 0 ≤ P ≤ 1.
   If P = 0 then the event never occurs.
   If P = 1 then the event always occurs.
Notation: ∩ ≡ intersection, ∪ ≡ union
Events are independent if: P(A∩B) = P(A)P(B)
Events are mutually exclusive (disjoint) if: P(A∩B) = 0 or P(A∪B) = P(A) + P(B)
The sum (or integral) of all probabilities, if they are mutually exclusive, must equal 1.
Probability can be a discrete or a continuous variable. In the discrete case only certain values of P are allowed.
   Example of the discrete case: tossing a six-sided die. P(x_i) = P_i, where x_i = 1, 2, 3, 4, 5, 6 and P_i = 1/6 for all x_i.
   Another example is tossing a coin. There are only 2 choices, heads or tails.
For both of the above discrete examples (and in general), when we sum over all mutually exclusive possibilities: Σ_i P(x_i) = 1

Probability and Statistics
Continuous Probability: In this case P can be any number between 0 and 1. We can define a “probability density function”, pdf, f(x), with x a continuous variable.
The probability for x to be in the range a ≤ x ≤ b is: P(a ≤ x ≤ b) = ∫_a^b f(x) dx
Just like the discrete case, the sum of all probabilities must equal 1. For the continuous case this means: ∫_{−∞}^{+∞} f(x) dx = 1
We say that f(x) is normalized to one.
NOTE: The probability for x to be exactly some number is zero since: P(x = a) = ∫_a^a f(x) dx = 0
Aside: Probability theory is an interesting branch of mathematics. Calculus of probabilities ≡ set theory.

Probability and Statistics
Examples of some common P(x)'s and f(x)'s:
   Discrete = P(x): binomial, Poisson
   Continuous = f(x): uniform (i.e. constant), Gaussian, exponential, chi-square
How do we describe a probability distribution? Mean, mode, median, and variance.
For a continuous distribution these quantities are defined by:
   mean: μ = ∫ x f(x) dx
   mode: the value of x at which f(x) is a maximum
   median: the value a for which ∫_{−∞}^{a} f(x) dx = 1/2
   variance: σ² = ∫ (x − μ)² f(x) dx
For a discrete distribution the mean and variance are defined by:
   μ = Σ_i x_i P(x_i), σ² = Σ_i (x_i − μ)² P(x_i)

Probability and Statistics
Some continuous pdfs.
[Figure: chi-square distribution]
[Figure: Student t distribution; ν = 1 ⇒ Cauchy (Breit-Wigner), ν = ∞ ⇒ Gaussian]

Probability and Statistics
We use results from probability and statistics as a way of indicating how “good” a measurement is. The most common quality indicator is relative precision.
Relative precision = [uncertainty of measurement]/measurement
The uncertainty in a measurement is usually the square root of the variance: σ = standard deviation
Example: we measure a table to be 10 inches with an uncertainty of 1 inch. The relative precision is 1/10 = 0.1 or 10% (% relative precision).
σ is usually calculated using the technique of “propagation of errors”. However, this σ is not what most people think it is! We will discuss this in more detail soon.

Probability and Statistics
Some comments on accuracy and precision:
Accuracy: The accuracy of an experiment refers to how close the experimental measurement is to the true value of the quantity being measured.
Precision: This refers to how well the experimental result has been determined, without regard to the true value of the quantity being measured.
Note: Just because an experiment is precise it does not mean it is accurate!!
The above figure shows various measurements of the neutron lifetime over the years. Note the big jump downward in the 1960's. Are any of these measurements accurate?

Binomial Probability Distributions
For the binomial distribution P is the probability of m successes out of N trials. Here p is the probability of a success and q = 1 − p is the probability of a failure:
P(m, N, p) = [N!/(m!(N − m)!)] p^m q^(N−m)
Tossing a coin N times and asking for m heads is a binomial process.

Binomial Probability Distribution
What's the variance of a binomial distribution? Using a trick similar to the one used for the average we find: σ² = Npq
Example: for an efficiency ε = m/N, this variance gives σ_ε = [ε(1 − ε)/N]^(1/2).
Note: σ_ε, the “error in the efficiency”, → 0 as ε → 0 or ε → 1.

Binomial Probability Distributions

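Poisson Probability Distribution
The Poisson distribution gives the probability of observing m events when the average number of events is μ: P(m, μ) = e^(−μ) μ^m / m!
The number of counts in a time interval is a Poisson process.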

Poisson Probability Distribution
[Figure: comparison of binomial and Poisson distributions with mean μ = 1.] Not much difference between them here!

Gaussian Probability Distribution
[Figure: plot of a Gaussian pdf, P(x) vs x]
It is very unlikely (< 0.3%) that a measurement taken at random from a Gaussian pdf will be more than ±3σ from the true mean of the distribution.

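Central Limit Theorem
Why is the Gaussian pdf so important? The Central Limit Theorem (CLT) says that the sum of a large number of independent random variables Y_i, each with finite mean and variance, is distributed approximately as a Gaussian, whatever pdf the Y_i come from.
Actually, the Y's can be from different pdfs!
For the CLT to be valid:
   μ and σ of each pdf must be finite
   no one term in the sum should dominate the sum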

Central Limit Theorem
Best illustration of the CLT:
a) Take 12 numbers (r_i) from your computer's random number generator
b) Add them together
c) Subtract 6
d) Get a number that is from a Gaussian pdf!
The computer's random number generator gives numbers distributed uniformly in the interval [0, 1]. A uniform pdf in the interval [0, 1] has μ = 1/2 and σ² = 1/12. Thus the sum of 12 uniform random numbers minus 6 is distributed as if it came from a Gaussian pdf with μ = 0 and σ = 1.
[Figure panels: A) 5000 random numbers; B) 5000 pairs (r₁ + r₂) of random numbers; C) 5000 triplets (r₁ + r₂ + r₃) of random numbers; D) 5000 12-plets (r₁ + ... + r₁₂) of random numbers; E) 5000 12-plets (r₁ + ... + r₁₂ − 6) of random numbers, compared with a Gaussian with μ = 0 and σ = 1, plotted from −6 to +6]
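The 12-number recipe is easy to try numerically. The following is a minimal sketch, assuming Python's standard `random` module as the uniform generator; the seed and the function name `clt_sample` are arbitrary illustrative choices, not part of the original slides.

```python
import random
import statistics

def clt_sample(rng, n_terms=12):
    """Sum n_terms uniform [0, 1) numbers and subtract n_terms / 2."""
    return sum(rng.random() for _ in range(n_terms)) - n_terms / 2

rng = random.Random(12345)  # fixed seed (arbitrary) so the run is reproducible
values = [clt_sample(rng) for _ in range(5000)]  # 5000 12-plets, as in panel E

mean = statistics.mean(values)
sigma = statistics.pstdev(values)
# For a unit Gaussian about 68% of the values should lie within +-1
frac_within_1sigma = sum(abs(v) <= 1.0 for v in values) / len(values)

print(f"mean = {mean:.3f}, sigma = {sigma:.3f}")
print(f"fraction within +-1: {frac_within_1sigma:.3f}")
```

The mean and σ should come out close to 0 and 1. Note that the sum can never leave the interval [−6, +6], so the far tails are not Gaussian.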

Propagation of errors
Suppose we measure the branching fraction BR(D⁰ → π⁺π⁻) using the number of produced D⁰ mesons (N_produced), the number of D⁰ → π⁺π⁻ decays found (N_found), and the efficiency for finding a D⁰ → π⁺π⁻ decay (ε):
BR(D⁰ → π⁺π⁻) = N_found/(ε N_produced)
If we know the uncertainties (σ's) of N_produced, N_found, and ε, what is the uncertainty on BR(D⁰ → π⁺π⁻)?

Propagation of errors

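Propagation of errors
Example: Error in BR(D⁰ → π⁺π⁻).
Assume: N_produced = 10⁶ ± 10³, N_found = 10 ± 3, ε = 0.02 ± 0.002
For uncorrelated uncertainties the relative errors add in quadrature:
σ_BR/BR = [(σ_found/N_found)² + (σ_produced/N_produced)² + (σ_ε/ε)²]^(1/2) = [0.3² + 0.001² + 0.1²]^(1/2) ≈ 0.32
so BR = (5.0 ± 1.6) × 10⁻⁴.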
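The arithmetic of this example can be checked with a few lines of code. This is a sketch that assumes the three inputs are uncorrelated, so that their relative errors add in quadrature; the variable names are illustrative only.

```python
import math

# Inputs from the example (value, uncertainty)
n_found, s_found = 10.0, 3.0
n_prod, s_prod = 1.0e6, 1.0e3
eff, s_eff = 0.02, 0.002

br = n_found / (eff * n_prod)

# For a pure product/quotient of uncorrelated quantities,
# the relative errors add in quadrature.
rel_err = math.sqrt((s_found / n_found) ** 2
                    + (s_prod / n_prod) ** 2
                    + (s_eff / eff) ** 2)
sigma_br = br * rel_err

print(f"BR = ({br * 1e4:.1f} +- {sigma_br * 1e4:.1f}) x 10^-4")
```

The 30% uncertainty on N_found dominates; the 0.1% uncertainty on N_produced is negligible.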

Propagation of errors
Example: The error in the average.
The average of several measurements, each with the same uncertainty (σ), is given by: μ = (x₁ + x₂ + ... + x_n)/n
Propagating the errors gives the “error in the mean”: σ_μ = σ/√n
This is a very important result! It says that we can determine the mean better by combining measurements. Unfortunately, the precision only increases as the square root of the number of measurements.
Do not confuse σ_μ with σ! σ is related to the width of the pdf (e.g. Gaussian) that the measurements come from. It does not get smaller as we combine measurements.
A slightly more complicated problem is the case of the weighted average, or unequal σ's: μ = [Σ_i x_i/σ_i²] / [Σ_i 1/σ_i²]
Using the same procedure as above we obtain the “error in the weighted mean”: 1/σ_μ² = Σ_i 1/σ_i²
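A short sketch of the weighted average and its error follows. The function name `weighted_mean` and the measurement values are made up for illustration; the weights 1/σ_i² are the standard inverse-variance weights.

```python
import math

def weighted_mean(values, sigmas):
    """Inverse-variance weighted mean and its uncertainty."""
    weights = [1.0 / s ** 2 for s in sigmas]
    mean = sum(w * x for w, x in zip(weights, values)) / sum(weights)
    sigma_mean = 1.0 / math.sqrt(sum(weights))
    return mean, sigma_mean

# Equal uncertainties: reduces to the plain average with sigma / sqrt(n)
m_equal, s_equal = weighted_mean([9.0, 10.0, 11.0, 10.0], [2.0] * 4)

# Unequal uncertainties: the more precise measurement dominates
m_mixed, s_mixed = weighted_mean([10.0, 12.0], [1.0, 2.0])

print(m_equal, s_equal)
print(m_mixed, s_mixed)
```

With four measurements of σ = 2 the error in the mean is 2/√4 = 1, while the width of the parent distribution is still 2.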

Propagation of errors
Problems with Propagation of Errors: In calculating the variance using propagation of errors we usually assume that we are dealing with Gaussian-like errors for the measured variable (e.g. x). Unfortunately, just because x is described by a Gaussian distribution does not mean that f(x) will be described by a Gaussian distribution.
Example: y = 2x with x = 10 ± 2, so σ_y = 2σ_x = 4. Start with a Gaussian with μ = 10, σ = 2. Get another Gaussian with μ = 20, σ = 4.
[Figure: dN/dy vs y for y = 2x]

Propagation of errors
Example when the new distribution is non-Gaussian: Let y = 2/x. The transformed probability distribution function for y does not have the form of a Gaussian pdf.
y = 2/x with x = 10 ± 2, so σ_y = 2σ_x/x². Start with a Gaussian with μ = 10, σ = 2. DO NOT get another Gaussian! Get a pdf with μ = 0.2, σ = 0.04. This new pdf has longer tails than a Gaussian pdf: Prob(y > μ_y + 5σ_y) = 5 × 10⁻³, while for a Gaussian it is ≈ 3 × 10⁻⁷.
[Figure: dN/dy vs y for y = 2/x]
Unphysical situations can arise if we use the propagation of errors results blindly!
Example: Suppose we measure the volume of a cylinder: V = πR²L. Let R = 1 cm exactly, and L = 1.0 ± 0.5 cm. Using propagation of errors we have: σ_V = πR²σ_L = π/2 cm³, and V = π ± π/2 cm³.
However, if the error on V (σ_V) is to be interpreted in the Gaussian sense, then the above result says that there's a finite probability (≈ 2.3%) that the volume (V) is < 0, since V is only two standard deviations away from 0! Clearly this is unphysical! Care must be taken in interpreting the meaning of σ_V.

Maximum Likelihood Method (MLM)
Does this procedure make sense? The MLM answers this question and provides a method for estimating parameters from existing data.

Maximum Likelihood Method (MLM)
Average!

Maximum Likelihood Method (MLM)
Average! Cramer-Rao bound

Maximum Likelihood Method (MLM)
How do we calculate errors (σ's) using the MLM? Start by looking at the case where we have a Gaussian pdf with mean α and known σ. The likelihood function is:
L = Π_i [1/(σ√(2π))] exp[−(x_i − α)²/(2σ²)]
It is easier to work with lnL:
lnL = −n ln(σ√(2π)) − Σ_i (x_i − α)²/(2σ²)
If we take two derivatives of lnL with respect to α we get:
∂²lnL/∂α² = −n/σ²
For the case of a Gaussian pdf we get the familiar result:
σ_α² = σ²/n = −[∂²lnL/∂α²]⁻¹
The big news here is that the variance of the parameter of interest is related to the 2nd derivative of the likelihood function. Since our example uses a Gaussian pdf the result is exact. More important, the result is asymptotically true for ALL pdfs, since for large samples (n → ∞) all likelihood functions become “Gaussian”.

Maximum Likelihood Method (MLM)
The previous example was for one variable. We can generalize the result to the case where we determine several parameters from the likelihood function (e.g. α₁, α₂, … α_n):
(V⁻¹)_ij = −∂²lnL/∂α_i∂α_j
Here V_ij is a matrix (the “covariance matrix” or “error matrix”) and it is evaluated at the values of (α₁, α₂, … α_n) that maximize the likelihood function.
In practice it is often very difficult or impossible to evaluate the 2nd derivatives. The procedure most often used to determine the variances in the parameters relies on the property that the likelihood function becomes Gaussian (or parabolic) asymptotically. We expand lnL about the ML estimate for the parameters. For the one-parameter case we have:
lnL(α) = lnL(α*) + (∂lnL/∂α)|_α* (α − α*) + (1/2)(∂²lnL/∂α²)|_α* (α − α*)² + …
Since we are evaluating lnL at the value of α (= α*) that maximizes L, the term with the 1st derivative is zero. Using the expression for the variance of α on the previous page and neglecting higher order terms we find:
lnL(α) = lnL(α*) − (α − α*)²/(2σ_α²)
Thus we can determine the ±nσ limits on the parameters by finding the values where lnL decreases by n²/2 from its maximum value. This is what MINUIT does.

Maximum Likelihood Method (MLM)
Example: MLM and determining the slope and intercept of a line.
Assume we have a set of measurements: (x₁, y₁ ± σ₁), (x₂, y₂ ± σ₂), … (x_n, y_n ± σ_n), the points are thought to come from a straight line, y = α + βx, and the measurements come from a Gaussian pdf. The likelihood function is:
L = Π_i [1/(σ_i√(2π))] exp[−(y_i − α − βx_i)²/(2σ_i²)]
We wish to find the α and β that maximize the likelihood function L. Thus we need to take some derivatives:
∂lnL/∂α = Σ_i (y_i − α − βx_i)/σ_i² = 0
∂lnL/∂β = Σ_i x_i(y_i − α − βx_i)/σ_i² = 0
We have to solve the two equations for the two unknowns, α and β. We can get an exact solution since these equations are linear in α and β.
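Because the two equations are linear in the intercept and slope, they can be solved in closed form. The sketch below does this with weighted sums; the function name `fit_line` and the four data points are hypothetical, chosen only to exercise the formulas, and the parameter errors come from the inverse of the 2×2 matrix of second derivatives of lnL.

```python
import math

def fit_line(xs, ys, sigmas):
    """Maximum-likelihood (weighted least-squares) fit of y = intercept + slope * x."""
    w = [1.0 / s ** 2 for s in sigmas]
    s0 = sum(w)
    sx = sum(wi * x for wi, x in zip(w, xs))
    sy = sum(wi * y for wi, y in zip(w, ys))
    sxx = sum(wi * x * x for wi, x in zip(w, xs))
    sxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
    det = s0 * sxx - sx ** 2          # determinant of the normal equations
    intercept = (sxx * sy - sx * sxy) / det
    slope = (s0 * sxy - sx * sy) / det
    # Diagonal elements of the covariance (error) matrix
    sigma_intercept = math.sqrt(sxx / det)
    sigma_slope = math.sqrt(s0 / det)
    return intercept, slope, sigma_intercept, sigma_slope

# Hypothetical measurements with equal uncertainties
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]
sigmas = [0.2, 0.2, 0.2, 0.2]

a, b, sa, sb = fit_line(xs, ys, sigmas)
print(f"intercept = {a:.3f} +- {sa:.3f}, slope = {b:.3f} +- {sb:.3f}")
```

With equal σ's the fitted values reduce to the ordinary unweighted least-squares line; the σ's then only set the size of the parameter errors.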

Chi-Square (χ²) Distribution
Chi-square (χ²) distribution: Assume that our measurements (x_i ± σ_i's) come from a Gaussian pdf with mean = μ. Define a statistic called chi-square:
χ² = Σ_i (x_i − μ)²/σ_i²
It can be shown that the pdf for χ² is:
p(χ², n) = [1/(2^(n/2) Γ(n/2))] (χ²)^(n/2 − 1) e^(−χ²/2), 0 ≤ χ² ≤ ∞
This is a continuous pdf. It is a function of two variables, χ² and n = number of degrees of freedom. (Γ = “Gamma function”)
[Figure: χ² distribution for different degrees of freedom]
For n ≥ 20, P(χ² > y) can be approximated using a Gaussian pdf with y = (2χ²)^(1/2) − (2n − 1)^(1/2).
A few words about the number of degrees of freedom n: n = # data points − # of parameters calculated from the data points
Reminder: If you collected N events in an experiment and you histogram your data in n bins before performing the fit, then you have n data points!
EXAMPLE: You count cosmic ray events in 15 second intervals and sort the data into 5 bins:
   number of intervals with 0 cosmic rays: 2
   number of intervals with 1 cosmic ray: 7
   number of intervals with 2 cosmic rays: 6
   number of intervals with 3 cosmic rays: 3
   number of intervals with 4 cosmic rays: 2
Although there were 36 cosmic rays in your sample you have only 5 data points.
EXAMPLE: We have 10 data points, with μ and σ the mean and standard deviation of the data set.
   If we calculate μ and σ from the 10 data points then n = 8.
   If we know μ and calculate σ OR if we know σ and calculate μ then n = 9.
   If we know μ and σ then n = 10.
RULE of THUMB: A good fit has χ²/DOF ≤ 1.

MLM, Chi-Square, and Least Squares Fitting
Assume we have n data points of the form (y_i, σ_i) and we believe a functional relationship exists between the points: y = f(x, α, β, …)
In addition, assume we know (exactly) the x_i that goes with each y_i. We wish to determine the parameters α, β, …
A common procedure is to minimize the following χ² with respect to the parameters:
χ² = Σ_i [y_i − f(x_i, α, β, …)]²/σ_i²
If the y_i's are from a Gaussian pdf then minimizing the χ² is equivalent to the MLM. However, often the y_i's are NOT from a Gaussian pdf. In these instances we call this technique “χ² fitting” or “Least Squares Fitting”.
Strictly speaking, we can only use a χ² probability table when y is from a Gaussian pdf. However, there are many instances where, even for non-Gaussian pdfs, the above sum approximates the χ² pdf. From a common sense point of view, minimizing the above sum makes sense regardless of the underlying pdf.

Least Squares Fitting
Example: Leo's 4.8 (P107). The following data from a radioactive source was taken at 15 s intervals. Determine the lifetime (τ) of the source.
The pdf that describes radioactivity (or the decay of a charmed particle) is: N(t) = N(0) e^(−t/τ)
Technically the pdf is |dN(t)/(N(0)dt)| = N(t)/(N(0)τ).
As written the above pdf is not linear in τ. We can turn this into a linear problem by taking the natural log of both sides of the pdf:
ln N(t) = ln N(0) − t/τ ≡ C + Dt, with D = −1/τ
We can now use the methods of linear least squares to find D and then τ.
In doing the LSQ fit, what do we use to weight the data points? The fluctuations in each bin are governed by Poisson statistics: σ_i² = N_i. However, in this problem the fitting variable is lnN, so we must use propagation of errors to transform the variances of N into the variances of lnN:
σ²_lnN = (∂lnN/∂N)² σ_N² = (1/N²) N = 1/N (Leo has a “1” here.)

Least Squares Fitting
The slope D of the line of “best fit” follows from the weighted linear least squares fit. Thus the lifetime is τ = −1/D = 110.7 s.
Propagating the error on D gives the error in the lifetime: τ = 110.7 ± 12.3 s.
Caution: Leo has a factor of ½ in his error matrix (V⁻¹)_ij, Eq 4.72, because he minimizes χ² itself, whereas using the MLM we worked with lnL = −χ²/2 + constant.
Note: fitting without weighting yields τ = 96.8 s.

Hypothesis testing
The goal of hypothesis testing is to set up a procedure(s) to allow us to decide if a model is acceptable in light of our experimental observations.
Example: A theory predicts that BR(B → π⁺π⁻) = 2 × 10⁻⁵ and you measure (4 ± 2) × 10⁻⁵. The hypothesis we want to test is “are experiment and theory consistent?”
Hypothesis testing does not have to compare theory and experiment.
Example: CLEO measures the Λc lifetime to be (180 ± 7) fs while SELEX measures (198 ± 7) fs. The hypothesis we want to test is “are the lifetime results from CLEO and SELEX consistent?”
There are two types of hypothesis tests: parametric and non-parametric.
   Parametric: compares the values of parameters (e.g. does the mass of the proton = mass of the electron?)
   Non-parametric: deals with the shape of a distribution (e.g. is the angular distribution consistent with being flat?)
Consider the case of neutron decay. Suppose we have two theories that both predict the energy spectrum of the electron emitted in the decay of the neutron. Here a parametric test might not be able to distinguish between the two theories, since both theories might predict the same average energy of the emitted electron. However, a non-parametric test would be able to distinguish between the two theories, as the shape of the energy spectrum differs for each theory.

Hypothesis testing
A procedure for using hypothesis testing:
a) Measure (or calculate) something
b) Find something that you wish to compare with your measurement (theory, experiment)
c) Form a hypothesis (e.g. my measurement is consistent with the PDG value)
d) Calculate the confidence level that the hypothesis is true
e) Accept or reject the hypothesis depending on some minimum acceptable confidence level
Problems with the above procedure:
a) What is a confidence level?
b) How do you calculate a confidence level?
c) What is an acceptable confidence level?
How would we test the hypothesis “the space shuttle is safe”? Is 1 explosion per 10 launches safe? Or 1 explosion per 1000 launches?
A working definition of the confidence level: the probability of the event happening by chance.
Example: Suppose we measure some quantity (X) and we know that it is described by a Gaussian pdf with μ = 0 and σ = 1. What is the confidence level for measuring X ≥ 2 (i.e. ≥ 2σ from the mean)?
P(X ≥ 2) = ∫_2^∞ [1/√(2π)] e^(−x²/2) dx ≈ 0.023
Thus we would say that the confidence level for measuring X ≥ 2 is 0.023 or 2.3%, and we would expect to get a value of X ≥ 2 roughly one out of 44 tries if the underlying pdf is Gaussian.
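The one-sided Gaussian tail probability in this example can be evaluated with the complementary error function. This is a minimal sketch using `math.erfc` from the Python standard library; the helper name `gaussian_upper_tail` is an illustrative choice.

```python
import math

def gaussian_upper_tail(x, mu=0.0, sigma=1.0):
    """P(X >= x) for a Gaussian pdf with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))

cl = gaussian_upper_tail(2.0)
print(f"P(X >= 2) = {cl:.4f}, i.e. about 1 in {1.0 / cl:.0f}")
```

The exact tail area beyond 2σ is about 0.0228; a tail of exactly 2.5% corresponds to 1.96σ.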

Hypothesis testing
A few cautions about using confidence limits:
a) You must know the underlying pdf to calculate the limits.
Example: suppose we have a scale of known accuracy (σ = 10 gm) and we weigh something to be 20 gm. Assuming a Gaussian pdf we could calculate a 2.3% chance that our object weighs ≤ 0 gm?? We must make sure that the probability distribution is defined in the region where we are trying to extract information.
b) What does a confidence level really mean? Classical vs Bayesian viewpoints.
Example: Suppose we measure a value of x for the mean of a Gaussian distribution with an unknown mean μ. Suppose we know the standard deviation (σ) of the distribution. It is tempting to say: “The probability that μ lies in the interval [x − 2σ, x + 2σ] = 95%”. However, according to Classical probability this is a meaningless statement! By definition the mean (μ) is a constant, not a random variable, thus μ does not have a probability distribution associated with it! What we can say is that we will reject any value of μ that gives a probability of ≤ 5% of obtaining our (measured) value of x.
Here we are assuming that we are really measuring μ. But how do we really know what we are measuring?

Hypothesis testing for Gaussian variables:
We wish to test if a quantity we have measured (μ = average of n measurements) is consistent with a known mean (μ₀).

Hypothesis testing
Do charge 2/3 quarks exist?
Another variation of the quark problem

Hypothesis testing
Tests when both means are unknown but come from a Gaussian pdf: n and m are the number of measurements for each mean.
Example: Do two experiments agree with each other? CLEO measures the Λc lifetime to be (180 ± 7) fs while SELEX measures (198 ± 7) fs. The difference is 18 fs with an uncertainty of (7² + 7²)^(1/2) ≈ 9.9 fs, i.e. 1.8σ.
Thus 7% of the time we should expect the experiments to disagree at this level. But, is this acceptable agreement?
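The 7% figure can be reproduced by treating the difference of the two lifetimes as a Gaussian variable whose variance is the sum of the two variances. The sketch below assumes the two measurements are independent and uses a two-sided test; `two_sided_p` is a hypothetical helper name.

```python
import math

def two_sided_p(x1, s1, x2, s2):
    """Significance (in sigma) of x1 - x2 and the two-sided Gaussian probability
    of a difference at least this large, for independent measurements."""
    z = abs(x1 - x2) / math.sqrt(s1 ** 2 + s2 ** 2)
    p = math.erfc(z / math.sqrt(2.0))
    return z, p

z, p = two_sided_p(180.0, 7.0, 198.0, 7.0)  # CLEO vs SELEX lifetimes in fs
print(f"difference = {z:.2f} sigma, probability = {p:.3f}")
```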

Hypothesis testing
A non-Gaussian example: the Poisson distribution.
The following is the number of neutrino events detected in 10 second intervals by the IMB experiment on 23 February 1987, around which time the supernova SN 1987A was first seen by experimenters:
   #events:     0     1    2    3   4   5  6  7  8  9
   #intervals:  1042  860  307  78  15  3  0  0  0  1
Assuming the data are described by a Poisson distribution, calculate the average and compute the number of intervals expected for each number of events. λ = 0.777 if we include the interval with 9 events.
We can calculate a χ² assuming the data are described by a Poisson distribution:
χ² = Σ_i (measured_i − prediction_i)²/σ_i²
Note: we use σ² = prediction for a Poisson.
The predicted number of intervals is given by: (# intervals) × e^(−λ) λⁿ/n!
   #events:                 0     1    2    3   4   5  6    7     8      9
   #intervals (predicted):  1064  823  318  82  16  2  0.3  0.03  0.003  0.0003
There are 7 (= 9 − 2) DOF's here and the probability of χ²/DOF = 3.6/7 is high (≈ 80%), indicating a good fit to a Poisson.
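The Poisson prediction and the χ² can be recomputed directly. This sketch rests on two assumptions that should be checked against the original data: the interval counts for 0 and 3 events are taken as 1042 and 78 (the values that reproduce the quoted λ = 0.777), and the prediction is normalized with the 9-event interval excluded (which reproduces the predicted 1064 intervals with 0 events).

```python
import math

# Observed number of intervals with k = 0..9 events
# (assumed counts; see the note above)
observed = [1042, 860, 307, 78, 15, 3, 0, 0, 0, 1]

n_intervals = sum(observed)
n_events = sum(k * n for k, n in enumerate(observed))
lam_all = n_events / n_intervals            # mean including the 9-event interval

# Background-only mean: drop the single interval with 9 events
n_intervals_bg = n_intervals - observed[9]
lam_bg = (n_events - 9 * observed[9]) / n_intervals_bg

def poisson(k, lam):
    """Poisson probability of k events for mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

predicted = [n_intervals_bg * poisson(k, lam_bg) for k in range(10)]

# Chi-square over the bins with 0..8 events, using sigma^2 = prediction
chi2 = sum((o - p) ** 2 / p for o, p in zip(observed[:9], predicted[:9]))

print(f"lambda (all intervals) = {lam_all:.3f}")
print("predicted:", [round(p, 4) for p in predicted])
print(f"chi2 = {chi2:.2f}")
```

With unrounded predictions the χ² comes out somewhat below the quoted 3.6, which is based on predictions rounded to integers; the conclusion of a good fit is unchanged.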

Confidence Intervals
Confidence intervals (CI) are related to confidence limits (CL). To calculate a CI we assume a CL and find the values of the parameters that give us the CL.
Caution: CI's are not always uniquely defined. We usually seek the minimum interval or the symmetric interval.
Example: Assume we have a Gaussian pdf with μ = 3 and σ = 1. What is the 68% CI? We need to solve the following equation:
0.68 = ∫_a^b G(x, 3, 1) dx
Here G(x, 3, 1) is the Gaussian pdf with μ = 3 and σ = 1. There are infinitely many solutions to the above equation. We seek the solution that is symmetric about the mean (μ):
0.68 = ∫_{μ−c}^{μ+c} G(x, 3, 1) dx
To solve this problem we either need a probability table, or remember that ≈ 68% of the area of a Gaussian is within ±σ of the mean. Thus for this problem the 68% CI is: [2, 4]
Example: Assume we have a Gaussian pdf with μ = 3 and σ = 1. What is the one-sided upper 90% CI? Now we want to find the c that satisfies:
0.90 = ∫_{−∞}^{c} G(x, 3, 1) dx
Using a table of Gaussian probabilities we find 90% of the area in the interval [−∞, μ + 1.28σ]. Thus for this problem the 90% CI is: [−∞, 4.28]

Confidence Intervals
Suppose an experiment is looking for the X particle but observes no candidate events. What can we say about the average number of X particles expected to have been produced?
First, we need to pick a probability distribution. Since events are discrete we need a discrete distribution ⇒ Poisson.
Next, how unlucky do you want to be? It is common to pick 10% of the time to be unlucky.
We can now re-state the question as: “Suppose an experiment finds zero candidate events. What is the 90% CL upper limit on the average number of events (μ) expected, assuming a Poisson distribution?”
Thus we need to solve for μ in the following equation:
CL = Σ_{n=1}^{∞} e^(−μ) μⁿ/n!
In practice it is much easier to solve for 1 − CL:
1 − CL = e^(−μ), so μ = −ln(1 − CL)
For our example, CL = 0.9 and therefore μ = 2.3 events. So, if μ = 2.3 then 10% of the time we should expect to find 0 candidates. There was nothing wrong with our experiment. We were just unlucky.
Example: Suppose an experiment finds one candidate event. What is the 95% CL upper limit on the average number of events (μ)?
1 − CL = 0.05 = e^(−μ)(1 + μ), so μ = 4.74
The 5% includes 1 AND 0 events. Here we are saying that we would get 2 or more events 95% of the time if μ = 4.74.
PDG 1994 has a good Table (17.3, P1280) for these types of problems.
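Both upper limits can be found numerically by solving P(n ≤ n_obs | μ) = 1 − CL for μ. The sketch below uses simple bisection; the function names and the search range [0, 50] are arbitrary illustrative choices.

```python
import math

def poisson_cdf(n_obs, mu):
    """P(n <= n_obs) for a Poisson distribution with mean mu."""
    return sum(math.exp(-mu) * mu ** k / math.factorial(k)
               for k in range(n_obs + 1))

def upper_limit(n_obs, cl, lo=0.0, hi=50.0, tol=1e-9):
    """Mean mu for which n_obs or fewer events occur only (1 - cl) of the time."""
    target = 1.0 - cl
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # The cumulative probability falls as mu grows
        if poisson_cdf(n_obs, mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

mu_0 = upper_limit(0, 0.90)   # zero candidates, 90% CL
mu_1 = upper_limit(1, 0.95)   # one candidate, 95% CL
print(f"{mu_0:.2f} {mu_1:.2f}")
```

For zero observed events the answer is just −ln(1 − CL), so the bisection is only really needed for n_obs ≥ 1.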

Maximum Likelihood Method
Example: Exponential decay. Generate events according to an exponential distribution with τ₀ = 100. Calculate lnL vs τ and find the maximum of lnL and the points where lnL = lnL_max − 1/2 (the “1σ points”).
[Figure: log-likelihood function for 10 events. lnL max for τ = 189; 1σ points: (140, 265); L is not Gaussian.]
[Figure: log-likelihood function for 10⁴ events. lnL max for τ = 100.8; 1σ points: (99.8, 101.8); L is fit by a Gaussian, y = m3 − (m0 − m1)²/(2 m2²), with m1 = 100.8, m2 = 1.01, m3 = −56128.]
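For an exponential pdf the log-likelihood depends on the data only through the sample mean, so the “1σ points” can be located without the individual decay times. The sketch below assumes lnL(τ) = −n ln τ − Σt_i/τ with Σt_i = n τ*, and finds by bisection where lnL has dropped by 1/2; the function names and search brackets are illustrative assumptions.

```python
import math

def delta_lnl(tau, tau_hat, n):
    """ln L(tau) - ln L(tau_hat) for n exponential decay times with mean tau_hat."""
    return -n * (math.log(tau / tau_hat) + tau_hat / tau - 1.0)

def crossing(tau_hat, n, lo, hi, drop=0.5, tol=1e-6):
    """Bisect for the tau in [lo, hi] where ln L is `drop` below its maximum."""
    def f(tau):
        return delta_lnl(tau, tau_hat, n) + drop
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (f(lo) > 0) == (f(mid) > 0):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# 10-event sample: asymmetric interval
low_10 = crossing(189.0, 10, 1.0, 189.0)
high_10 = crossing(189.0, 10, 189.0, 1890.0)

# 10^4-event sample: nearly symmetric, width close to tau / sqrt(n)
low_1e4 = crossing(100.8, 10_000, 1.0, 100.8)
high_1e4 = crossing(100.8, 10_000, 100.8, 1008.0)

print(f"10 events:   ({low_10:.0f}, {high_10:.0f})")
print(f"10^4 events: ({low_1e4:.1f}, {high_1e4:.1f})")
```

The 10-event interval is visibly asymmetric about 189, while the 10⁴-event interval is symmetric to within rounding, as expected when lnL is parabolic.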

Maximum Likelihood Method Example
How do we calculate confidence intervals for our MLM example?
For the case of 10⁴ events we can just use Gaussian statistics, since the likelihood function is to a very good approximation Gaussian. Thus the “1σ points” will give us 68% of the area under the Gaussian curve, the “2σ points” ≈ 95% of the area, etc.
Unfortunately, the likelihood function for the 10 event case is NOT approximated by a Gaussian. So the “1σ points” do not necessarily give you 68% of the area under the curve. In this case we can calculate a confidence interval about the mean using a Monte Carlo calculation as follows:
1) Generate a large number (e.g. 10⁷) of 10-event samples, each sample generated with a mean lifetime equal to that of our original 10-event sample (τ* = 189).
2) For each 10-event sample calculate the maximum of the log-likelihood function (= τ_i).
3) Make a histogram of the τ_i's. This histogram is the pdf for τ.
4) To calculate an X% confidence interval about the mean, find the region where X%/2 of the area is in the region [τ_L, τ*] and X%/2 is in the region [τ*, τ_H].
NOTE: since the pdf may not be symmetric around its mean, we may not be able to find equal area regions below and above the mean.
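The four Monte Carlo steps can be sketched as follows. This is a scaled-down illustration, not the original calculation: it uses 2 × 10⁴ samples instead of 10⁷, an arbitrary seed, sorted values in place of a histogram, and the fact that for an exponential pdf the maximum-likelihood lifetime is the sample mean. The function names are hypothetical.

```python
import random

def mle_lifetimes(tau_true, n_events, n_samples, rng):
    """Steps 1-3: ML lifetime (the sample mean) of each pseudo-experiment, sorted."""
    return sorted(
        sum(rng.expovariate(1.0 / tau_true) for _ in range(n_events)) / n_events
        for _ in range(n_samples)
    )

def interval_about(taus, tau_star, fraction):
    """Step 4: (tau_L, tau_H) holding fraction/2 of the samples on each side of
    tau_star; a limit is None if that side does not contain enough area."""
    n = len(taus)
    n_below = sum(t < tau_star for t in taus)
    half = int(round(fraction / 2 * n))
    tau_low = taus[n_below - half] if half <= n_below else None
    tau_high = taus[n_below + half - 1] if half <= n - n_below else None
    return tau_low, tau_high

rng = random.Random(2002)  # arbitrary fixed seed
taus = mle_lifetimes(189.0, 10, 20_000, rng)

frac_below = sum(t < 189.0 for t in taus) / len(taus)
ci68 = interval_about(taus, 189.0, 0.68)
ci95 = interval_about(taus, 189.0, 0.95)

print(f"fraction below 189: {frac_below:.3f}")
print("68% interval:", ci68)
print("95% interval:", ci95)
```

Because slightly more than half of the area lies below τ*, the upper limit of a 95% interval built this way does not exist, which is the situation described in the NOTE above.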

Maximum Likelihood Method Example
[Figures: linear and semi-log plots of the histogram, with τ* marked]
Above is the histogram or pdf of 10⁷ ten-event samples, each with τ* = 189. By counting events (i.e. integrating) in an interval around τ*, the histogram (actually, I printed out the number of events in one unit steps from 0 to 650) gives the following:
   54.9% of the area is in the region (0 ≤ τ ≤ 189)
   “±1σ region”: 34% of the area in each of the regions (139 ≤ τ ≤ 189) and (189 ≤ τ ≤ 263). Very close to the likelihood result.
   90% CI region: 45% of the area in each of the regions (117 ≤ τ ≤ 189) and (189 ≤ τ ≤ 421)
   The upper 95% region (i.e. 47.5% of the area above the mean) is not defined.
NOTE: the variance of an exponential distribution can be calculated analytically: σ² = τ², so the uncertainty on the mean of n events is τ/√n.
Thus for the 10 event sample we expect σ = 60, not too far off from the 68% CI! For the 10⁴ event sample, the CI's from the ML estimate of σ and the analytic σ are essentially identical (both give σ = 1.01).

Confidence Regions
Often we have a problem that involves two or more parameters. In these instances it makes sense to define confidence regions rather than an interval. Consider the case where we are doing a MLM fit to two variables α, β. Previously we have seen that for large samples the likelihood function becomes “Gaussian”:
L(α) ∝ exp[−(α − α*)²/(2σ_α²)]
We can generalize this to two correlated variables α, β:
L(α, β) ∝ exp(−Q/2), with Q = [1/(1 − ρ²)] [(α − α*)²/σ_α² + (β − β*)²/σ_β² − 2ρ(α − α*)(β − β*)/(σ_α σ_β)]
The contours of constant probability are given by Q = constant:
   Q = 1 contains ≈ 39% of the area
   Q = 2.3 contains ≈ 68% of the area
   Q = 4.6 contains ≈ 90% of the area
   Q = 6.2 contains ≈ 95% of the area
   Q = 9.2 contains ≈ 99% of the area
The integral of the χ² pdf with 2 DOF's can be done analytically: P(χ² ≤ Q) = ∫_0^Q (1/2) e^(−χ²/2) dχ² = 1 − e^(−Q/2)
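The area inside each contour follows from the χ² distribution with 2 degrees of freedom, whose cumulative probability is 1 − e^(−Q/2). A minimal sketch (the function names are illustrative):

```python
import math

def prob_inside(q):
    """Fraction of a 2-D Gaussian inside the contour Q = q (chi-square, 2 DOF)."""
    return 1.0 - math.exp(-q / 2.0)

def q_for_prob(p):
    """Contour value Q that encloses a fraction p of the probability."""
    return -2.0 * math.log(1.0 - p)

for q in (1.0, 2.3, 4.6, 6.2, 9.2):
    print(f"Q = {q:3.1f} contains {100 * prob_inside(q):.1f}% of the area")
```

Q = 6.2 encloses 95.5% of the area (the two-dimensional analogue of “2σ”); an exact 95% region corresponds to `q_for_prob(0.95)`, which is about 5.99.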

Confidence Regions
Example: The CLEO experiment did a maximum likelihood analysis to search for B → π⁺π⁻ and B → K⁺π⁻ events. The results of the MLM fit are:
N(π⁺π⁻) = 16 ± 4, N(K⁺π⁻) = 25 ± 5, ρ = 0.5 (warning: these are made up numbers!)
N(π⁺π⁻) and N(K⁺π⁻) are highly correlated, since at high momentum (> 2 GeV) CLEO has a hard time separating π's and K's.
[Figure: contours of constant probability in the N(K⁺π⁻) vs N(π⁺π⁻) plane: 39% (Q = 1), 68% (Q = 2.3), 95% (Q = 6.2), 99% (Q = 9.2)]