DATA ANALYSIS Module Code: CA 660 Lecture Block 6: Alternative estimation methods and their implementation

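MAXIMUM LIKELIHOOD ESTIMATION
• Recall general points: Estimation; definition of the Likelihood function $L(\theta) = \prod_{i=1}^{n} f(x_i; \theta)$ for a vector of parameters $\theta$ and a set of independent observed values $x$. Finding the most likely value of $\theta$ means maximising the Likelihood function. Also defined: the Log-likelihood (Support function $S(\theta) = \log L(\theta)$) and its derivative, the Score $\partial S / \partial \theta$, together with the Information content per observation, which for a single-parameter likelihood is given by $I(\theta) = E\left[\left(\frac{\partial \log f(x;\theta)}{\partial \theta}\right)^2\right] = -E\left[\frac{\partial^2 \log f(x;\theta)}{\partial \theta^2}\right]$.
• Why MLE? (Requires knowledge of the underlying distribution.) Properties: consistency; sufficiency; asymptotic efficiency (linked to variance); unique maximum; invariance and, hence, freedom to choose the most convenient parameterisation; usually MVUE; amenable to conventional optimisation methods.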

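VARIANCE, BIAS & CONFIDENCE
• Variance of an Estimator: usual form $\mathrm{Var}(\hat\theta) = E[(\hat\theta - E(\hat\theta))^2]$, or for k independent estimates $\mathrm{Var}(\hat\theta) = \frac{1}{k-1}\sum_{i=1}^{k}(\hat\theta_i - \bar\theta)^2$.
• For a large sample, the variance of the MLE can be approximated by $\mathrm{Var}(\hat\theta) \approx 1/(n\,I(\hat\theta))$, with $I$ the Information per observation; it can also be estimated empirically, using re-sampling* techniques.
• Variance of a linear function (of several estimates): $\mathrm{Var}\left(\sum_i a_i \hat\theta_i\right) = \sum_i a_i^2\,\mathrm{Var}(\hat\theta_i) + 2\sum_{i<j} a_i a_j\,\mathrm{Cov}(\hat\theta_i, \hat\theta_j)$ (a common need in genomics analysis, e.g. heritability, and in risk analysis).
• Recall Bias of the Estimator, $\mathrm{Bias}(\hat\theta) = E(\hat\theta) - \theta$; then the Mean Square Error is defined to be $\mathrm{MSE}(\hat\theta) = E[(\hat\theta - \theta)^2]$, which expands to $\mathrm{Var}(\hat\theta) + [\mathrm{Bias}(\hat\theta)]^2$, so we have the basis for C.I. and tests of hypothesis.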
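The two routes to the variance of an MLE mentioned above (large-sample approximation and re-sampling) can be compared in a minimal sketch. It assumes a Bernoulli model, for which the Information per observation is 1/(θ(1−θ)); the data (60 successes in 200 trials), the number of bootstrap replicates and the function names are hypothetical choices made for illustration only.

```python
import random


def mle_proportion(sample):
    """MLE of a Bernoulli success probability: the sample proportion."""
    return sum(sample) / len(sample)


def bootstrap_variance(sample, n_boot=2000, seed=0):
    """Empirical variance of the MLE over bootstrap re-samples."""
    rng = random.Random(seed)
    n = len(sample)
    estimates = []
    for _ in range(n_boot):
        # Re-sample n observations with replacement and re-estimate.
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        estimates.append(mle_proportion(resample))
    mean_est = sum(estimates) / n_boot
    return sum((e - mean_est) ** 2 for e in estimates) / (n_boot - 1)


# Hypothetical data: 60 successes in 200 Bernoulli trials.
sample = [1] * 60 + [0] * 140
theta_hat = mle_proportion(sample)

# Large-sample approximation 1 / (n * I(theta)), with I = 1 / (theta * (1 - theta)).
asymptotic_var = theta_hat * (1 - theta_hat) / len(sample)
boot_var = bootstrap_variance(sample)

print(f"MLE                 = {theta_hat:.3f}")
print(f"asymptotic variance = {asymptotic_var:.6f}")
print(f"bootstrap variance  = {boot_var:.6f}")
```

With a sample of this size the two figures should be close; for small samples or awkward likelihoods the re-sampling estimate is usually the safer guide.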

COMMONLY-USED METHODS of obtaining MLE
• Analytical: solving $dL/d\theta = 0$ or $dS/d\theta = 0$ when simple solutions exist
• Grid search or likelihood profile approach
• Newton-Raphson iteration methods
• EM (expectation and maximisation) algorithm
N.B. Work with the Log-likelihood, because:
• its maximum is at the same parameter value as that of the Likelihood
• it is easier to compute
• there is a close relationship between the statistical properties of the MLE and the Log-likelihood

MLE Methods in outline
Analytical: recall the Binomial example earlier
• Example: For the Normal, the MLEs of mean and variance (taking derivatives w.r.t. mean and variance separately) are $\hat\mu = \bar{x}$ and $\hat\sigma^2 = \frac{1}{N}\sum_i (x_i - \hat\mu)^2$, equivalent to the sample mean and the actual variance (i.e. divisor N); the variance estimate is unbiased if the mean is known, biased if not.
• Invariance: one-to-one relationships preserved
• Used: when the MLE has a simple solution
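A minimal sketch of the analytical Normal case: the closed-form estimates are computed directly and checked against the log-likelihood at nearby parameter values. The data values and function names are hypothetical and serve only as illustration.

```python
import math


def normal_mle(xs):
    """Closed-form MLEs for a Normal sample: mean and variance with divisor N."""
    n = len(xs)
    mean = sum(xs) / n
    var_mle = sum((x - mean) ** 2 for x in xs) / n
    return mean, var_mle


def normal_log_lik(xs, mean, var):
    """Log-likelihood of an independent Normal sample."""
    n = len(xs)
    squares = sum((x - mean) ** 2 for x in xs)
    return -0.5 * n * math.log(2 * math.pi * var) - squares / (2 * var)


# Hypothetical observations.
xs = [4.1, 5.3, 4.8, 6.0, 5.2, 4.6]
mu_hat, var_hat = normal_mle(xs)

# Divisor N - 1 gives the usual unbiased estimate when the mean is estimated.
var_unbiased = var_hat * len(xs) / (len(xs) - 1)

print(f"MLE mean            = {mu_hat:.4f}")
print(f"MLE variance (/N)   = {var_hat:.4f}")
print(f"unbiased variance   = {var_unbiased:.4f}")
print(f"log-likelihood peak = {normal_log_lik(xs, mu_hat, var_hat):.4f}")
```

Moving either parameter away from its estimate can only lower the log-likelihood, which is a quick numerical check on the calculus.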

MLE Methods in outline contd.
Grid Search (computational): plot likelihood or log-likelihood vs parameter. Various features:
• Relative Likelihood = Likelihood / Max. Likelihood (ML set = 1). The peak of the R.L. can be identified visually or sought algorithmically.
• e.g. a plot of likelihood over the parameter-space range may give 2 peaks, symmetrical around $\theta = 0.5$ (the likelihood profile for e.g. the well-known mixed linkage analysis problem, or for a similar example of populations following known proportion splits). If we now constrain $0 \le \theta \le 0.5$, the MLE solution is unique, e.g. $\theta$ = R.F. between genes (possible mixed linkage phase).
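The two-peak behaviour can be reproduced in a small sketch. It assumes, purely for illustration, an equal-weight mixture of the two linkage phases with r recombinants among n offspring under one phase (so n − r under the other), and a parameter constrained to at most 0.5 for a recombination fraction; the counts, the grid spacing and the function name are hypothetical.

```python
def mixed_phase_likelihood(theta, n, r):
    """Equal-weight mixture over the two possible linkage phases (assumed model)."""
    one_phase = theta ** r * (1 - theta) ** (n - r)
    other_phase = (1 - theta) ** r * theta ** (n - r)
    return 0.5 * one_phase + 0.5 * other_phase


n, r = 20, 3  # hypothetical counts
grid = [i / 100 for i in range(1, 100)]
lik = [mixed_phase_likelihood(t, n, r) for t in grid]

# Relative likelihood: scale so that the maximum equals 1.
l_max = max(lik)
rel_lik = [value / l_max for value in lik]

# Interior grid points that are higher than both neighbours.
peaks = [
    grid[i]
    for i in range(1, len(grid) - 1)
    if rel_lik[i] > rel_lik[i - 1] and rel_lik[i] > rel_lik[i + 1]
]

# Constraining theta to [0, 0.5] leaves a single maximum.
constrained = [(t, rl) for t, rl in zip(grid, rel_lik) if t <= 0.5]
theta_hat = max(constrained, key=lambda pair: pair[1])[0]

print("peaks on the full range:", peaks)
print("constrained MLE:", theta_hat)
```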

MLE Methods in outline contd.
• Graphic/numerical implementation: start from an initial estimate of $\theta$. The direction of search is determined by evaluating the likelihood to both sides of that estimate; the search takes the direction giving an increase, because we are looking for the maximum. Initial search increments are large, e.g. 0.1; when the change in likelihood starts to decrease or becomes negative, stop and refine the increment.
Issues:
• Multiple peaks: can miss the global maximum; computationally intensive; see e.g. http://statgen.iop.kcl.ac.uk/bgim/mle/sslike_1.html
• Multiple parameters: grid search. Interpretation of likelihood profiles can be difficult, e.g. http://blogs.sas.com/content/iml/2011/10/12/maximum-likelihood-estimation-in-sasiml/
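The search procedure described above (evaluate both sides, move uphill, shrink the increment when no neighbour improves) might be sketched as follows. The binomial log-likelihood with 7 events in 40 trials, the starting value, the tolerance and the function names are all hypothetical choices; this is one simple way to code the idea, not a definitive implementation.

```python
import math


def binomial_log_lik(theta, r=7, n=40):
    """Log-likelihood for r events in n trials (hypothetical data)."""
    return r * math.log(theta) + (n - r) * math.log(1 - theta)


def stepwise_search(log_lik, start, step=0.1, tol=1e-6,
                    lower=1e-9, upper=1 - 1e-9):
    """Move to whichever neighbour raises the log-likelihood; refine when none does."""
    theta = start
    while step > tol:
        best_theta, best_value = theta, log_lik(theta)
        # Evaluate the likelihood to both sides of the current estimate.
        for candidate in (max(lower, theta - step), min(upper, theta + step)):
            value = log_lik(candidate)
            if value > best_value:
                best_theta, best_value = candidate, value
        if best_theta == theta:
            step /= 10  # no improvement on either side: refine the increment
        else:
            theta = best_theta
    return theta


theta_hat = stepwise_search(binomial_log_lik, start=0.5)
print(f"grid-search MLE = {theta_hat:.5f} (analytical value 7/40 = {7 / 40})")
```

Because the search only ever climbs from its starting point, it finds a local maximum; with multiple peaks it needs several starting values or a coarse scan of the whole range first.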

Example in outline
• Data, e.g. used to show a linkage relationship (non-independence) between e.g. a marker and a given disease gene, or (e.g.) between sex and purchase of computer games. Escapes = individuals who are susceptible, but show no disease phenotype under experimental conditions (analogously: express interest but have no purchase record). So define $\beta$ and $\theta$ as the proportion of escapes and the R.F. respectively. $1 - \beta$ is the penetrance for the disease trait, or the probability of purchasing, i.e. P{individual with susceptible genotype has disease phenotype}, or P{individual of given sex and interested actually buys}. Purpose of the experiment: typically to estimate the R.F. between marker and gene, or the proportion of a sex that purchases.
• Use: Support function = Log-Likelihood. Often quite complex, as for the above example.

Example contd.
• Set the 1st derivatives (Scores) w.r.t. $\theta$ (R.F.) and w.r.t. $\beta$ (proportion of escapes) equal to zero.
• The expected value of the Score w.r.t. $\theta$ is zero (see analogies in classical sampling/hypothesis testing). Similarly for $\beta$. Here, however, there is no simple analytical solution, so we cannot solve directly for either.
• Using grid search, the likelihood reaches its maximum at a particular pair of values $(\hat\theta, \hat\beta)$.
• In general, this type of experiment tests H0: independence between the factors (marker and gene), (sex and purchase)
• and H0: no escapes. It uses Likelihood Ratio Test statistics (the M.L.E. equivalent of $\chi^2$).

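MLE Methods in outline contd.
Newton-Raphson Iteration
Have Score($\theta$) = 0 from previously. N-R consists of replacing the Score by the linear terms of its Taylor expansion, so if $\theta''$ is a solution and $\theta'$ = 1st guess:
$\mathrm{Score}(\theta'') \approx \mathrm{Score}(\theta') + (\theta'' - \theta')\,\frac{d\,\mathrm{Score}(\theta')}{d\theta} = 0 \;\Rightarrow\; \theta'' = \theta' - \mathrm{Score}(\theta') \Big/ \frac{d\,\mathrm{Score}(\theta')}{d\theta}$
Repeat with $\theta''$ replacing $\theta'$. Each iteration fits a parabola to the Likelihood Fn.
[Figure: Log-L, i.e. $S(\theta)$, against $\theta$, with the 1st and 2nd iterates marked.]
• Problems: multiple peaks, zero Information, extreme estimates
• Multiple parameters: need matrix notation, where the S matrix e.g. has elements = derivatives of $S(\theta, \beta)$ w.r.t. $\theta$ and $\beta$ respectively. Similarly, the Information matrix (for the whole sample) has terms of the form $-E[\partial^2 S / \partial\theta^2]$, $-E[\partial^2 S / \partial\theta\,\partial\beta]$, $-E[\partial^2 S / \partial\beta^2]$. Estimates are updated as $(\theta'', \beta'')^T = (\theta', \beta')^T + I^{-1}(\theta', \beta')\,\mathbf{S}(\theta', \beta')$, and the variance of the estimates is approximated by $I^{-1}$.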
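A minimal single-parameter sketch of the Newton-Raphson iteration, assuming a binomial log-likelihood with hypothetical data (7 events in 40 trials) so that the answer can be checked against the analytical MLE r/n. The starting value, tolerance and function names are illustrative assumptions.

```python
def score(theta, r, n):
    """First derivative of the binomial log-likelihood."""
    return r / theta - (n - r) / (1 - theta)


def score_derivative(theta, r, n):
    """Second derivative of the binomial log-likelihood (always negative)."""
    return -r / theta ** 2 - (n - r) / (1 - theta) ** 2


def newton_raphson(r, n, start, tol=1e-10, max_iter=50):
    """Iterate theta'' = theta' - Score(theta') / Score'(theta') until stable."""
    theta = start
    for iteration in range(1, max_iter + 1):
        new_theta = theta - score(theta, r, n) / score_derivative(theta, r, n)
        if not 0 < new_theta < 1:
            # An extreme estimate: the linear approximation overshot.
            raise ValueError("iterate left (0, 1); try another starting value")
        if abs(new_theta - theta) < tol:
            return new_theta, iteration
        theta = new_theta
    raise RuntimeError("Newton-Raphson did not converge")


r, n = 7, 40  # hypothetical data
theta_hat, n_iter = newton_raphson(r, n, start=0.3)

# Variance approximated by the inverse of the observed information at the MLE.
variance = -1 / score_derivative(theta_hat, r, n)

print(f"MLE = {theta_hat:.6f} after {n_iter} iterations")
print(f"approximate variance = {variance:.6f}")
```

The explicit range check is one way of catching the "extreme estimates" problem; a poor starting value can send the first step outside the parameter space.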

MLE Methods in outline contd.
Expectation-Maximisation Algorithm: iterative; for incomplete data. (Much genomic, financial and other data fit this situation, e.g. linkage analysis with marker genotypes of F2 progeny. Usually 9 categories are observed for the 2-locus, 2-allele model, but 16 = complete info., while 14 give info. on linkage. Some are hidden, but if the linkage parameter is known, expected frequencies can be predicted and the complete data restored using expectation.)
• Steps: (1) Expectation: estimates the statistics of the complete data, given the observed incomplete data.
• (2) Maximisation: uses the estimated complete data to give the MLE.
• Iterate until convergence (no further change)

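E-M contd. Implementation
• An initial guess, $\theta'$, is chosen (e.g. $\theta'$ = 0.25, say, for the R.F.).
• Taking this as "true", the complete data are estimated by distributional statements, e.g. P(individual is recombinant, given observed genotype) for R.F. estimation.
• The MLE estimate $\theta''$ is computed.
• This, for the R.F., is the sum of recombinants / N.
• Thus the MLE, for $f_i$ the observed count in category i, is $\theta'' = \frac{1}{N}\sum_i f_i\, P(\text{recombinant} \mid \text{genotype } i;\ \theta')$.
• Convergence: $\theta'' = \theta'$, or $|\theta'' - \theta'| < \varepsilon$ for a small tolerance $\varepsilon$.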
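The E and M steps above can be sketched for a deliberately simplified situation: each individual falls into one of three observed categories, and the probability of each category is assumed known separately for recombinants and non-recombinants. The counts, the conditional probabilities and the function name are hypothetical and are not taken from any real cross; only the update rule (expected recombinants divided by N) follows the steps listed above.

```python
def em_recombination_fraction(counts, p_given_rec, p_given_nonrec,
                              start=0.25, tol=1e-10, max_iter=1000):
    """EM for the proportion of recombinants when recombinant status is hidden."""
    n_total = sum(counts)
    theta = start
    for iteration in range(1, max_iter + 1):
        # E-step: P(recombinant | observed category i) under the current theta.
        posteriors = [
            theta * pr / (theta * pr + (1 - theta) * pn)
            for pr, pn in zip(p_given_rec, p_given_nonrec)
        ]
        # M-step: expected number of recombinants divided by N.
        new_theta = sum(f * p for f, p in zip(counts, posteriors)) / n_total
        if abs(new_theta - theta) < tol:
            return new_theta, iteration
        theta = new_theta
    raise RuntimeError("EM did not converge")


# Hypothetical observed counts and category probabilities.
counts = [50, 30, 20]
p_given_rec = [0.0, 0.4, 0.6]     # P(category | recombinant)
p_given_nonrec = [0.7, 0.3, 0.0]  # P(category | non-recombinant)

theta_hat, n_iter = em_recombination_fraction(counts, p_given_rec, p_given_nonrec)
print(f"EM estimate = {theta_hat:.6f} after {n_iter} iterations")
```

Only the middle category is ambiguous here, so the E-step does real work for that category alone; the more of the data that is hidden, the slower the convergence tends to be.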

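LIKELIHOOD: C.I. and H.T.
• Likelihood Ratio Tests: c.f. with $\chi^2$.
• Principal advantage of G is Power, as unknown parameters are involved in the hypothesis test. Have: the Likelihood of $\theta$ taking a value $\theta_A$ which maximises it, i.e. its MLE, and the likelihood under H0: $\theta = \theta_N$ (e.g. $\theta_N$ = 0.5).
• Form of L.R. Test Statistic: $G = -2\log\left[L(\theta_N)/L(\theta_A)\right]$ or, conventionally, $G = 2\log\left[L(\theta_A)/L(\theta_N)\right]$; choose whichever is easier to interpret.
• Distribution of G ~ approx. $\chi^2$ (d.o.f. = difference in dimension of parameter spaces for $L(\theta_A)$, $L(\theta_N)$).
• Goodness of Fit: notation as for $\chi^2$, $G \sim \chi^2_{n-1}$: $G = 2\sum_{i=1}^{n} O_i \log(O_i/E_i)$
• Independence: notation again as for $\chi^2$: $G = 2\sum_i\sum_j O_{ij}\log(O_{ij}/E_{ij})$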
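A small sketch of the likelihood ratio test for a single binomial parameter, with the null value 0.5 used as the example above. The counts (7 events in 40 trials) and the function names are hypothetical. The sketch also evaluates the observed/expected form of G for the same data, and uses the identity that a chi-square tail probability on 1 d.o.f. equals erfc(sqrt(G/2)).

```python
import math


def binomial_log_lik(theta, r, n):
    """Log-likelihood for r events in n trials, constant term omitted."""
    return r * math.log(theta) + (n - r) * math.log(1 - theta)


def g_statistic(r, n, theta_null=0.5):
    """G = 2 log[L(theta_A) / L(theta_N)], with theta_A the MLE r / n."""
    theta_alt = r / n
    return 2 * (binomial_log_lik(theta_alt, r, n)
                - binomial_log_lik(theta_null, r, n))


def g_goodness_of_fit(observed, expected):
    """G = 2 * sum(O * log(O / E)) over the categories."""
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected))


def chi2_tail_1df(g):
    """P(chi-square on 1 d.o.f. exceeds g)."""
    return math.erfc(math.sqrt(g / 2))


r, n = 7, 40  # hypothetical counts
g = g_statistic(r, n)
g_gof = g_goodness_of_fit([r, n - r], [n * 0.5, n * 0.5])

print(f"G (likelihood ratio form)  = {g:.4f}")
print(f"G (observed/expected form) = {g_gof:.4f}")
print(f"approximate p-value, 1 d.o.f. = {chi2_tail_1df(g):.2e}")
```

The two forms agree because, for count data, the maximised likelihood depends on the data only through the observed and fitted category counts.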

Likelihood C.I.'s: graphical method
• Example: Consider the following Likelihood function, $L(\theta) = (1-\theta)^a\,\theta^b$, where $\theta$ is the unknown parameter and a, b are observed counts.
• For 4 data sets observed, A: (a, b) = (8, 2), B: (a, b) = (16, 4), C: (a, b) = (80, 20), D: (a, b) = (400, 100).
• Likelihood estimates can be plotted vs possible parameter values, with MLE = peak value, e.g. MLE = 0.2, Lmax = 0.0067 for A, and Lmax = 0.000045 for B, etc.
• Set A: Log Lmax - Log L = Log(0.0067) - Log(0.00091) = 2 gives the 95% C.I., so $\theta$ = (0.035, 0.496), corresponding to L = 0.00091, is the 95% C.I. for A.
• Similarly, manipulating this expression, the Likelihood value corresponding to the 95% confidence interval is given as L = $(7.389)^{-1}$ Lmax, since $e^2 = 7.389$.
Note: Usually plot Log-likelihood vs parameter, rather than Likelihood. As sample size increases, the C.I. becomes narrower and more symmetric.
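The graphical method can be mimicked numerically by scanning a grid and keeping the parameter values whose log-likelihood lies within 2 units of the maximum. The sketch assumes the likelihood has the form (1 − θ)^a θ^b, which is consistent with the MLE of 0.2 and Lmax of 0.0067 quoted for data set A; the grid size and function names are illustrative, and the computed end points need not match the quoted values to the last digit.

```python
import math


def log_lik(theta, a, b):
    """Log of the assumed likelihood (1 - theta)^a * theta^b."""
    return a * math.log(1 - theta) + b * math.log(theta)


def likelihood_interval(a, b, drop=2.0, n_grid=10000):
    """Grid values of theta with Log Lmax - Log L <= drop."""
    theta_hat = b / (a + b)
    max_ll = log_lik(theta_hat, a, b)
    inside = [
        i / n_grid
        for i in range(1, n_grid)
        if max_ll - log_lik(i / n_grid, a, b) <= drop
    ]
    return theta_hat, math.exp(max_ll), min(inside), max(inside)


data_sets = {"A": (8, 2), "B": (16, 4), "C": (80, 20), "D": (400, 100)}
results = {}
for name, (a, b) in data_sets.items():
    results[name] = likelihood_interval(a, b)
    theta_hat, l_max, lower, upper = results[name]
    print(f"{name}: MLE = {theta_hat:.2f}, Lmax = {l_max:.3g}, "
          f"interval = ({lower:.3f}, {upper:.3f})")
```

Running the four data sets side by side shows the interval narrowing around the same MLE as the counts grow.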

Maximum Likelihood Benefits
• Strong estimator properties: sufficiency, efficiency, consistency, non-bias etc., as before
• Good Confidence Intervals: coverage probability realised and intervals meaningful
• MLE is a good estimator of a C.I.:
  - MSE consistent
  - Absence of Bias does not "stand alone": minimum variance is important
  - Asymptotically Normal
  - Precise for large samples
  - Inferences valid, ranges realistic