Linear discrete inverse problems (parameter estimation): least squares and all that.
Least squares problems. Least squares is the basis of many parameter estimation and data fitting procedures. A concise tutorial can be found in Chapter 15 of Numerical Recipes (Press et al., 1992, Cambridge Univ. Press), available for free online at http://www.nrbook.com. A good explanation of the essentials is given in Aster et al. (2005).
Linear discrete inverse problems. Can a and b be resolved? Three cases: under-determined, over-determined, and even-determined.
Over-determined: linear discrete inverse problem. To find the best-fit model we can minimize the prediction error of the solution. But the data contain errors. Let's assume these are independent and normally distributed; then we weight each residual inversely by the standard deviation of the corresponding (known) error distribution. We obtain a least squares solution by minimizing the weighted prediction error of the solution.
Over-determined: linear discrete inverse problem. We seek the model vector m which minimizes

  phi(m) = (d - Gm)^T C_d^{-1} (d - Gm)

(compare with maximum likelihood). Note that this is a quadratic function of the model vector. Solution: differentiate with respect to m and solve for the model vector which gives a zero gradient in phi. This gives the least-squares solution, a solution to the normal equations:

  m_LS = (G^T C_d^{-1} G)^{-1} G^T C_d^{-1} d
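The normal-equations solution above can be sketched numerically. This is a minimal illustration, not the slides' own code: the straight-line model, the data values, and all variable names (G, d, sigma, m_ls) are assumptions for the example.

```python
import numpy as np

# Hedged sketch: weighted least squares for a straight-line fit d = a + b*x.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 20)
m_true = np.array([1.0, 2.0])              # [intercept a, gradient b]
sigma = 0.5 * np.ones_like(x)              # known, independent data errors
d = m_true[0] + m_true[1] * x + rng.normal(0.0, sigma)

G = np.column_stack([np.ones_like(x), x])  # design matrix
W = np.diag(1.0 / sigma)                   # weight each row by 1/sigma_i
Gw, dw = W @ G, W @ d

# Normal equations: m_ls = (Gw^T Gw)^{-1} Gw^T dw
m_ls = np.linalg.solve(Gw.T @ Gw, Gw.T @ dw)
print(m_ls)  # close to the true [1, 2]
```

Solving the normal equations directly is fine for small well-conditioned problems; in practice a QR-based solver (e.g. `np.linalg.lstsq`) is numerically safer.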
Over-determined: linear discrete inverse problem. How does the least-squares solution compare to the standard equations of linear regression? Given N data y_i with independent, normally distributed errors and standard deviations sigma_i, what are the expressions for the model parameters m = [a, b]^T?
Linear discrete inverse problem: least squares. What happens in the under- and even-determined cases? Under-determined, N = 1: the matrix G^T G has a zero determinant (a zero eigenvalue), so an infinite number of solutions exist. What m do we pick? Even-determined, N = 2: what is the prediction error? The prediction error is zero!
Example: over-determined linear discrete inverse problem. The ballistics example. Given the data and noise, calculate G. Is the data fit good enough? And how do errors in the data propagate into the solution?
The two questions in parameter estimation. We have our fitted model parameters, but we are far from finished! We need to: 1. Assess the quality of the data fit (goodness of fit): does the model fit the data to within the statistical uncertainty of the noise? 2. Estimate how errors in the data propagate into the model: what are the errors on the model parameters?
Goodness of fit. Once we have our least squares solution m_LS, how do we know whether the fit is good enough given the errors in the data? Use the prediction error at the least squares solution! If the data errors are Gaussian, this is a chi-square statistic:

  chi^2 = sum_i (d_i - (G m_LS)_i)^2 / sigma_i^2
Why do we always assume errors are Gaussian? The Central Limit Theorem. [Figure: probability density functions of five random variables X_1 to X_5, each estimated from 100,000 deviates.]
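The Central Limit Theorem can be demonstrated numerically: the sum of many independent non-Gaussian deviates is very nearly Gaussian. This is a hedged sketch, not the experiment behind the slide's figure; the choice of uniform deviates and the sample sizes are assumptions.

```python
import numpy as np

# Hedged CLT demo: a sum of 12 U(0,1) deviates has mean 6 and variance
# 12 * (1/12) = 1, and its distribution is close to N(6, 1) even though
# each individual deviate is far from Gaussian.
rng = np.random.default_rng(1)
n_sum, n_samples = 12, 100_000
sums = rng.random((n_samples, n_sum)).sum(axis=1)

z = sums - 6.0                     # approximately N(0, 1)
print(z.mean(), z.std())
# For a Gaussian, about 68.3% of deviates lie within one standard deviation
frac = np.mean(np.abs(z) < 1.0)
print(frac)
```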
Mathematical background: probability distributions. A random variable x follows a Normal or Gaussian probability density function if

  f(x) = (1 / (sigma sqrt(2 pi))) exp(-(x - mu)^2 / (2 sigma^2)).

Multivariate case:

  f(d) = (2 pi)^{-N/2} |C_d|^{-1/2} exp(-(1/2) (d - mu)^T C_d^{-1} (d - mu)).

If the data covariance matrix is diagonal then the data errors are independent.
Mathematical background: probability distributions. Expectation operator: E[X] = integral of x f(x) dx. Expectation of a Gaussian random variable: E[X] = mu. Variance: Var(X) = E[(X - E[X])^2]. Covariance: Cov(X, Y) = E[(X - E[X])(Y - E[Y])]. If X and Y are independent then Cov(X, Y) = 0.
Mathematical background: probability distributions. Multi-dimensional Gaussian. Expectation of a Gaussian random vector: E[X] = mu. Covariance matrix: C_ij = Cov(X_i, X_j). Correlation matrix: rho_ij = C_ij / sqrt(C_ii C_jj). Independence between X and Y gives a zero off-diagonal entry; nonzero entries indicate positive or negative correlation.
Background: chi-square distribution. If x follows a Normal distribution, what distribution does the square of x follow? Answer: a chi-square distribution with 1 degree of freedom. If x_1 and x_2 are Normal, what distribution does y = x_1^2 + x_2^2 follow? Answer: a chi-square distribution with 2 degrees of freedom.
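A quick numerical check of this fact: a chi-square variable with k degrees of freedom has mean k and variance 2k, and the sum of squares of k standard normal deviates reproduces both. The sample sizes below are illustrative assumptions.

```python
import numpy as np

# Hedged check: y = x_1^2 + x_2^2 with x_1, x_2 ~ N(0, 1) should behave
# like a chi-square variable with 2 degrees of freedom (mean 2, variance 4).
rng = np.random.default_rng(2)
k, n = 2, 200_000
y = (rng.standard_normal((n, k)) ** 2).sum(axis=1)
print(y.mean(), y.var())  # expect roughly 2 and 4
```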
Goodness of fit. For Gaussian data errors the data prediction error is a sum of squares of Gaussian random variables, hence it has a chi-square probability density function with N - M degrees of freedom. The chi-square test provides a means of testing the assumptions that went into producing the least squares solution: it gives the likelihood that the fit actually achieved is reasonable. [Figure: chi-square density with the p = 0.95 and p = 0.05 tails marked.]
Example: goodness of fit. The ballistics problem. Given the data and noise, how many degrees of freedom? nu = N - M = 10 - 3 = 7. In practice, p-values between 0.1 and 0.9 are plausible.
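The test on this slide can be sketched numerically. The observed chi-square value below is an illustrative assumption (the slide's actual misfit value is in the figure, not the text); the p-value is estimated by Monte Carlo to avoid depending on chi-square tables or scipy.

```python
import numpy as np

# Hedged sketch of the goodness-of-fit test for the ballistics example:
# N = 10 data, M = 3 parameters, so nu = N - M = 7 degrees of freedom.
rng = np.random.default_rng(3)
nu = 10 - 3
chi2_obs = 8.5  # assumed observed weighted misfit, for illustration only

# Monte Carlo estimate of the p-value P(chi^2_nu >= chi2_obs):
# draw sums of nu squared standard normal deviates.
samples = (rng.standard_normal((200_000, nu)) ** 2).sum(axis=1)
p_value = np.mean(samples >= chi2_obs)
print(p_value)  # roughly 0.29 for chi^2 = 8.5 with 7 dof

# p between about 0.1 and 0.9 suggests the fit is plausible
acceptable = 0.1 < p_value < 0.9
print(acceptable)
```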
Variance ratio. Another way of looking at the chi-square statistic is as the variance ratio of two distributions. Given two sets of random samples, did they come from the same distribution or not? The ratio of the variances of the two distributions follows a chi-square distribution, and chi-square tables tell us the likelihood that the two sets of observables come from the same distribution.
Goodness of fit. For Gaussian data errors the chi-square statistic has a chi-square distribution with nu = N - M degrees of freedom. Exercise: if I fit 7 data points with a straight line and get the chi-square value shown, what would you conclude? If I fit 102 data points with a straight line and get the value shown, what would you conclude? If I fit 52 data points with a straight line and get the value shown, what would you conclude?
Goodness of fit. For Gaussian data errors the chi-square statistic has a chi-square distribution with nu = N - M degrees of freedom. What could be the cause if the prediction error is much too large (poor data fit)? Errors in the forward theory; truly unlikely data errors; under-estimated data errors. What if the prediction error is too small (too good a data fit)? Truly unlikely data errors; over-estimated data errors; fraud!
Solution error. Once we have our least squares solution m_LS and we know that the data fit is acceptable, how do we find the likely errors in the model parameters arising from errors in the data? The data set we actually observed is only one realization of the many that could have been observed. The effect of adding noise to the data is to add noise to the solution, and the model noise is a linear combination of the data noise!
Solution error: model covariance. Multivariate Gaussian data error distribution: how do we turn this into a probability distribution for the model errors? We know that the solution error is a linear combination of the data error. The covariance of any linear combination Ad of Gaussian-distributed random variables d is

  Cov(Ad) = A C_d A^T,

so we have the covariance of the model parameters.
Solution error: model covariance. The model covariance for a least squares problem,

  C_m = (G^T C_d^{-1} G)^{-1},

depends on the data errors, not on the data itself! G is controlled by the design of the experiment. The data error distribution gives a model error distribution!
Solution error: model covariance. For the special case of independent data errors with common variance sigma^2,

  C_m = sigma^2 (G^T G)^{-1}.

Independent data errors generally give correlated model errors, as in the linear regression problem.
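The formula above can be evaluated directly. This hedged sketch uses an assumed straight-line design matrix to show both points on the slide: C_m involves only sigma and G (the experiment design), and independent data errors still give correlated model errors.

```python
import numpy as np

# Hedged sketch: model covariance C_m = sigma^2 (G^T G)^{-1} for a
# straight-line fit with assumed, illustrative sigma and x values.
sigma = 0.5
x = np.linspace(0.0, 10.0, 20)
G = np.column_stack([np.ones_like(x), x])

C_m = sigma ** 2 * np.linalg.inv(G.T @ G)
print(C_m)
# C_m never touches the observed data values: only sigma and G enter.
# The off-diagonal term shows intercept and gradient errors are correlated
# (negatively here, since all x are positive).
print(C_m[0, 1])
```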
Confidence intervals by projection (1-D). The model covariance is a symmetric M x M matrix. In the multi-dimensional model space, Delta chi^2 follows a chi-square distribution with M degrees of freedom. Projecting onto the m_i axis, the 1-D confidence interval becomes

  m_i +/- sqrt(Delta chi^2) sqrt((C_m)_ii),

where Delta chi^2 follows a chi-square distribution with 1 degree of freedom (e.g. Delta chi^2 = 2.71 for a 90% interval on m_1). Note this is the 90% confidence interval on m_1 alone; the joint (m_1, m_2) 90% confidence ellipse is wider than this.
Example: model covariance and confidence intervals. For the ballistics problem, the 95% confidence interval for parameter i is m_i +/- 1.96 sqrt((C_m)_ii).
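The projection formula can be sketched as follows. The covariance matrix and solution below are illustrative assumptions, not the ballistics problem's actual values; the chi-square percentage points (2.71 for 90%, 3.84 for 95%, 1 dof) are standard.

```python
import numpy as np

# Hedged sketch: 1-D confidence intervals m_i +/- sqrt(delta) * sigma_mi,
# where delta is a percentage point of chi-square with 1 dof:
# delta = 2.71 gives ~1.64 sigma (90%), delta = 3.84 gives ~1.96 sigma (95%).
C_m = np.array([[0.04, -0.005],
                [-0.005, 0.001]])   # illustrative covariance, not from slides
m_ls = np.array([1.1, 1.9])         # illustrative solution

sigma_m = np.sqrt(np.diag(C_m))     # 1-sigma errors on each parameter
ci90 = np.sqrt(2.71) * sigma_m      # 90% half-widths
ci95 = np.sqrt(3.84) * sigma_m      # 95% half-widths
for i in range(len(m_ls)):
    print(f"m[{i}] = {m_ls[i]:.3f} +/- {ci95[i]:.3f} (95%)")
```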
Confidence intervals by projection. The M-dimensional confidence ellipsoid can be projected onto any subset (or combination) of parameters to obtain the corresponding confidence ellipsoid: from the full M-dimensional ellipsoid we obtain a projected lower-dimensional ellipsoid, a projected model vector, a projected covariance matrix, and the chosen percentage point of the chi-square distribution. For example, to find the 90% confidence ellipse for (x, y) from a 3-D (x, y, z) ellipsoid, project out z. Can you see that this procedure gives the same formula for the 1-D case obtained previously?
Recap: goodness of fit and model covariance. Once a best-fit solution has been obtained, test goodness of fit with a chi-square test (assuming Gaussian statistics). If the model passes the goodness of fit test, proceed to evaluating model covariance (if not, then your data errors are probably too small). Evaluate the model covariance matrix, plot the model or projections of it onto chosen subsets of parameters, and calculate confidence intervals using the projected equation, where Delta chi^2 follows a chi-square distribution with 1 degree of freedom.
Robust data fitting with the L1 norm. Least squares solutions are not robust to outliers. Instead, minimize the L1 misfit

  sum_i |d_i - (Gm)_i| / sigma_i.

We can calculate an L1 solution with the IRLS (iteratively reweighted least squares) algorithm, in which a diagonal weighting matrix that depends on the current model is updated at each iteration. See section 2.4 of Aster et al. (2005).
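The IRLS idea can be sketched as below. This is a minimal, hedged implementation following the general scheme in Aster et al. (2005), not the book's code: the data values, the gross outlier, the iteration count, and the small regularizing constant eps are all illustrative assumptions.

```python
import numpy as np

# Hedged IRLS sketch for an L1 (least absolute deviation) straight-line fit.
def irls_l1(G, d, n_iter=50, eps=1e-8):
    # Start from the least squares solution
    m = np.linalg.solve(G.T @ G, G.T @ d)
    for _ in range(n_iter):
        r = d - G @ m
        # Diagonal weighting matrix depends on the current model:
        # weights 1/|r_i| shrink the influence of large residuals (outliers)
        R = np.diag(1.0 / np.maximum(np.abs(r), eps))
        m = np.linalg.solve(G.T @ R @ G, G.T @ R @ d)
    return m

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
d = 1.0 + 2.0 * x
d[3] += 20.0                               # one gross outlier
G = np.column_stack([np.ones_like(x), x])

m_l2 = np.linalg.solve(G.T @ G, G.T @ d)   # pulled toward the outlier
m_l1 = irls_l1(G, d)                       # close to [1, 2] despite it
print(m_l2, m_l1)
```

Five of the six points lie exactly on the line 1 + 2x, so the L1 solution recovers [1, 2] almost exactly while the least squares solution does not: a direct illustration of the robustness claim.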
Monte Carlo error propagation. It is possible to define an approximate goodness-of-fit statistic for the L1 solution and hence test the fit of the solution. However, there is no analytical solution for error propagation, but Monte Carlo error propagation can be used: 1. Calculate the data prediction from the solution. 2. Add a random realization of noise to the predicted data and repeat the IRLS algorithm. 3. Repeat Q times and generate difference vectors.
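The three steps above can be sketched as follows. For ease of verification this hedged example resamples a plain least squares estimator, whose analytic covariance sigma^2 (G^T G)^{-1} is known, so the Monte Carlo answer can be checked against it; for the L1 case you would call the IRLS solver inside the loop instead. All data values and Q are illustrative assumptions.

```python
import numpy as np

# Hedged Monte Carlo error propagation sketch.
rng = np.random.default_rng(4)
sigma = 0.5
x = np.linspace(0.0, 10.0, 20)
G = np.column_stack([np.ones_like(x), x])
d = 1.0 + 2.0 * x + rng.normal(0.0, sigma, size=x.size)

def solve(G, d):
    return np.linalg.solve(G.T @ G, G.T @ d)

m_hat = solve(G, d)
d_pred = G @ m_hat                          # 1. data predicted by the solution

Q = 2000
deltas = np.empty((Q, 2))
for q in range(Q):
    d_q = d_pred + rng.normal(0.0, sigma, size=d.size)  # 2. add fresh noise
    deltas[q] = solve(G, d_q) - m_hat       # 3. difference vectors
C_mc = np.cov(deltas.T)                     # empirical model covariance
C_an = sigma ** 2 * np.linalg.inv(G.T @ G)  # analytic result, for comparison
print(C_mc)
print(C_an)
```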
Monte Carlo error propagation. For the ballistics problem we obtain the distributions shown; compare to the least squares solution without the outlier.
What if we do not know the errors on the data? Both chi-square goodness-of-fit tests and model covariance calculations require knowledge of the variance of the data. What can we do if we do not know it? Consider the case of independent data errors with common variance. It can be estimated from the least squares solution as

  s^2 = sum_i (d_i - (G m_LS)_i)^2 / (N - M),

so we can still estimate model errors using the calculated data errors, but we can no longer claim anything about goodness of fit.
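This estimate is easy to sketch. The data below are simulated with a known sigma so the estimate can be checked; that known value, and all names, are assumptions of the example.

```python
import numpy as np

# Hedged sketch: estimate the (assumed common) data variance from the
# least squares residuals, s^2 = sum(r_i^2) / (N - M), then use it in C_m.
# Note the goodness-of-fit test is then unavailable: by construction the
# scaled misfit chi^2 / (N - M) equals 1.
rng = np.random.default_rng(5)
x = np.linspace(0.0, 10.0, 30)
G = np.column_stack([np.ones_like(x), x])
N, M = G.shape
d = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, size=N)   # true sigma = 0.5

m_ls = np.linalg.solve(G.T @ G, G.T @ d)
r = d - G @ m_ls
s2 = (r @ r) / (N - M)              # estimated data variance
C_m = s2 * np.linalg.inv(G.T @ G)   # model covariance from estimated errors
print(np.sqrt(s2))                  # should be near the true 0.5
```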
Example: over-determined linear discrete inverse problem. MATLAB exercise: generate data with Gaussian noise for a linear regression problem and invert for the best-fitting gradient and intercept. 1. Generate x_i points randomly between 0 and 10. 2. Calculate data y_i. 3. Add N(0, sigma) noise to the data y_i. 4. Calculate the G matrix. 5. Use MATLAB matrix routines to solve the normal equations. 6. Plot the data, the data errors, and the least squares solution.
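The six steps of the exercise can be sketched in Python (the slide asks for MATLAB; the equivalent MATLAB solve is noted in a comment). The true parameter values and sigma are assumptions chosen for the demonstration.

```python
import numpy as np

# Hedged Python sketch of the MATLAB exercise (same steps, different language).
rng = np.random.default_rng(6)
a_true, b_true, sigma = 1.0, 2.0, 0.3      # assumed true model and noise level

x = rng.uniform(0.0, 10.0, size=50)        # 1. random x_i in [0, 10]
y_clean = a_true + b_true * x              # 2. exact data y_i
y = y_clean + rng.normal(0.0, sigma, 50)   # 3. add N(0, sigma) noise
G = np.column_stack([np.ones_like(x), x])  # 4. build the G matrix

# 5. solve the normal equations (MATLAB: m = (G'*G) \ (G'*y))
m = np.linalg.solve(G.T @ G, G.T @ y)
print(m)  # close to [1, 2]
# 6. plotting is omitted here; in MATLAB use errorbar(x, y, sigma) and plot.
```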
Model resolution matrix. If we obtain a solution to an inverse problem, we can ask what its relationship is to the true solution. But we know d = G m_true, and hence

  m_est = G^{-g} d = G^{-g} G m_true = R m_true.

The matrix R = G^{-g} G measures how `good an inverse' G^{-g} is: it shows how the elements of m_est are built from linear combinations of the true model m_true, and hence measures the amount of blurring produced by the inverse operator. For the least squares solution we have R = (G^T G)^{-1} G^T G = I.
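The two cases can be checked numerically. This hedged sketch contrasts the full-rank least squares inverse (R = I) with a minimum-length generalized inverse for an under-determined system, where R blurs the model; the small example matrices are assumptions, not from the slides.

```python
import numpy as np

# Hedged sketch of the model resolution matrix R = G^{-g} G.
x = np.linspace(0.0, 10.0, 10)
G = np.column_stack([np.ones_like(x), x])     # over-determined, full rank

G_dag = np.linalg.inv(G.T @ G) @ G.T          # least squares inverse
R = G_dag @ G
print(np.allclose(R, np.eye(2)))              # True: R = I, no blurring

# Under-determined comparison (2 data, 3 unknowns): minimum-length inverse
# G^{-g} = G^T (G G^T)^{-1} gives an R that is not the identity.
G_under = np.array([[1.0, 1.0, 0.0],
                    [0.0, 1.0, 1.0]])
G_ml = G_under.T @ np.linalg.inv(G_under @ G_under.T)
R_under = G_ml @ G_under
print(R_under)                                # blurred: rows mix parameters
```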
Example: model resolution in a tomographic experiment. If the calculated model resolution matrix looked like this, what units do the elements of R have? [Figure: true model, spike test, recovered model.]
Data resolution matrix. If we obtain a solution to an inverse problem, we can ask how it compares to the data. But we know d_pred = G m_est = G G^{-g} d = D d. The matrix D = G G^{-g} is analogous to the model resolution matrix R but measures how well the model produced by G^{-g} can independently reproduce the data. If D = I then the data are fit exactly and the prediction error d - Gm is zero.
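Mirroring the model resolution sketch, D can be computed for the same assumed over-determined straight-line problem; here D differs from the identity, reflecting that the data cannot be fit exactly.

```python
import numpy as np

# Hedged sketch of the data resolution matrix D = G G^{-g} for an
# over-determined least squares problem (illustrative G, not from slides).
x = np.linspace(0.0, 10.0, 10)
G = np.column_stack([np.ones_like(x), x])

G_dag = np.linalg.inv(G.T @ G) @ G.T
D = G @ G_dag                            # N x N projection onto range(G)
print(np.allclose(D, np.eye(len(x))))    # False: data not fit exactly
print(np.trace(D))                       # trace(D) equals M = 2 here
```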
Recap: linear discrete inverse problems. The least squares solution minimizes the prediction error. Goodness-of-fit criteria tell us whether the least squares model adequately fits the data, given the level of noise (chi-square with N - M degrees of freedom). The covariance matrix describes how noise propagates from the data to the estimated model (Delta chi-square with M degrees of freedom) and gives confidence intervals. The resolution matrix describes how the estimated model relates to the true model.