Parameter Estimation: Bayesian Estimation
Chapter 3 (Duda et al.) – Sections 3.3-3.7
CS 479/679 Pattern Recognition
Dr. George Bebis
Bayesian Estimation (BE)
• θ is assumed to be a random variable with a prior density p(θ).
• Using the training data D (through the likelihood p(D/θ)), we turn this prior into a posterior density p(θ/D).
• Ideally, the data will sharpen the posterior p(θ/D), that is, reduce our uncertainty about the parameters.
Role of Training Examples in Classification
• Bayes’ rule allows us to compute the posterior probabilities P(ωi/x):
P(ωi/x) = p(x/ωi) P(ωi) / p(x)
• Consider the role of the training examples D by introducing them in the computation of the posterior probabilities:
P(ωi/x, D) = p(x/ωi, D) P(ωi/D) / p(x/D)
Role of Training Examples (cont’d)
• Marginalizing p(x/D) over the classes, and assuming that the samples in Di carry information only about class ωi (i.e., each term uses only the samples from its own class):
P(ωi/x, D) = p(x/ωi, Di) P(ωi/Di) / Σj p(x/ωj, Dj) P(ωj/Dj)
Role of Training Examples (cont’d)
• The training examples are important in determining both the class-conditional densities and the prior probabilities.
• For simplicity, replace P(ωi/Di) with P(ωi):
P(ωi/x, D) = p(x/ωi, Di) P(ωi) / Σj p(x/ωj, Dj) P(ωj)
Role of Training Examples (cont’d)
• Need to estimate p(x/ωi, Di) for every class ωi.
• If the samples in Dj give no information about θi (for j ≠ i), we need to solve c independent problems of the form: “Given D, estimate p(x/D)” — one per class, as in the sketch below.
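A minimal sketch of how the simplified rule above could be used for classification, assuming each class-conditional density p(x/ωi, Di) has already been estimated from its own samples Di. All distributions, parameters, and priors below are illustrative assumptions, not values from the slides:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical per-class density estimates p(x/wi, Di), e.g., Gaussians
# fit independently to each class's samples Di (parameters are made up).
class_densities = [norm(loc=0.0, scale=1.0),   # p(x/w1, D1)
                   norm(loc=3.0, scale=1.5)]   # p(x/w2, D2)
priors = np.array([0.6, 0.4])                  # P(w1), P(w2)

def posteriors(x):
    """P(wi/x, D) = p(x/wi, Di) P(wi) / sum_j p(x/wj, Dj) P(wj)."""
    likelihoods = np.array([d.pdf(x) for d in class_densities])
    joint = likelihoods * priors
    return joint / joint.sum()

print(posteriors(1.0))  # posterior probability for each class
```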
BE Approach
• Estimate p(x/D) by marginalizing over θ:
p(x/D) = ∫ p(x, θ/D) dθ = ∫ p(x/θ, D) p(θ/D) dθ
• Since p(x/θ, D) = p(x/θ) (once θ is known, x no longer depends on D), we have the BE solution:
p(x/D) = ∫ p(x/θ) p(θ/D) dθ
where p(x/θ) is the assumed model (e.g., Gaussian).
BE vs ML/MAP
• ML/MAP makes a point estimate: p(x/D) ≈ p(x/θ̂)
• BE estimates a distribution: p(x/D) = ∫ p(x/θ) p(θ/D) dθ
• Note: the BE solution might not be of the exact parametric form assumed.
Interpretation of BE Solution
• If we are less certain about the exact value of θ, consider a weighted average of p(x/θ) over the possible values of θ:
p(x/D) = ∫ p(x/θ) p(θ/D) dθ
• The training data D exert their influence on p(x/D) through p(θ/D).
Relationship to ML Solution
• If p(D/θ) peaks sharply at θ = θ̂ (i.e., the ML solution), then p(θ/D) will, in general, peak sharply at θ = θ̂ too (assuming the prior p(θ) is broad and smooth).
• In that case p(θ/D) ≈ δ(θ − θ̂), so p(x/D) ≈ p(x/θ̂).
• Therefore, ML can be viewed as a special case of BE!
BE Main Steps
(1) Compute the posterior p(θ/D):
p(θ/D) = p(D/θ) p(θ) / ∫ p(D/θ) p(θ) dθ,  where p(D/θ) = Πk p(xk/θ)
(2) Compute the predictive density p(x/D):
p(x/D) = ∫ p(x/θ) p(θ/D) dθ
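A minimal numerical sketch of the two steps for a one-dimensional parameter, assuming a Gaussian model p(x/θ) = N(θ, 1) with unknown mean θ, and evaluating all densities on a grid. The data values and the prior are illustrative assumptions:

```python
import numpy as np
from scipy.stats import norm

D = np.array([1.2, 0.7, 1.9, 1.1])           # illustrative training samples
theta = np.linspace(-5, 5, 2001)             # grid over the parameter
dtheta = theta[1] - theta[0]

# Step 1: p(theta/D) ∝ p(D/theta) p(theta), normalized on the grid.
prior = norm.pdf(theta, loc=0.0, scale=2.0)  # assumed broad prior p(theta)
log_like = norm.logpdf(D[:, None], loc=theta[None, :], scale=1.0).sum(axis=0)
post = np.exp(log_like) * prior
post /= post.sum() * dtheta

# Step 2: p(x/D) = ∫ p(x/theta) p(theta/D) dtheta (numeric integration).
def predictive(x):
    return np.sum(norm.pdf(x, loc=theta, scale=1.0) * post) * dtheta

print(predictive(1.0))
```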
Case 1: Univariate Gaussian, Unknown μ
Assume p(x/μ) ~ N(μ, σ²) and p(μ) ~ N(μ0, σ0²) (known σ², μ0, σ0²)
D = {x1, x2, …, xn} (independently drawn)
(Step 1) Compute p(μ/D):
p(μ/D) ∝ p(D/μ) p(μ) = Πk p(xk/μ) · p(μ)
Case 1: Univariate Gaussian, Unknown μ (cont’d)
• It can be shown that p(μ/D) is again Gaussian:
p(μ/D) ~ N(μn, σn²)
where
μn = (nσ0² / (nσ0² + σ²)) μ̂n + (σ² / (nσ0² + σ²)) μ0,  with μ̂n = (1/n) Σk xk (the sample mean)
σn² = σ0² σ² / (nσ0² + σ²)
• So, p(μ/D) peaks at μn.
Case 1: Univariate Gaussian, Unknown μ (cont’d)
• μn is a weighted average of the sample mean μ̂n and the prior mean μ0 (i.e., it lies between them).
• σn² → 0 as n → ∞, and μn → μ̂n (the ML estimate) as n → ∞.
• The uncertainty of our estimate gets smaller as n increases! (See the sketch below.)
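A short sketch of these closed-form posterior updates. The prior parameters and the data-generating mean below are illustrative assumptions:

```python
import numpy as np

def gaussian_mean_posterior(D, sigma2, mu0, sigma0_2):
    """Posterior p(mu/D) ~ N(mu_n, sigma_n^2) for known variance sigma2."""
    n = len(D)
    mu_hat = np.mean(D)                        # ML estimate (sample mean)
    mu_n = (n * sigma0_2 * mu_hat + sigma2 * mu0) / (n * sigma0_2 + sigma2)
    sigma_n2 = (sigma0_2 * sigma2) / (n * sigma0_2 + sigma2)
    return mu_n, sigma_n2

rng = np.random.default_rng(0)
D = rng.normal(loc=2.0, scale=1.0, size=20)    # illustrative samples
mu_n, sigma_n2 = gaussian_mean_posterior(D, sigma2=1.0, mu0=0.0, sigma0_2=4.0)
print(mu_n, sigma_n2)   # mu_n lies between mu0 and the sample mean
```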
Example: Bayesian Learning — Case 1 (cont’d)
(figure: the posterior p(μ/D) becomes sharper and more peaked around μn as n grows)
Case 1: Univariate Gaussian, Unknown μ (cont’d)
(Step 2) Compute p(x/D):
p(x/D) = ∫ p(x/μ) p(μ/D) dμ ~ N(μn, σ² + σn²)
(the factors independent of μ come out of the integral)
• Note: the assumed form is p(x/μ) ~ N(μ, σ²), but p(x/D) ~ N(μn, σ² + σn²) — the predictive variance adds the remaining parameter uncertainty σn² to the model variance σ².
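Continuing the sketch above, the predictive density is just the assumed model widened by the posterior variance. The numeric values of mu_n and sigma_n2 below stand in for the output of the previous sketch and are illustrative:

```python
import numpy as np
from scipy.stats import norm

# Illustrative values standing in for the previous sketch's output.
mu_n, sigma_n2, sigma2 = 1.85, 0.05, 1.0

# Predictive density p(x/D) ~ N(mu_n, sigma^2 + sigma_n^2): the model
# variance sigma^2 is inflated by the parameter uncertainty sigma_n^2.
predictive = norm(loc=mu_n, scale=np.sqrt(sigma2 + sigma_n2))
print(predictive.pdf(2.0))   # density of a new point under p(x/D)
```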
Case 2: Multivariate Gaussian, Unknown μ
Assume p(x/μ) ~ N(μ, Σ) and p(μ) ~ N(μ0, Σ0) (known Σ, μ0, Σ0)
D = {x1, x2, …, xn} (independently drawn)
(Step 1) Compute p(μ/D):
p(μ/D) ∝ Πk p(xk/μ) · p(μ)
Case 2: Multivariate Gaussian, Unknown μ (cont’d)
• It can be shown that p(μ/D) has the following form:
p(μ/D) ~ N(μn, Σn)
where:
μn = Σ0 (Σ0 + (1/n)Σ)⁻¹ μ̂n + (1/n)Σ (Σ0 + (1/n)Σ)⁻¹ μ0,  with μ̂n = (1/n) Σk xk
Σn = Σ0 (Σ0 + (1/n)Σ)⁻¹ (1/n)Σ
Case 2: Multivariate Gaussian, Unknown μ (cont’d)
(Step 2) Compute p(x/D):
p(x/D) = ∫ p(x/μ) p(μ/D) dμ ~ N(μn, Σ + Σn)
• Note: the assumed form is p(x/μ) ~ N(μ, Σ); however, p(x/D) ~ N(μn, Σ + Σn).
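A compact NumPy sketch of the multivariate update; the covariances, prior mean, and generated samples are all illustrative assumptions:

```python
import numpy as np

def mvn_mean_posterior(D, Sigma, mu0, Sigma0):
    """Posterior p(mu/D) ~ N(mu_n, Sigma_n) for known covariance Sigma."""
    n = len(D)
    mu_hat = D.mean(axis=0)                    # sample mean
    A = np.linalg.inv(Sigma0 + Sigma / n)      # (Sigma0 + (1/n)Sigma)^-1
    mu_n = Sigma0 @ A @ mu_hat + (Sigma / n) @ A @ mu0
    Sigma_n = Sigma0 @ A @ (Sigma / n)
    return mu_n, Sigma_n

rng = np.random.default_rng(1)
Sigma = np.eye(2)                              # known model covariance
D = rng.multivariate_normal([1.0, -1.0], Sigma, size=30)
mu_n, Sigma_n = mvn_mean_posterior(D, Sigma, mu0=np.zeros(2), Sigma0=4 * np.eye(2))
# Predictive density: p(x/D) ~ N(mu_n, Sigma + Sigma_n)
print(mu_n, Sigma + Sigma_n, sep="\n")
```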
Recursive Bayes Learning
• Compute p(θ/Dⁿ) incrementally, where Dⁿ = {x1, x2, …, xn-1, xn} = Dⁿ⁻¹ ∪ {xn}:
p(Dⁿ/θ) = p(xn/θ) p(Dⁿ⁻¹/θ),  since the samples are drawn independently.
Recursive Bayes Learning (cont’d)
• Substituting into Bayes’ rule and marginalizing over θ in the denominator:
p(θ/Dⁿ) = p(xn/θ) p(θ/Dⁿ⁻¹) / ∫ p(xn/θ) p(θ/Dⁿ⁻¹) dθ,  for n = 1, 2, …
• The recursion starts from the prior: p(θ/D⁰) = p(θ) (n = 0). See the sketch below.
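A minimal sketch of this recursion for the univariate Gaussian case (Case 1), updating the posterior one sample at a time; after all n samples it matches the batch formulas for μn and σn² given earlier. The data and prior values are illustrative:

```python
import numpy as np

def recursive_update(mu_prev, s2_prev, x, sigma2):
    """One step of p(mu/D^n) ∝ p(x_n/mu) p(mu/D^(n-1)) for Gaussians:
    combines the current posterior N(mu_prev, s2_prev) with the
    likelihood of the new sample, N(x, sigma2), by adding precisions."""
    s2_new = 1.0 / (1.0 / s2_prev + 1.0 / sigma2)
    mu_new = s2_new * (mu_prev / s2_prev + x / sigma2)
    return mu_new, s2_new

sigma2 = 1.0
mu, s2 = 0.0, 4.0                  # start from the prior p(mu/D^0) = p(mu)
for x in [1.2, 0.7, 1.9, 1.1]:     # illustrative samples, one at a time
    mu, s2 = recursive_update(mu, s2, x, sigma2)
print(mu, s2)                      # equals the batch (mu_n, sigma_n^2)
```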
Example
• Assume p(x/θ) ~ U(0, θ) (uniform on [0, θ]) with a flat prior p(θ) ~ U(0, 10).
Example (cont’d)
• After observing x4 = 8, the posterior becomes p(θ/D⁴) ∝ 1/θ⁴ for 8 ≤ θ ≤ 10, and 0 for θ < 8 (since U(0, θ) cannot generate a sample larger than θ).
• In general: p(θ/Dⁿ) ∝ 1/θⁿ for max_k xk ≤ θ ≤ 10.
Example (cont’d)
• Starting from p(θ) = p(θ/D⁰) and iterating, p(θ/D⁴) peaks at θ = 8 = max_k xk.
• ML estimate: θ̂ = 8, giving p(x/θ̂) ~ U(0, 8).
• Bayesian estimate: p(x/D) = ∫ p(x/θ) p(θ/D⁴) dθ, which averages over all feasible θ ∈ [8, 10] and therefore keeps some density beyond x = 8. See the numerical sketch below.
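A grid-based sketch of this example. Only x4 = 8 appears on the slide; the full sample set D = {4, 7, 2, 8} follows the textbook version of this example and should be treated as an assumption:

```python
import numpy as np

D = [4.0, 7.0, 2.0, 8.0]               # assumed textbook samples (x4 = 8)
theta = np.linspace(0.01, 10, 5000)    # grid over [0, 10] (flat prior)
dtheta = theta[1] - theta[0]

post = np.ones_like(theta)             # p(theta/D^0) = p(theta), uniform
for x in D:                            # recursive update with U(0, theta)
    post *= np.where(theta >= x, 1.0 / theta, 0.0)
    post /= post.sum() * dtheta
print(theta[np.argmax(post)])          # peaks at max(D) = 8 (ML estimate)

# The Bayesian predictive p(x/D) keeps some density for 8 < x < 10:
def predictive(x):
    return np.sum(np.where(theta >= x, 1.0 / theta, 0.0) * post) * dtheta

print(predictive(9.0))                 # > 0, unlike p(x/theta_ML) = U(0, 8)
```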
ML vs Bayesian Estimation
• Amount of training data
– The two methods are equivalent given an infinite amount of training data (and a prior that does not exclude the true solution).
– For small training sets, they give different results in most cases.
• Computational complexity
– ML uses differential calculus or gradient search to maximize the likelihood.
– Bayesian estimation requires complex multidimensional integration techniques.
ML vs Bayesian Estimation (cont’d)
• Solution interpretation
– ML solutions are easier to interpret (i.e., they must be of the assumed parametric form).
– A Bayesian estimation solution might not be of the parametric form assumed.
Computational Complexity: ML Estimation
(dimensionality: d, # training data: n, # classes: c)
• Learning complexity (off-line):
– sample mean μ̂: O(dn)
– sample covariance Σ̂: O(d²n)
• Pre-compute certain terms to save time during classification, e.g., Σ̂⁻¹ (O(d³)), the constant (d/2) ln 2π (O(1)), the discriminant coefficients involving Σ̂⁻¹ and μ̂ (O(d²)), and the priors from sample counts (O(n)).
• The above computations must be repeated c times (once for each class).
Computational Complexity: ML Estimation (cont’d)
(dimensionality: d, # training data: n, # classes: c)
• Classification complexity (on-line):
– evaluate the quadratic discriminant (x − μ̂)ᵀ Σ̂⁻¹ (x − μ̂) using the pre-computed Σ̂⁻¹: O(d²)
– add the pre-computed constant terms: O(1)
• These computations must be repeated c times (once per class), and we take the max over the c discriminant values; see the sketch below.
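A sketch of this precompute/classify split for Gaussian class models, using the standard Gaussian log-posterior discriminant. The two-class setup and all parameter values are hypothetical:

```python
import numpy as np

def precompute(mu, Sigma, prior):
    """Off-line: invert Sigma once (O(d^3)) and cache the constants."""
    Sigma_inv = np.linalg.inv(Sigma)
    # The shared -(d/2) ln 2pi term is dropped: it is identical for all
    # classes and does not affect the argmax.
    const = -0.5 * np.linalg.slogdet(Sigma)[1] + np.log(prior)
    return mu, Sigma_inv, const

def discriminant(x, params):
    """On-line: quadratic form O(d^2) plus cached O(1) constants."""
    mu, Sigma_inv, const = params
    diff = x - mu
    return -0.5 * diff @ Sigma_inv @ diff + const

# Hypothetical per-class parameters (would come from ML estimation).
classes = [precompute(np.array([0.0, 0.0]), np.eye(2), 0.6),
           precompute(np.array([3.0, 3.0]), 2 * np.eye(2), 0.4)]

x = np.array([2.5, 2.0])
scores = [discriminant(x, p) for p in classes]   # repeat for each class
print(int(np.argmax(scores)))                    # take the max
```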
Computational Complexity: Bayesian Estimation
• Learning complexity: higher than ML.
• Classification complexity: same as ML.
Summary: Main Sources of Classification Errors
• Bayes error – the error due to overlapping densities p(x/ωi).
• Model error – the error due to choosing an incorrect model.
• Estimation error – the error due to incorrectly estimated parameters.
Next Quiz
• When: Tuesday, March 10th
• What: ML & Bayesian Estimation