Generalizing Linear Discriminant Analysis
Linear Discriminant Analysis
Objective
- Project a feature space (a dataset of n-dimensional samples) onto a smaller subspace
- Maintain the class separation
Reason
- Reduce computational costs
- Minimize overfitting
Linear Discriminant Analysis
Goal: reduce dimensionality while preserving the ability to discriminate between classes.
Figures from [1]
Linear Discriminant Analysis
Could simply look at the class means and find the projection direction that separates them most.
Equations from [1]
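The equations for this slide are not reproduced in the text. A common textbook way to write the mean-separation idea, assumed here to match [1], projects each sample as y = w^T x and measures the distance between the projected class means. The symbols (N_i samples in class \omega_i, class mean \mu_i, projected mean \tilde{\mu}_i) are notation chosen for this sketch:

```latex
\begin{align*}
\mu_i &= \frac{1}{N_i} \sum_{x \in \omega_i} x,
\qquad
\tilde{\mu}_i = \frac{1}{N_i} \sum_{x \in \omega_i} w^T x = w^T \mu_i \\
J(w) &= \left| \tilde{\mu}_1 - \tilde{\mu}_2 \right|
      = \left| w^T (\mu_1 - \mu_2) \right|
\end{align*}
```

This criterion looks only at the means and ignores how widely each class is spread along w.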
Linear Discriminant Analysis Figure from [1]
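Linear Discriminant Analysis
Fisher’s solution: maximize the difference between the projected class means, normalized by the within-class scatter of the projected samples.
- Scatter:
- Maximize:
Equations from [1]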
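The scatter and the objective are shown only as images in the slides. In the usual two-class formulation, assumed here to match [1], they can be written as follows, where y = w^T x is a projected sample and \tilde{\mu}_i is the projected mean of class \omega_i (notation chosen for this sketch):

```latex
\begin{align*}
\tilde{s}_i^2 &= \sum_{x \in \omega_i} \left( w^T x - \tilde{\mu}_i \right)^2 \\
J(w) &= \frac{\left( \tilde{\mu}_1 - \tilde{\mu}_2 \right)^2}
             {\tilde{s}_1^2 + \tilde{s}_2^2}
\end{align*}
```

A large J(w) means the projected means are far apart while samples of the same class stay close together.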
Linear Discriminant Analysis Fisher’s solution… Figure from [1]
Linear Discriminant Analysis How to get optimum w*? ◦ Must express J(w) as a function of w. Equation from [1]
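One standard way to express J(w) explicitly in terms of w, assumed here to follow [1], introduces scatter matrices in the original feature space. S_i, S_W and S_B are the conventional names for the class, within-class and between-class scatter matrices:

```latex
\begin{align*}
S_i &= \sum_{x \in \omega_i} (x - \mu_i)(x - \mu_i)^T,
\qquad S_W = S_1 + S_2 \\
\tilde{s}_1^2 + \tilde{s}_2^2 &= w^T S_W w \\
S_B &= (\mu_1 - \mu_2)(\mu_1 - \mu_2)^T,
\qquad (\tilde{\mu}_1 - \tilde{\mu}_2)^2 = w^T S_B w \\
J(w) &= \frac{w^T S_B w}{w^T S_W w}
\end{align*}
```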
Linear Discriminant Analysis How to get optimum w*… Equation from [1]
Linear Discriminant Analysis How to get optimum w*… Equations modified from [1]
Linear Discriminant Analysis How to get optimum w*… Equation from [1]
Linear Discriminant Analysis How to get optimum w*… Equation from [1]
Linear Discriminant Analysis How to get optimum w*… Equations from [1]
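In the standard derivation, setting the gradient of J(w) = (w^T S_B w) / (w^T S_W w) to zero leads to the generalized eigenvalue problem S_W^{-1} S_B w = J w, whose two-class solution is proportional to S_W^{-1} (mu_1 - mu_2). The sketch below assumes that result and checks it numerically on synthetic data. The function names and the toy data are illustrative and do not come from the slides.

```python
import numpy as np

def fisher_direction(X1, X2):
    """Unit-norm Fisher direction for two classes (rows are samples)."""
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class scatter S_W = S_1 + S_2.
    Sw = (X1 - mu1).T @ (X1 - mu1) + (X2 - mu2).T @ (X2 - mu2)
    # w* is proportional to S_W^{-1} (mu1 - mu2); solve rather than invert.
    w = np.linalg.solve(Sw, mu1 - mu2)
    return w / np.linalg.norm(w)

def fisher_criterion(w, X1, X2):
    """J(w): squared distance of projected means over projected scatter."""
    y1, y2 = X1 @ w, X2 @ w
    scatter = ((y1 - y1.mean()) ** 2).sum() + ((y2 - y2.mean()) ** 2).sum()
    return (y1.mean() - y2.mean()) ** 2 / scatter

# Synthetic data (an assumption): two elongated, correlated Gaussian classes.
rng = np.random.default_rng(0)
cov = [[3.0, 2.5], [2.5, 3.0]]
X1 = rng.multivariate_normal([0.0, 0.0], cov, size=200)
X2 = rng.multivariate_normal([2.0, 0.0], cov, size=200)

w_star = fisher_direction(X1, X2)

# Baseline: the direction that only separates the means.
w_means = X1.mean(axis=0) - X2.mean(axis=0)
w_means = w_means / np.linalg.norm(w_means)

J_star = fisher_criterion(w_star, X1, X2)
J_means = fisher_criterion(w_means, X1, X2)
print("J(w*) =", J_star, " J(mean direction) =", J_means)
```

Because w* maximizes J on the sample, its criterion value is never below that of the mean-difference direction. The gap grows when the classes are elongated along a direction that is not aligned with the mean difference.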
Linear Discriminant Analysis
How to generalize for >2 classes:
- Instead of a single projection, we calculate a matrix of projections.
- Within-class scatter becomes:
- Between-class scatter becomes:
Equations from [1]
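The two scatter definitions are images in the original slides. The usual C-class forms, assumed here to match [1], are given below, with N_i the number of samples in class \omega_i, N the total number of samples and \mu the overall mean:

```latex
\begin{align*}
S_W &= \sum_{i=1}^{C} S_i,
\qquad S_i = \sum_{x \in \omega_i} (x - \mu_i)(x - \mu_i)^T \\
S_B &= \sum_{i=1}^{C} N_i \, (\mu_i - \mu)(\mu_i - \mu)^T,
\qquad \mu = \frac{1}{N} \sum_{i=1}^{C} N_i \, \mu_i
\end{align*}
```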
Linear Discriminant Analysis How to generalize for >2 classes… Here, W is a projection matrix. Equation from [1]
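In the usual multi-class formulation the objective is the ratio of determinants J(W) = |W^T S_B W| / |W^T S_W W|, and the columns of the optimal W are the eigenvectors of S_W^{-1} S_B with the largest eigenvalues. The sketch below assumes that formulation. The function name `lda_projection` and the toy data are illustrative only.

```python
import numpy as np

def lda_projection(X, y, n_components):
    """Return (W, eigvals): projection matrix and sorted eigenvalues."""
    classes = np.unique(y)
    mu = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        Sw += (Xc - mu_c).T @ (Xc - mu_c)      # within-class scatter
        diff = (mu_c - mu)[:, None]
        Sb += len(Xc) * (diff @ diff.T)        # between-class scatter
    # Eigen-decomposition of S_W^{-1} S_B (not symmetric in general).
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(evals.real)[::-1]
    W = evecs[:, order[:n_components]].real
    return W, evals.real[order]

# Synthetic data (an assumption): C = 3 classes in 4 dimensions.
rng = np.random.default_rng(0)
means = np.array([[0, 0, 0, 0], [3, 0, 1, 0], [0, 3, 0, 1]], dtype=float)
X = np.vstack([rng.normal(m, 1.0, size=(50, 4)) for m in means])
y = np.repeat([0, 1, 2], 50)

W, eigvals = lda_projection(X, y, n_components=2)
Z = X @ W  # samples projected onto the 2 discriminant directions
print("eigenvalues:", eigvals)
```

Only the first C - 1 = 2 eigenvalues are meaningfully different from zero, because S_B is a sum of C rank-one terms constrained by the overall mean. This is why at most C - 1 useful projections exist.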
Linear Discriminant Analysis
Limitations of LDA:
- Parametric method
- Produces at most (C-1) projections
Benefits of LDA:
- Linear decision boundaries
◦ Easy for humans to interpret
◦ Simple to implement
- Good classification results
Flexible Discriminant Analysis
Flexible Discriminant Analysis
- Turns the LDA problem into a linear regression problem.
- “Differences between LDA and FDA and what criteria can be used to pick one for a given task?” (Tavish)
◦ Linear regression can be generalized into more flexible, nonparametric forms of regression.
◦ (Parametric: mean, variance…)
◦ Expands the set of predictors via basis expansions
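The following is a rough illustration of the regression view, not the full optimal-scoring procedure described in [2]. It regresses class indicators on the raw predictors and then on a quadratic basis expansion, and compares training accuracy. For two classes the least-squares coefficient vector is proportional to the LDA direction, so this stands in for "LDA in the enlarged space". The helper names, the choice of a quadratic basis and the disk-versus-ring data are all assumptions made for the example.

```python
import numpy as np

def linear_basis(X):
    """Intercept plus the raw predictors."""
    return np.column_stack([np.ones(len(X)), X])

def quadratic_basis(X):
    """Intercept, raw predictors, and all degree-2 terms (2-D input)."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x1, x2, x1**2, x1 * x2, x2**2])

def fit_indicator_regression(H, y, n_classes):
    """Least-squares regression of one-hot class indicators on basis H."""
    Y = np.eye(n_classes)[y]
    B, *_ = np.linalg.lstsq(H, Y, rcond=None)
    return B

def predict(H, B):
    """Assign each sample to the class with the largest fitted value."""
    return np.argmax(H @ B, axis=1)

def ring(rng, n, r_min, r_max):
    """Sample n points with radius in [r_min, r_max] and uniform angle."""
    r = rng.uniform(r_min, r_max, n)
    theta = rng.uniform(0.0, 2.0 * np.pi, n)
    return np.column_stack([r * np.cos(theta), r * np.sin(theta)])

# Synthetic data (an assumption): class 0 is a disk, class 1 surrounds it.
rng = np.random.default_rng(0)
n = 200
X = np.vstack([ring(rng, n, 0.0, 1.0), ring(rng, n, 2.0, 3.0)])
y = np.repeat([0, 1], n)

B_lin = fit_indicator_regression(linear_basis(X), y, 2)
B_quad = fit_indicator_regression(quadratic_basis(X), y, 2)

acc_linear = np.mean(predict(linear_basis(X), B_lin) == y)
acc_quadratic = np.mean(predict(quadratic_basis(X), B_quad) == y)
print("linear basis accuracy:   ", acc_linear)
print("quadratic basis accuracy:", acc_quadratic)
```

The boundary is still linear in the expanded predictors, but it becomes a curve in the original two coordinates, which is what lets it wrap around the inner class.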
Flexible Discriminant Analysis Figure from [2]
Penalized Discriminant Analysis
Penalized Discriminant Analysis
- Fit an LDA model, but ‘penalize’ the coefficients to be more smooth.
◦ Directly curbing the ‘overfitting’ problem
- Positively correlated predictors lead to noisy, negatively correlated coefficient estimates, and this noise results in unwanted sampling variance.
◦ Example: images
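A minimal two-class sketch of the penalization idea, under stated assumptions: the within-class covariance S_W is replaced by S_W + lambda * Omega, where Omega penalizes differences between neighbouring coefficients. The first-difference penalty, the value of `lam`, and the synthetic "signal" data with strongly correlated neighbouring predictors are all choices made for this example, not taken from [2].

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 30, 40                      # predictors (ordered, e.g. pixels), samples per class
idx = np.arange(p)

# Neighbouring predictors are positively correlated (correlation 0.9^|i-j|).
Sigma = 0.9 ** np.abs(idx[:, None] - idx[None, :])
L = np.linalg.cholesky(Sigma)

# The classes differ by a smooth bump in the mean.
signal = np.exp(-0.5 * ((idx - p / 2) / 4.0) ** 2)
X1 = rng.standard_normal((n, p)) @ L.T + signal
X2 = rng.standard_normal((n, p)) @ L.T

mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
# Pooled within-class covariance estimate.
Sw = ((X1 - mu1).T @ (X1 - mu1) + (X2 - mu2).T @ (X2 - mu2)) / (2 * n - 2)

# Roughness penalty: Omega = D^T D with D the first-difference operator.
D = np.diff(np.eye(p), axis=0)     # shape (p - 1, p)
Omega = D.T @ D
lam = 10.0                         # penalty strength (hypothetical value)

w_lda = np.linalg.solve(Sw, mu1 - mu2)               # unpenalized
w_pda = np.linalg.solve(Sw + lam * Omega, mu1 - mu2)  # penalized

def roughness(w):
    """Sum of squared neighbour differences of the unit-norm coefficients."""
    w = w / np.linalg.norm(w)
    return np.sum(np.diff(w) ** 2)

rough_lda = roughness(w_lda)
rough_pda = roughness(w_pda)
print("roughness without penalty:", rough_lda)
print("roughness with penalty:   ", rough_pda)
```

Without the penalty, neighbouring coefficients tend to flip sign from one predictor to the next. With it, the coefficient vector varies smoothly across the ordered predictors, which is the behaviour wanted for image-like inputs.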
Penalized Discriminant Analysis Images from [2]
Mixture Discriminant Analysis
Mixture Discriminant Analysis
- Instead of enlarging the set of predictors (FDA) or smoothing their coefficients (PDA) while still modeling each class with one Gaussian:
- Model each class as a mixture of two or more Gaussian components.
- All components share the same covariance matrix.
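To make the model concrete, the sketch below evaluates class posteriors when each class density is a mixture of Gaussians with one shared covariance matrix. The mixture parameters are hand-picked for illustration. In practice they would be estimated from data (for example with EM), which is not shown here. The function names are hypothetical.

```python
import numpy as np

def gaussian_pdf(x, mean, cov_inv, cov_det):
    """Density of a multivariate Gaussian at point x."""
    diff = x - mean
    k = len(mean)
    norm_const = np.sqrt((2.0 * np.pi) ** k * cov_det)
    return np.exp(-0.5 * diff @ cov_inv @ diff) / norm_const

def mda_posteriors(x, class_priors, mixtures, shared_cov):
    """Posterior class probabilities under per-class Gaussian mixtures.

    mixtures[k] is (weights, means) for class k; every component of every
    class uses the same covariance matrix shared_cov.
    """
    cov_inv = np.linalg.inv(shared_cov)
    cov_det = np.linalg.det(shared_cov)
    scores = []
    for prior, (weights, means) in zip(class_priors, mixtures):
        density = sum(w * gaussian_pdf(x, m, cov_inv, cov_det)
                      for w, m in zip(weights, means))
        scores.append(prior * density)
    scores = np.array(scores)
    return scores / scores.sum()

# Hand-picked parameters (an assumption): class 0 has two clusters on either
# side of class 1, a layout that a single Gaussian per class cannot capture.
shared_cov = np.eye(2)
mixtures = [
    ([0.5, 0.5], [np.array([-2.0, 0.0]), np.array([2.0, 0.0])]),  # class 0
    ([1.0], [np.array([0.0, 0.0])]),                              # class 1
]
class_priors = [0.5, 0.5]

post_right = mda_posteriors(np.array([2.0, 0.0]), class_priors, mixtures, shared_cov)
post_center = mda_posteriors(np.array([0.0, 0.0]), class_priors, mixtures, shared_cov)
print("posteriors at (2, 0):", post_right)
print("posteriors at (0, 0):", post_center)
```

With these parameters a point near either outer cluster is assigned to class 0 and a point at the origin to class 1, giving a decision region that a single linear boundary cannot produce.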
Mixture Discriminant Analysis Image from [2]
Sources
1. Gutierrez-Osuna, Ricardo. “CSCE 666 Pattern Analysis – Lecture 10.” http://research.cs.tamu.edu/prism/lectures/pr/pr_l10.pdf
2. Hastie, Trevor, et al. The Elements of Statistical Learning: Data Mining, Inference, and Prediction.
3. Raschka, Sebastian. “Linear Discriminant Analysis bit by bit.” http://sebastianraschka.com/Articles/2014_python_lda.html