Classification Problem
• Given: a training set of labeled examples (two classes, shown as + and - points in the scatter plot)
• Predict the class label of a given query point
Classification Problem
• The data are drawn from an unknown probability distribution
• We need to estimate the posterior probability $P(j \mid \mathbf{x}_0)$ of each class $j$ at the query point $\mathbf{x}_0$
The Bayesian Classifier
• Loss function: $L(i, j)$ = cost of assigning class $j$ when the true class is $i$
• Expected loss (conditional risk) associated with class $j$: $r_j(\mathbf{x}) = \sum_i L(i, j)\, P(i \mid \mathbf{x})$
• Bayes rule: assign the query to the class with minimum conditional risk
• Zero-one loss function: $L(i, j) = 1 - \delta_{ij}$, so $r_j(\mathbf{x}) = 1 - P(j \mid \mathbf{x})$ and the Bayes rule assigns the class with maximum posterior probability
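A small numerical sketch of these definitions (the loss matrix and posterior values below are hypothetical, chosen only to illustrate the computation):

```python
import numpy as np

# Hypothetical 3-class example: L[i, j] = cost of deciding class j when the
# true class is i, posterior[i] = P(i | x).
posterior = np.array([0.5, 0.3, 0.2])
L = np.array([[0.0, 1.0, 4.0],
              [1.0, 0.0, 2.0],
              [2.0, 1.0, 0.0]])

# Conditional risk r_j(x) = sum_i L[i, j] * P(i | x); Bayes rule picks its argmin.
risk = posterior @ L
print("conditional risks:", risk, "-> Bayes decision:", risk.argmin())

# Under zero-one loss, r_j(x) = 1 - P(j | x), so the minimum-risk class is
# exactly the maximum-posterior class.
zero_one = 1.0 - np.eye(3)
assert (posterior @ zero_one).argmin() == posterior.argmax()
```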
The Bayesian Classifier
• Bayes rule achieves the minimum error rate
• How to estimate the posterior probabilities $P(j \mid \mathbf{x})$?
Density estimation
• Use Bayes theorem to estimate the posterior probability values: $P(j \mid \mathbf{x}) = \dfrac{p(\mathbf{x} \mid j)\, \pi_j}{\sum_k p(\mathbf{x} \mid k)\, \pi_k}$, where $p(\mathbf{x} \mid j)$ is the probability density function of class $j$ and $\pi_j$ is the prior probability of class $j$
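A minimal sketch of this computation in one dimension; the two Gaussian class-conditional densities, their parameters, and the priors are assumptions made only for illustration:

```python
import numpy as np
from scipy.stats import norm

# Posterior P(j | x0) from class-conditional densities p(x0 | j) and priors pi_j.
x0 = 0.8
densities = np.array([norm.pdf(x0, loc=0.0, scale=1.0),   # p(x0 | class 0)
                      norm.pdf(x0, loc=2.0, scale=1.0)])  # p(x0 | class 1)
priors = np.array([0.6, 0.4])                              # pi_0, pi_1

# Bayes theorem: numerator p(x0 | j) * pi_j, normalized over all classes.
posterior = densities * priors / np.sum(densities * priors)
print("P(j | x0) =", posterior, "-> predicted class:", posterior.argmax())
```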
Naïve Bayes Classifier
• Makes the assumption of independence of the features given the class: $p(\mathbf{x} \mid j) = \prod_{k=1}^{q} p(x_k \mid j)$
• The task of estimating a q-dimensional density function is reduced to the estimation of q one-dimensional density functions, so the complexity of the task is drastically reduced.
• The use of Bayes theorem becomes much simpler.
• Proven to be effective in practice.
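A minimal Gaussian naïve Bayes sketch showing the factorization into q one-dimensional densities; the class name, the toy data, and the small variance-smoothing constant are illustrative assumptions, not part of the slides:

```python
import numpy as np

class GaussianNaiveBayes:
    def fit(self, X, y):
        # One univariate Gaussian per (class, feature) pair plus a class prior.
        self.classes_ = np.unique(y)
        self.priors_ = np.array([np.mean(y == c) for c in self.classes_])
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        self.vars_ = np.array([X[y == c].var(axis=0) + 1e-9 for c in self.classes_])
        return self

    def predict(self, X):
        # log p(x | j) = sum_k log N(x_k; mu_jk, sigma_jk^2)  (independence assumption)
        log_lik = -0.5 * (np.log(2 * np.pi * self.vars_)[None, :, :]
                          + (X[:, None, :] - self.means_[None, :, :]) ** 2
                          / self.vars_[None, :, :]).sum(axis=2)
        return self.classes_[np.argmax(log_lik + np.log(self.priors_), axis=1)]

# Usage on toy data: two 3-dimensional Gaussian blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(2, 1, (50, 3))])
y = np.array([0] * 50 + [1] * 50)
print(GaussianNaiveBayes().fit(X, y).predict(X[:5]))
```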
Nearest-Neighbor Methods
• Predict the class label of the query point as the most frequent label among its K nearest neighbors (scatter plot: + and - training points, with a neighborhood drawn around the query)
• Basic assumption: points that are close under the chosen distance metric tend to have the same class label, i.e. the class posteriors are approximately locally constant
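A minimal K-NN sketch of this rule; the function name, the choice of Euclidean distance, K = 5, and the toy data are assumptions made for illustration:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=5):
    # Euclidean distances from the query to every training point.
    dists = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the K nearest neighbors, then a majority vote over their labels.
    nearest = np.argsort(dists)[:k]
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Usage on toy data: two Gaussian blobs labeled '-' and '+'.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(3, 1, (30, 2))])
y = np.array(['-'] * 30 + ['+'] * 30)
print(knn_predict(X, y, np.array([2.5, 2.5]), k=5))
```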
Example: Letter Recognition
• Input features extracted from the character images: first statistical moment, ..., edge count
Asymptotic Properties of K-NN Methods
• The K-NN estimate is consistent if $K \to \infty$ and $K/N \to 0$ as $N \to \infty$
• The first condition reduces the variance by making the estimate independent of the accidental characteristics of the K nearest neighbors.
• The second condition reduces the bias by ensuring that the K nearest neighbors are arbitrarily close to the query point.
Asymptotic Properties of K-NN Methods
• $E_{1NN}$: classification error rate of the 1-NN rule
• $E^*$: classification error rate of the Bayes rule
• In the asymptotic limit: $E^* \le E_{1NN} \le 2E^*$, i.e. no decision rule is more than twice as accurate as the 1-NN rule
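A sketch of the standard two-class argument behind this bound (Cover and Hart); the notation $\eta(\mathbf{x}) = P(1 \mid \mathbf{x})$ is introduced here only for illustration:

```latex
\begin{align*}
E^*(\mathbf{x}) &= \min\{\eta(\mathbf{x}),\, 1-\eta(\mathbf{x})\}
  && \text{pointwise Bayes error}\\
E_{1NN}(\mathbf{x}) &= \eta(\mathbf{x})\,(1-\eta(\mathbf{x})) + (1-\eta(\mathbf{x}))\,\eta(\mathbf{x})
  = 2\,\eta(\mathbf{x})\,(1-\eta(\mathbf{x}))
  && \text{query and nearest neighbor disagree; their labels are asymptotically i.i.d.}\\
E_{1NN}(\mathbf{x}) &\le 2\min\{\eta(\mathbf{x}),\, 1-\eta(\mathbf{x})\} = 2E^*(\mathbf{x})
  && \text{so, averaging over } \mathbf{x}:\;\; E^* \le E_{1NN} \le 2E^*.
\end{align*}
```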
Finite-sample settings
• How well does the 1-NN rule work in finite-sample settings?
• If the number of training data N is large and the number of input features q is small, the asymptotic results may still be valid.
• However, for a moderate to large number of input variables, the sample size required for their validity is beyond feasibility.
Curse-of-Dimensionality
• This phenomenon is known as the curse-of-dimensionality
• It refers to the fact that in high-dimensional spaces data become extremely sparse and are far apart from each other
• It affects any estimation problem with high dimensionality
Curse of Dimensionality: plot of the ratio DMAX/DMIN vs. dimension, for a sample of size N = 500 uniformly distributed in the unit hypercube
Curse of Dimensionality: the distribution of the ratio DMAX/DMIN converges to 1 as the dimensionality increases
Curse of Dimensionality: plot of the variance of distances from a given point vs. dimension
Curse of Dimensionality: the variance of distances from a given point converges to 0 as the dimensionality increases
Curse of Dimensionality: plot of the distance values from a given point; the values flatten out as the dimensionality increases
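A small simulation reproducing the qualitative behavior of these plots; the sample size N = 500 matches the slides, while the specific dimensions and random seed are assumptions:

```python
import numpy as np

# N = 500 points uniform in [0, 1]^dim, distances measured from one random query.
rng = np.random.default_rng(0)
N = 500
for dim in (2, 10, 100, 1000):
    X = rng.uniform(size=(N, dim))
    query = rng.uniform(size=dim)
    d = np.linalg.norm(X - query, axis=1)
    # As dim grows, DMAX/DMIN -> 1 and the spread of distances relative to their
    # magnitude shrinks: all points look roughly equidistant from the query.
    print(f"dim={dim:4d}  DMAX/DMIN={d.max()/d.min():7.2f}  std/mean={d.std()/d.mean():.3f}")
```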
Computing radii of nearest neighborhoods
• For N points uniformly distributed in the p-dimensional unit ball centered at the origin, the median radius of the 1-nearest-neighborhood of the origin is $d(p, N) = \left(1 - \left(\tfrac{1}{2}\right)^{1/N}\right)^{1/p}$
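Evaluating this formula for a few dimensions makes the effect concrete; the sample size N = 500 and the chosen values of p are assumptions for illustration:

```python
# Median radius of the 1-nearest neighborhood of the origin for N uniform points
# in the p-dimensional unit ball: d(p, N) = (1 - (1/2)**(1/N))**(1/p).
N = 500
for p in (1, 2, 10, 100):
    radius = (1.0 - 0.5 ** (1.0 / N)) ** (1.0 / p)
    print(f"p={p:3d}  median 1-NN radius = {radius:.3f}")
# In high dimensions the nearest neighbor sits close to the boundary of the ball,
# so "nearest" neighbors are no longer close to the query.
```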
Curse-of-Dimensionality
• As the dimensionality increases, the distance to the closest point grows rapidly
• Random sample of size N from a uniform distribution in the q-dimensional unit hypercube
• A neighborhood that captures even a small fraction of the data has a diameter that is a large fraction of the hypercube's diameter under Euclidean distance; the neighborhood is no longer local, leading to highly biased estimations
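A simulation sketch of this last point; the sample size, the 1% neighborhood fraction, and the choice of the cube's center as the query are assumptions made for illustration:

```python
import numpy as np

# Radius of the Euclidean ball around the center of [0, 1]^q needed to capture
# about 1% of N uniform points, compared with the hypercube diameter sqrt(q).
rng = np.random.default_rng(0)
N = 10_000
for q in (2, 10, 50, 100):
    X = rng.uniform(size=(N, q))
    d = np.linalg.norm(X - 0.5, axis=1)
    r = np.quantile(d, 0.01)   # radius capturing ~1% of the sample
    print(f"q={q:3d}  radius for 1% of data = {r:.2f}  (cube diameter = {np.sqrt(q):.2f})")
# The 1% neighborhood grows from a tiny fraction of the diameter in low dimensions
# to a substantial fraction of it in high dimensions: it stops being "local".
```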
Curse-of-Dimensionality
• It is a serious problem in many real-world applications
• Microarray data: 3,000-4,000 genes
• Documents: 10,000-20,000 words in the dictionary
• Images, face recognition, etc.
How can we deal with the curse of dimensionality?
Variance and covariance
• Variance: $\mathrm{Var}(X) = E\!\left[(X - E[X])^2\right]$
• Covariance: $\mathrm{Cov}(X, Y) = E\!\left[(X - E[X])(Y - E[Y])\right]$, which measures how two variables vary together
Dimensionality Reduction
• Many dimensions are often interdependent (correlated). We can:
• Reduce the dimensionality of the problem;
• Transform the interdependent coordinates into significant and independent ones (see the sketch below).
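One concrete example of such a transformation is principal component analysis (PCA), sketched minimally below; the function, the toy data with a redundant third coordinate, and the choice of two retained components are assumptions, not the slides' own example:

```python
import numpy as np

def pca(X, n_components):
    # Center the data, diagonalize the covariance matrix, and project onto the
    # directions of largest variance; the new coordinates are uncorrelated.
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    return Xc @ eigvecs[:, order]

# Usage: 3-D data in which the third coordinate is (almost) a copy of the first.
rng = np.random.default_rng(0)
A = rng.normal(size=(200, 2))
X = np.column_stack([A[:, 0], A[:, 1], A[:, 0] + 0.01 * rng.normal(size=200)])
Z = pca(X, n_components=2)
print(np.round(np.cov(Z, rowvar=False), 3))         # (near-)diagonal covariance
```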