Ch. 1 Introduction
Pattern Recognition and Machine Learning, C. M. Bishop, 2006
Summarized by K. I. Kim, Biointelligence Laboratory, Seoul National University
http://bi.snu.ac.kr/
Contents
- 1.1 Example: Polynomial Curve Fitting
- 1.2 Probability Theory
  - 1.2.1 Probability densities
  - 1.2.2 Expectations and covariances
  - 1.2.3 Bayesian probabilities
  - 1.2.4 The Gaussian distribution
  - 1.2.5 Curve fitting re-visited
  - 1.2.6 Bayesian curve fitting
- 1.3 Model Selection
Pattern Recognition
- Training set: {x_1, ..., x_N}
- Target vector: t
- Training (learning) phase
  - Determine the function y(x)
- Generalization
  - Performance measured on a test set
- Preprocessing
  - Feature selection
Supervised, Unsupervised and Reinforcement Learning
- Supervised learning: with target vectors
  - Classification
  - Regression
- Unsupervised learning: without target vectors
  - Clustering
  - Density estimation
  - Visualization
- Reinforcement learning: maximize a reward
  - Trade-off between exploration and exploitation
Example: Polynomial Curve Fitting
- N observations: inputs x = (x_1, ..., x_N)^T with targets t = (t_1, ..., t_N)^T
- Fit a polynomial y(x, w) of order M
- Train by minimizing an error function E(w)
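The formulas this slide refers to (shown as images in the original) are the polynomial model and the sum-of-squares error from Bishop Sec. 1.1:

```latex
% Polynomial model of order M, linear in the coefficients w
y(x, \mathbf{w}) = w_0 + w_1 x + \cdots + w_M x^M = \sum_{j=0}^{M} w_j x^j
% Sum-of-squares error minimized over w during training
E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \{\, y(x_n, \mathbf{w}) - t_n \,\}^2
```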
Model Selection & Over-fitting (1/2)
Model Selection & Over-fitting (2/2)
- Root-mean-square (RMS) error E_RMS
- Too large an order M → over-fitting
- The more data, the better the generalization
- Over-fitting is a general property of maximum likelihood
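The RMS error used on this slide is, in Bishop's notation,

```latex
% Root-mean-square error: dividing by N allows comparison across data-set sizes,
% and the square root puts E_RMS on the same scale as the target variable t
E_{\mathrm{RMS}} = \sqrt{\, 2 E(\mathbf{w}^{\star}) / N \,}
```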
Regularization
- Add a penalty term to the error function to discourage large coefficients
  - Shrinkage
  - Ridge regression
  - Weight decay
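The regularized error function behind these names (Bishop Eq. 1.4):

```latex
% Sum-of-squares error plus a quadratic penalty on the weights;
% the coefficient \lambda controls the relative importance of the penalty
\tilde{E}(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \{\, y(x_n, \mathbf{w}) - t_n \,\}^2
                      + \frac{\lambda}{2} \lVert \mathbf{w} \rVert^2
```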
Probability Theory
- "What is the overall probability that the selection procedure will pick an apple?"
- "Given that we have chosen an orange, what is the probability that the box we chose was the blue one?"
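A worked version of the two questions, using the box-and-fruit numbers from Bishop's text (red box: 2 apples, 6 oranges; blue box: 3 apples, 1 orange; the red box is picked with probability 4/10):

```latex
% Overall probability of picking an apple (sum and product rules)
p(F=a) = p(F=a \mid B=r)\, p(B=r) + p(F=a \mid B=b)\, p(B=b)
       = \tfrac{1}{4}\cdot\tfrac{4}{10} + \tfrac{3}{4}\cdot\tfrac{6}{10} = \tfrac{11}{20}
% Probability the box was blue given that an orange was drawn (Bayes' theorem)
p(B=b \mid F=o) = \frac{p(F=o \mid B=b)\, p(B=b)}{p(F=o)}
                = \frac{\tfrac{1}{4}\cdot\tfrac{6}{10}}{\tfrac{9}{20}} = \tfrac{1}{3}
```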
Rules of Probability (1/2)
- Joint probability: p(X = x_i, Y = y_j)
- Marginal probability: p(X = x_i)
- Conditional probability: p(Y = y_j | X = x_i)
Rules of Probability (2/2)
- Sum rule
- Product rule
- Bayes' theorem
  - posterior ∝ likelihood × prior, with a normalizing constant
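In symbols, the three rules listed on this slide are:

```latex
% Sum rule: marginalize the joint distribution
p(X) = \sum_{Y} p(X, Y)
% Product rule: factor the joint into a conditional times a marginal
p(X, Y) = p(Y \mid X)\, p(X)
% Bayes' theorem: posterior = likelihood x prior / normalizing constant
p(Y \mid X) = \frac{p(X \mid Y)\, p(Y)}{p(X)}, \qquad p(X) = \sum_{Y} p(X \mid Y)\, p(Y)
```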
Probability Densities
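The definitions this slide presents graphically, for a continuous variable x (Bishop Sec. 1.2.1):

```latex
% Probability that x falls in the interval (a, b)
p(x \in (a, b)) = \int_{a}^{b} p(x)\, \mathrm{d}x
% A density must be non-negative and integrate to one
p(x) \ge 0, \qquad \int_{-\infty}^{\infty} p(x)\, \mathrm{d}x = 1
% Cumulative distribution function
P(z) = \int_{-\infty}^{z} p(x)\, \mathrm{d}x
```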
Expectations and Covariances
- Expectation
- Variance
- Covariance
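The corresponding definitions from Bishop Sec. 1.2.2:

```latex
% Expectation of a function f under p(x): discrete and continuous cases
\mathbb{E}[f] = \sum_{x} p(x)\, f(x), \qquad \mathbb{E}[f] = \int p(x)\, f(x)\, \mathrm{d}x
% Variance: spread of f(x) around its expectation
\mathrm{var}[f] = \mathbb{E}\big[(f(x) - \mathbb{E}[f(x)])^2\big] = \mathbb{E}[f(x)^2] - \mathbb{E}[f(x)]^2
% Covariance: extent to which x and y vary together
\mathrm{cov}[x, y] = \mathbb{E}_{x,y}\big[(x - \mathbb{E}[x])(y - \mathbb{E}[y])\big]
                   = \mathbb{E}_{x,y}[x y] - \mathbb{E}[x]\, \mathbb{E}[y]
```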
Bayesian Probabilities: Frequentist vs. Bayesian
- Likelihood: p(D | w)
- Frequentist view
  - w: a fixed parameter determined by an 'estimator'
    - Maximum likelihood: error function = negative log likelihood
    - Error bars: obtained from the distribution over possible data sets (e.g. bootstrap)
- Bayesian view
  - Only the single data set actually observed
  - A probability distribution over w expresses the uncertainty in the parameters
  - Prior knowledge can be incorporated
    - Noninformative prior
Bayesian Probabilities: Expansion of Bayesian Applications
- The full Bayesian procedure long had limited practical application
  - Bayesian ideas date from the 18th century
  - Requires marginalizing over the whole parameter space
- Markov chain Monte Carlo (sampling methods)
  - Practical for small-scale problems
- Highly efficient deterministic approximation schemes
  - e.g. variational Bayes, expectation propagation
  - Practical for large-scale problems
Gaussian Distribution
- Univariate Gaussian distribution N(x | μ, σ²)
- D-dimensional multivariate Gaussian distribution N(x | μ, Σ)
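The two densities this slide refers to:

```latex
% Univariate Gaussian with mean \mu and variance \sigma^2
\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{1/2}}
    \exp\!\left\{ -\frac{(x - \mu)^2}{2\sigma^2} \right\}
% D-dimensional multivariate Gaussian with mean vector \mu and covariance matrix \Sigma
\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) =
    \frac{1}{(2\pi)^{D/2}\, |\boldsymbol{\Sigma}|^{1/2}}
    \exp\!\left\{ -\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^{\mathrm{T}}
    \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right\}
```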
Gaussian Distribution: Example (1/2)
- Estimating the unknown parameters μ and σ² from data
- Data points are i.i.d., so the likelihood factorizes
  - Maximizing the log likelihood with respect to μ gives the sample mean
  - Maximizing with respect to the variance gives the sample variance
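The maximum-likelihood solutions the slide refers to:

```latex
% Log likelihood for N i.i.d. observations x_1, ..., x_N
\ln p(\mathbf{x} \mid \mu, \sigma^2) = -\frac{1}{2\sigma^2} \sum_{n=1}^{N} (x_n - \mu)^2
    - \frac{N}{2} \ln \sigma^2 - \frac{N}{2} \ln(2\pi)
% Maximizing with respect to \mu gives the sample mean
\mu_{\mathrm{ML}} = \frac{1}{N} \sum_{n=1}^{N} x_n
% Maximizing with respect to \sigma^2 gives the sample variance about \mu_{ML}
\sigma^2_{\mathrm{ML}} = \frac{1}{N} \sum_{n=1}^{N} (x_n - \mu_{\mathrm{ML}})^2
```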
Gaussian Distribution: Example (2/2)
- Bias phenomenon: the ML variance estimate systematically underestimates the true variance
  - A limitation of the maximum likelihood approach
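The bias result in question:

```latex
% The ML mean is unbiased, but the ML variance underestimates the true variance
\mathbb{E}[\mu_{\mathrm{ML}}] = \mu, \qquad
\mathbb{E}[\sigma^2_{\mathrm{ML}}] = \frac{N-1}{N}\, \sigma^2
% Rescaling gives an unbiased estimate
\tilde{\sigma}^2 = \frac{N}{N-1}\, \sigma^2_{\mathrm{ML}}
                 = \frac{1}{N-1} \sum_{n=1}^{N} (x_n - \mu_{\mathrm{ML}})^2
```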
Curve Fitting Re-visited (1/2)
- Goal of the curve fitting problem
  - Predict the target variable t given some new input variable x
- Assume a Gaussian conditional distribution for t given x
- Determine the unknown w and the precision β by maximum likelihood
Curve Fitting Re-visited (2/2)
- With respect to w, maximizing the likelihood is equivalent to minimizing the sum-of-squares error function
- Maximizing with respect to β gives the ML precision β_ML
- Predictive distribution: p(t | x, w_ML, β_ML)
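The probabilistic model and its maximum-likelihood solution, as in Bishop Sec. 1.2.5:

```latex
% Gaussian noise model for the target t given the input x
p(t \mid x, \mathbf{w}, \beta) = \mathcal{N}\!\left(t \mid y(x, \mathbf{w}), \beta^{-1}\right)
% Maximizing the log likelihood over w reduces to minimizing the sum-of-squares error;
% maximizing over the precision \beta then gives
\frac{1}{\beta_{\mathrm{ML}}} = \frac{1}{N} \sum_{n=1}^{N} \{\, y(x_n, \mathbf{w}_{\mathrm{ML}}) - t_n \,\}^2
% Predictive distribution for a new input x
p(t \mid x, \mathbf{w}_{\mathrm{ML}}, \beta_{\mathrm{ML}}) =
    \mathcal{N}\!\left(t \mid y(x, \mathbf{w}_{\mathrm{ML}}), \beta_{\mathrm{ML}}^{-1}\right)
```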
Maximum A Posteriori (MAP)
- Add a prior probability over w
  - α: hyperparameter (precision of the Gaussian prior)
  - Maximizing the posterior is equivalent to minimizing the regularized error of Eq. (1.4), with λ = α/β
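In symbols, with a zero-mean isotropic Gaussian prior over w governed by the precision hyperparameter α:

```latex
% Gaussian prior over the polynomial coefficients
p(\mathbf{w} \mid \alpha) = \mathcal{N}(\mathbf{w} \mid \mathbf{0}, \alpha^{-1}\mathbf{I})
% Maximizing the posterior p(w | x, t) \propto p(t | x, w, \beta)\, p(w | \alpha)
% is equivalent to minimizing the regularized sum-of-squares error
\frac{\beta}{2} \sum_{n=1}^{N} \{\, y(x_n, \mathbf{w}) - t_n \,\}^2
    + \frac{\alpha}{2} \mathbf{w}^{\mathrm{T}} \mathbf{w},
\qquad \text{i.e. Eq. (1.4) with } \lambda = \alpha / \beta
```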
Bayesian Curve Fitting
- Marginalize over w rather than committing to a point estimate
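The marginalization referred to, together with the resulting Gaussian predictive distribution (Bishop Sec. 1.2.6):

```latex
% Integrate out w instead of using a point estimate
p(t \mid x, \mathbf{x}, \mathbf{t}) = \int p(t \mid x, \mathbf{w})\, p(\mathbf{w} \mid \mathbf{x}, \mathbf{t})\, \mathrm{d}\mathbf{w}
                                    = \mathcal{N}\!\left(t \mid m(x), s^2(x)\right)
% Predictive mean and variance, with \phi(x) the vector of powers x^i
m(x) = \beta\, \phi(x)^{\mathrm{T}} \mathbf{S} \sum_{n=1}^{N} \phi(x_n)\, t_n, \qquad
s^2(x) = \beta^{-1} + \phi(x)^{\mathrm{T}} \mathbf{S}\, \phi(x)
% where S^{-1} = \alpha I + \beta \sum_n \phi(x_n) \phi(x_n)^T;
% the \beta^{-1} term is the noise, the second term the uncertainty in w
```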
Model Selection
- Proper model complexity → good generalization and the best model
- Measuring generalization performance
  - If data are plentiful, divide them into training, validation and test sets
  - Otherwise, use cross-validation (a leave-one-out sketch follows below)
    - Leave-one-out technique in the extreme case
    - Drawbacks
      - Expensive computation
      - Multiple complexity parameters may require many separate training runs
  - New measures of performance
    - e.g. Akaike information criterion (AIC), Bayesian information criterion (BIC)
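A minimal sketch of leave-one-out cross-validation for choosing the polynomial order M, using a synthetic sin(2πx) data set like the one in Bishop Sec. 1.1; the data, noise level, and degree range are assumptions for illustration, not taken from the slides:

```python
import numpy as np

# Synthetic data set: noisy samples of sin(2*pi*x), as in Bishop's running example
rng = np.random.default_rng(0)
N = 10
x = np.linspace(0.0, 1.0, N)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=N)

def loo_rms(x, t, degree):
    """RMS error of leave-one-out predictions for a given polynomial degree."""
    errors = []
    for i in range(len(x)):
        mask = np.arange(len(x)) != i               # hold out point i
        w = np.polyfit(x[mask], t[mask], degree)    # least-squares polynomial fit
        pred = np.polyval(w, x[i])                  # predict the held-out target
        errors.append((pred - t[i]) ** 2)
    return np.sqrt(np.mean(errors))

# Pick the order M with the lowest leave-one-out RMS error
for M in range(9):
    print(f"M = {M}: LOO RMS = {loo_rms(x, t, M):.3f}")
```

Low-order models underfit and very high orders overfit the held-out points, so the leave-one-out error typically bottoms out at an intermediate M, which is the model-selection behaviour the slide describes.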