Ch 1 Introduction Pattern Recognition and Machine Learning

  • Slides: 23

Ch 1. Introduction
Pattern Recognition and Machine Learning, C. M. Bishop, 2006.
Summarized by K. I. Kim
Biointelligence Laboratory, Seoul National University
http://bi.snu.ac.kr/

Contents
  • 1.1 Example: Polynomial Curve Fitting
  • 1.2 Probability Theory
      ◦ 1.2.1 Probability densities
      ◦ 1.2.2 Expectations and covariance
      ◦ 1.2.3 Bayesian probabilities
      ◦ 1.2.4 The Gaussian distribution
      ◦ 1.2.5 Curve fitting re-visited
      ◦ 1.2.6 Bayesian curve fitting
  • 1.3 Model Selection
(C) 2006, SNU Biointelligence Lab, http://bi.snu.ac.kr/

Pattern Recognition
  • Training set, $\{\mathbf{x}_1, \dots, \mathbf{x}_N\}$
  • Target vector, $\mathbf{t}$
  • Training (learning) phase
      ◦ Determine the function $y(\mathbf{x})$
  • Generalization
      ◦ Test set
  • Preprocessing
      ◦ Feature selection

Supervised, Unsupervised and Reinforcement Learning
  • Supervised learning: with target vector
      ◦ Classification
      ◦ Regression
  • Unsupervised learning: without target vector
      ◦ Clustering
      ◦ Density estimation
      ◦ Visualization
  • Reinforcement learning: maximize a reward
      ◦ Trade-off between exploration and exploitation

Example: Polynomial Curve Fitting
  • N observations
  • Minimizing error function
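For reference, the polynomial model and the sum-of-squares error function that this slide refers to can be written as follows. This is a sketch in the notation of Bishop's Section 1.1 (order $M$, weights $\mathbf{w}$, inputs $x_n$, targets $t_n$); the symbols are taken from the book, not from the slide text itself.

```latex
\[
y(x, \mathbf{w}) = w_0 + w_1 x + w_2 x^2 + \dots + w_M x^M
                 = \sum_{j=0}^{M} w_j x^j
\]
\[
E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \bigl\{ y(x_n, \mathbf{w}) - t_n \bigr\}^2
\]
```

Because $y(x, \mathbf{w})$ is linear in $\mathbf{w}$, the error is quadratic in the weights and its minimizer $\mathbf{w}^\star$ has a closed form.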

Model Selection & Over-fitting (1/2)

Model Selection & Over-fitting (2/2)
  • RMS (root-mean-square) error
  • Polynomial order M too large → over-fitting
  • The more data, the better the generalization
  • Over-fitting is a general property of maximum likelihood
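The effect of model order on the RMS error can be sketched with a small pure-Python experiment. It assumes the book's running example (targets generated as sin(2πx) plus Gaussian noise) and uses $E_{RMS} = \sqrt{2E(\mathbf{w}^\star)/N}$; the function names, the noise level, and the chosen orders are illustrative assumptions, and the normal-equation solver is a minimal one that does not guard against singular systems.

```python
import math
import random


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, copied so the inputs are left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_polynomial(xs, ts, order):
    """Least-squares weights w_0..w_M obtained from the normal equations."""
    size = order + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(size)] for i in range(size)]
    b = [sum(t * x ** i for x, t in zip(xs, ts)) for i in range(size)]
    return solve_linear(a, b)


def predict(w, x):
    """Evaluate y(x, w) = sum_j w_j x^j."""
    return sum(wj * x ** j for j, wj in enumerate(w))


def rms_error(w, xs, ts):
    """E_RMS = sqrt(2 E(w) / N), with E(w) = 1/2 sum (y - t)^2."""
    squared = sum((predict(w, x) - t) ** 2 for x, t in zip(xs, ts))
    return math.sqrt(squared / len(xs))


def make_data(n, rng, noise=0.3):
    """Assumed toy data: sin(2 pi x) plus Gaussian noise on an even grid."""
    xs = [i / (n - 1) for i in range(n)]
    ts = [math.sin(2 * math.pi * x) + rng.gauss(0.0, noise) for x in xs]
    return xs, ts


rng = random.Random(0)
train_x, train_t = make_data(10, rng)
test_x, test_t = make_data(100, rng)

for order in (0, 1, 3):
    w = fit_polynomial(train_x, train_t, order)
    print(order, rms_error(w, train_x, train_t), rms_error(w, test_x, test_t))
```

The training error can only go down as the order grows, since lower-order polynomials are special cases of higher-order ones; the test error is the quantity that reveals over-fitting. Orders near N - 1 are left out here because the normal equations become badly conditioned in plain floating point.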

Regularization
  • Shrinkage
  • Ridge regression
  • Weight decay
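The three names above refer to the same idea: a penalty on the size of the weights is added to the error function. A sketch of the regularized error in the book's notation, Eq. (1.4), where the coefficient $\lambda$ controls the strength of the penalty:

```latex
\[
\tilde{E}(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \bigl\{ y(x_n, \mathbf{w}) - t_n \bigr\}^2
                      + \frac{\lambda}{2} \lVert \mathbf{w} \rVert^2,
\qquad
\lVert \mathbf{w} \rVert^2 = \mathbf{w}^{\mathsf{T}} \mathbf{w} = w_0^2 + w_1^2 + \dots + w_M^2
\]
```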

Probability Theory
  • “What is the overall probability that the selection procedure will pick an apple?”
  • “Given that we have chosen an orange, what is the probability that the box we chose was the blue one?”

Rules of Probability (1/2)
  • Joint probability
  • Marginal probability
  • Conditional probability

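Rules of Probability (2/2)
  • Sum rule: $p(X) = \sum_{Y} p(X, Y)$
  • Product rule: $p(X, Y) = p(Y \mid X)\, p(X)$
  • Bayes' theorem: $p(Y \mid X) = \dfrac{p(X \mid Y)\, p(Y)}{p(X)}$
      ◦ Posterior: $p(Y \mid X)$
      ◦ Likelihood: $p(X \mid Y)$
      ◦ Prior: $p(Y)$
      ◦ Normalizing constant: $p(X) = \sum_{Y} p(X \mid Y)\, p(Y)$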
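The two fruit-box questions from the Probability Theory slide can be answered with the sum rule and Bayes' theorem. The sketch below assumes the numbers of Bishop's example, which the slide text does not state: the red box is picked with probability 4/10 and holds 2 apples and 6 oranges, the blue box is picked with probability 6/10 and holds 3 apples and 1 orange. The variable and function names are illustrative.

```python
from fractions import Fraction

# Assumed numbers from Bishop's fruit-box example (not stated on the slide).
p_box = {"red": Fraction(4, 10), "blue": Fraction(6, 10)}
p_fruit_given_box = {
    "red": {"apple": Fraction(2, 8), "orange": Fraction(6, 8)},
    "blue": {"apple": Fraction(3, 4), "orange": Fraction(1, 4)},
}


def marginal_fruit(fruit):
    """Sum rule with the product rule: p(F) = sum_B p(F | B) p(B)."""
    return sum(p_fruit_given_box[box][fruit] * p_box[box] for box in p_box)


def posterior_box(box, fruit):
    """Bayes' theorem: p(B | F) = p(F | B) p(B) / p(F)."""
    return p_fruit_given_box[box][fruit] * p_box[box] / marginal_fruit(fruit)


p_apple = marginal_fruit("apple")
p_blue_given_orange = posterior_box("blue", "orange")
print(p_apple, p_blue_given_orange)  # prints: 11/20 1/3
```

Exact fractions are used so the results can be compared with hand calculation: observing an orange lowers the probability of the blue box from its prior of 6/10 to 1/3.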

Probability densities

Expectations and Covariances
  • Expectation
  • Variance
  • Covariance
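For reference, the three quantities can be written as follows, using the standard definitions in the notation of Bishop's Section 1.2.2 (the formulas are not part of the slide text):

```latex
\[
\mathbb{E}[f] = \sum_{x} p(x)\, f(x)
\quad \text{(discrete)},
\qquad
\mathbb{E}[f] = \int p(x)\, f(x)\, \mathrm{d}x
\quad \text{(continuous)}
\]
\[
\operatorname{var}[f] = \mathbb{E}\bigl[ (f(x) - \mathbb{E}[f(x)])^2 \bigr]
                      = \mathbb{E}[f(x)^2] - \mathbb{E}[f(x)]^2
\]
\[
\operatorname{cov}[x, y] = \mathbb{E}_{x,y}\bigl[ \{x - \mathbb{E}[x]\}\{y - \mathbb{E}[y]\} \bigr]
                         = \mathbb{E}_{x,y}[x y] - \mathbb{E}[x]\, \mathbb{E}[y]
\]
```

If $x$ and $y$ are independent, the covariance is zero.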

Bayesian Probabilities - Frequentist vs. Bayesian
  • Likelihood: $p(\mathcal{D} \mid \mathbf{w})$
  • Frequentist
      ◦ w: a fixed parameter determined by an 'estimator'
          ▪ Maximum likelihood: error function = negative log likelihood
          ▪ Error bars: obtained from the distribution of possible data sets
              – Bootstrap
  • Bayesian
      ◦ A single data set
      ◦ A probability distribution over w: the uncertainty in the parameters
      ◦ Prior knowledge
          ▪ Noninformative prior
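The bootstrap mentioned under the frequentist error bars can be sketched as follows: new data sets are created by drawing N points with replacement from the original data, the estimator is recomputed on each, and the spread of the estimates serves as the error bar. The function name, the resample count, and the toy Gaussian data are assumptions made for illustration.

```python
import random
import statistics


def bootstrap_std_error(data, estimator, n_resamples=2000, seed=0):
    """Standard deviation of an estimator over bootstrap resamples of data."""
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_resamples):
        # Draw n points with replacement from the original data set.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(resample))
    return statistics.stdev(estimates)


# Toy data set (assumed): 50 draws from a Gaussian with mean 5 and std 2.
data_rng = random.Random(1)
data = [data_rng.gauss(5.0, 2.0) for _ in range(50)]

se = bootstrap_std_error(data, statistics.mean)
print(se)
```

For the sample mean, the result should be close to the familiar estimate of the sample standard deviation divided by the square root of N, which gives a quick sanity check.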

Bayesian Probabilities - Expansion of Bayesian Application
  • Limited application of the full Bayesian procedure
      ◦ Dates from the 18th century
      ◦ Requires marginalizing over the whole of parameter space
  • Markov chain Monte Carlo
      ◦ Small-scale problems
  • Highly efficient deterministic approximation schemes
      ◦ e.g. variational Bayes, expectation propagation
      ◦ Large-scale problems

Gaussian distribution
  • D-dimensional multivariate Gaussian distribution
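For reference, the univariate and D-dimensional Gaussian densities in the book's notation (mean $\mu$, variance $\sigma^2$, mean vector $\boldsymbol{\mu}$, covariance matrix $\boldsymbol{\Sigma}$); the formulas are standard and are not part of the slide text:

```latex
\[
\mathcal{N}(x \mid \mu, \sigma^2)
  = \frac{1}{(2\pi\sigma^2)^{1/2}}
    \exp\Bigl\{ -\frac{1}{2\sigma^2} (x - \mu)^2 \Bigr\}
\]
\[
\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma})
  = \frac{1}{(2\pi)^{D/2}\, \lvert \boldsymbol{\Sigma} \rvert^{1/2}}
    \exp\Bigl\{ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^{\mathsf{T}}
                \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \Bigr\}
\]
```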

Gaussian distribution - Example (1/2)
  • Getting unknown parameters
  • Data points are i.i.d.
  • Maximizing the log likelihood
      ◦ Maximizing with respect to $\mu$
          ▪ sample mean: $\mu_{ML} = \frac{1}{N} \sum_{n=1}^{N} x_n$
      ◦ Maximizing with respect to the variance $\sigma^2$
          ▪ sample variance: $\sigma^2_{ML} = \frac{1}{N} \sum_{n=1}^{N} (x_n - \mu_{ML})^2$

Gaussian distribution - Example (2/2)
  • Bias phenomenon
      ◦ Limitation of the maximum likelihood approach
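The bias can be checked numerically. The maximum likelihood variance divides by N and measures spread around the sample mean rather than the true mean, so on average it underestimates the true variance by the factor (N - 1)/N. The sketch below averages the ML variance over many small data sets; the function names, the data set size of N = 2, and the trial count are illustrative assumptions.

```python
import random


def gaussian_ml(xs):
    """Maximum likelihood mean and variance (variance divides by N)."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    return mu, var


def average_ml_variance(n_points, n_trials, true_var=1.0, seed=0):
    """Average the ML variance over many data sets drawn from N(0, true_var)."""
    rng = random.Random(seed)
    sigma = true_var ** 0.5
    total = 0.0
    for _ in range(n_trials):
        xs = [rng.gauss(0.0, sigma) for _ in range(n_points)]
        total += gaussian_ml(xs)[1]
    return total / n_trials


n_points = 2
avg = average_ml_variance(n_points=n_points, n_trials=20000)
print(avg)                                # close to (N - 1) / N = 0.5
print(avg * n_points / (n_points - 1))    # corrected estimate, close to 1.0
```

The bias shrinks as N grows, so it matters most for small data sets and for models with many parameters.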

Curve Fitting Re-visited (1/2)
  • Goal in the curve fitting problem
      ◦ Prediction for the target variable t given some new input variable x
  • Determine the unknown w and $\beta$ (the noise precision) by maximum likelihood

Curve Fitting Re-visited (2/2)
  • Maximizing likelihood = minimizing the sum-of-squares error function
  • Predictive distribution
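A sketch of the derivation behind these two bullets, in the notation of Bishop's Section 1.2.5 and assuming Gaussian noise with precision $\beta$ around the polynomial $y(x, \mathbf{w})$; here $\mathbf{x}$ and $\mathbf{t}$ in bold denote the training inputs and targets:

```latex
\[
p(t \mid x, \mathbf{w}, \beta) = \mathcal{N}\bigl( t \mid y(x, \mathbf{w}), \beta^{-1} \bigr)
\]
\[
\ln p(\mathbf{t} \mid \mathbf{x}, \mathbf{w}, \beta)
  = -\frac{\beta}{2} \sum_{n=1}^{N} \bigl\{ y(x_n, \mathbf{w}) - t_n \bigr\}^2
    + \frac{N}{2} \ln \beta - \frac{N}{2} \ln (2\pi)
\]
\[
\frac{1}{\beta_{ML}} = \frac{1}{N} \sum_{n=1}^{N} \bigl\{ y(x_n, \mathbf{w}_{ML}) - t_n \bigr\}^2
\]
\[
p(t \mid x, \mathbf{w}_{ML}, \beta_{ML})
  = \mathcal{N}\bigl( t \mid y(x, \mathbf{w}_{ML}), \beta_{ML}^{-1} \bigr)
\]
```

Only the first term of the log likelihood depends on $\mathbf{w}$, which is why maximizing it over $\mathbf{w}$ is the same as minimizing the sum-of-squares error.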

Maximum Posterior (MAP)
  • Add prior probability
      ◦ $\alpha$: hyperparameter (precision of the prior over w)
      ◦ The posterior maximum is the minimum of the regularized sum-of-squares error, Eq. (1.4) in the book
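A sketch of the MAP step in the notation of Bishop's Section 1.2.5, assuming a zero-mean isotropic Gaussian prior over the $M + 1$ polynomial weights:

```latex
\[
p(\mathbf{w} \mid \alpha)
  = \mathcal{N}(\mathbf{w} \mid \mathbf{0}, \alpha^{-1} \mathbf{I})
  = \Bigl( \frac{\alpha}{2\pi} \Bigr)^{(M+1)/2}
    \exp\Bigl\{ -\frac{\alpha}{2} \mathbf{w}^{\mathsf{T}} \mathbf{w} \Bigr\}
\]
\[
p(\mathbf{w} \mid \mathbf{x}, \mathbf{t}, \alpha, \beta)
  \propto p(\mathbf{t} \mid \mathbf{x}, \mathbf{w}, \beta)\, p(\mathbf{w} \mid \alpha)
\]
\[
\mathbf{w}_{MAP} = \arg\min_{\mathbf{w}}
  \Bigl[ \frac{\beta}{2} \sum_{n=1}^{N} \bigl\{ y(x_n, \mathbf{w}) - t_n \bigr\}^2
         + \frac{\alpha}{2} \mathbf{w}^{\mathsf{T}} \mathbf{w} \Bigr]
\]
```

Dividing the bracketed expression by $\beta$ gives the regularized error of Eq. (1.4) with $\lambda = \alpha / \beta$.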

Bayesian Curve Fitting
  • Marginalization
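The marginalization can be sketched as follows, in the notation of Bishop's Section 1.2.6 with $\alpha$ and $\beta$ assumed known and $\boldsymbol{\phi}(x)$ the vector with elements $\phi_i(x) = x^i$ for $i = 0, \dots, M$:

```latex
\[
p(t \mid x, \mathbf{x}, \mathbf{t})
  = \int p(t \mid x, \mathbf{w})\, p(\mathbf{w} \mid \mathbf{x}, \mathbf{t})\, \mathrm{d}\mathbf{w}
  = \mathcal{N}\bigl( t \mid m(x), s^2(x) \bigr)
\]
\[
m(x) = \beta\, \boldsymbol{\phi}(x)^{\mathsf{T}} \mathbf{S} \sum_{n=1}^{N} \boldsymbol{\phi}(x_n)\, t_n,
\qquad
s^2(x) = \beta^{-1} + \boldsymbol{\phi}(x)^{\mathsf{T}} \mathbf{S}\, \boldsymbol{\phi}(x)
\]
\[
\mathbf{S}^{-1} = \alpha \mathbf{I}
  + \beta \sum_{n=1}^{N} \boldsymbol{\phi}(x_n)\, \boldsymbol{\phi}(x_n)^{\mathsf{T}}
\]
```

The first term of $s^2(x)$ is the noise on the target; the second reflects the remaining uncertainty in $\mathbf{w}$, which a point estimate such as maximum likelihood or MAP ignores.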

Model Selection
  • Proper model complexity → good generalization & best model
  • Measuring the generalization performance
      ◦ If data are plentiful, divide into training, validation & test sets
      ◦ Otherwise, cross-validate
          ▪ Leave-one-out technique
          ▪ Drawbacks
              – Expensive computation
              – Using separate data with multiple complexity parameters → many training runs
      ◦ New measures of performance
          ▪ e.g. Akaike information criterion (AIC), Bayesian information criterion (BIC)
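Cross-validation as described above can be sketched in a few lines: the data are split into folds, each fold is held out once for validation while the model is trained on the rest, and the validation errors are averaged. Setting the number of folds equal to the number of points gives the leave-one-out technique. The function names and the toy model (predicting the mean of the training targets) are hypothetical, and the folds are taken in order without shuffling.

```python
def k_fold_splits(n_items, n_folds):
    """Yield (train_indices, validation_indices) for each fold, in order."""
    if not 2 <= n_folds <= n_items:
        raise ValueError("need 2 <= n_folds <= n_items")
    indices = list(range(n_items))
    # Spread any remainder over the first folds so sizes differ by at most one.
    base, extra = divmod(n_items, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


def cross_validate(xs, ts, n_folds, fit, error):
    """Average validation error over all folds; one training run per fold."""
    scores = []
    for train, val in k_fold_splits(len(xs), n_folds):
        model = fit([xs[i] for i in train], [ts[i] for i in train])
        scores.append(error(model, [xs[i] for i in val], [ts[i] for i in val]))
    return sum(scores) / len(scores)


def fit_mean(xs, ts):
    """Toy model: ignore the inputs and predict the mean training target."""
    return sum(ts) / len(ts)


def mean_squared_error(model, xs, ts):
    return sum((model - t) ** 2 for t in ts) / len(ts)


xs = [0, 1, 2, 3, 4, 5]
ts = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

# Leave-one-out: as many folds as data points.
loo_error = cross_validate(xs, ts, len(xs), fit_mean, mean_squared_error)
print(loo_error)
```

The cost is visible in the loop: the number of training runs equals the number of folds, and it multiplies again for every combination of complexity parameters that is explored.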