Introduction to Pattern Recognition
For the 정보과학회 Pattern Recognition Winter School, February 2011
Jin Hyung Kim (김진형), KAIST Department of Computer Science
http://ai.kaist.ac.kr/~jkim

What is Pattern Recognition?
• A pattern is an object, process, or event that can be given a name.
• Pattern recognition: the assignment of a physical object or event to one of several prespecified categories (Duda & Hart).
• A subfield of Artificial Intelligence: human intelligence is based on pattern recognition.

Examples of Patterns

Pattern Recognition: Related Fields and Application Areas
Related fields: machine learning, mathematical statistics, neural networks, signal processing, robotics and vision, cognitive science, nonlinear optimization, exploratory data analysis, fuzzy and genetic algorithms, detection and estimation theory, formal languages, structural modeling, biological cybernetics, computational neuroscience, …
Application areas: image processing/segmentation, computer vision, speech recognition, automated target recognition, optical character recognition, seismic analysis, man-machine dialogue, fingerprint identification, industrial inspection, medical diagnosis, ECG signal analysis, data mining, gene sequence analysis, protein structure analysis, remote sensing, aerial reconnaissance, …

Examples of Pattern Recognition Applications
• Computer-aided diagnosis: medical imaging, EEG, ECG, X-ray mammography
• Image recognition: factory automation, robot navigation, face identification, gesture recognition, automatic target recognition
• Speech recognition: speaker identification, speech recognition; e.g., Google Maps Navigation (Beta): search by voice

Applications of Pattern Recognition: Gesture Recognition
• Text editing on pen computers
• Tele-operation: remote control by gesture input
• TV control by hand motion
• Sign language interpretation

Applications of Pattern Recognition: e-Books, Tablet PCs, iPad, Smartphones

Smart Phone with Rich Sensors
• Eyes: camera, light sensor, proximity sensor
• Ears: microphone
• Voice: speaker
• Skin: touch screen, pressure sensor
• Location: GPS chip, electronic sensor
• Others: accelerometer, gyroscope, clock
Next generation:
• Haptics: tactile feedback on the hand or finger
• Predictive texting: T9-type input combined with gesture recognition (built-in phone camera for hand motion, phone movement in space with inertial sensors, infrared technology)
• Pico projectors
• Location in a 3-D plane

Applications of Pattern Recognition: KAIST Math Expression Recognizer (demo)

Applications of Pattern Recognition: MathTutor-SE (demo)

Applications of Pattern Recognition: Document Recognition (文書認識) with a Verification & Correction Interface

Applications of Pattern Recognition: Mail Sorter

Applications of Pattern Recognition: Scene Text Recognition

Applications of Pattern Recognition: Autonomous Land Vehicle (DARPA Grand Challenge contest)
http://www.youtube.com/watch?v=yQ5U8suTUw0

Applications of Pattern Recognition: Protein Structure Analysis

Types of PR Problems (from Ricardo Gutierrez-Osuna, Texas A&M Univ.)
• Classification: assigning an object to a class. Output: a class label. Ex: classifying a product as 'good' or 'bad' in quality control.
• Clustering: organizing objects into meaningful groups. Output: a (hierarchical) grouping of objects. Ex: a taxonomy of species.
• Regression: predicting a value based on observations. Ex: predicting stock prices.
• Description: representing an object in terms of a series of primitives. Output: a structural or linguistic description. Ex: labeling ECG signals, video indexing, protein structure indexing.

Pattern Class
• A collection of "similar" (not necessarily identical) objects.
• Inter-class variability vs. intra-class variability.
Pattern Class Model
• A description of each class/population (e.g., a probability density such as a Gaussian).

Classification vs. Clustering
• Classification (recognition): assigning inputs to known categories, e.g., Category "A" vs. Category "B" (supervised classification).
• Clustering: creation of new categories (unsupervised classification).

Pattern Recognition: Key Objectives
• Process the sensed data to eliminate noise (data vs. noise).
• Hypothesize models that describe each class population; then we may recover the process that generated the patterns.
• Choose the best-fitting model for the given sensed data and assign the class label associated with that model.

The General Classification Process
sensor signal → feature extractor → features → classifier → class membership

Example: Salmon or Sea Bass
Sort incoming fish on a conveyor belt into two classes: salmon or sea bass.
Steps:
• Preprocessing (segmentation)
• Feature extraction (measure features or properties)
• Classification (make the final decision)

Sea Bass vs. Salmon (by image): candidate features
• Length
• Lightness
• Width
• Number and shape of fins
• Position of the mouth
• …

Salmon vs. Sea Bass (by length)

Salmon vs. Sea Bass (by lightness): the best decision strategy with a single lightness threshold (see the sketch below).
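A minimal sketch of such a one-feature decision strategy: it scans candidate thresholds on made-up lightness data and keeps the one that minimizes training error. The values, labels, and the name best_threshold are illustrative assumptions, not from the slides.

```python
import numpy as np

def best_threshold(lightness, labels):
    """Scan candidate thresholds; keep the one with the fewest training errors."""
    best_t, best_err = None, np.inf
    for t in np.sort(lightness):
        pred = (lightness > t).astype(int)      # above threshold -> sea bass (1)
        err = np.mean(pred != labels)
        if err < best_err:
            best_t, best_err = t, err
    return best_t, best_err

lightness = np.array([1.2, 2.1, 2.8, 4.0, 4.5, 5.3])  # made-up measurements
labels    = np.array([0,   0,   0,   1,   1,   1])    # 0 = salmon, 1 = sea bass
t, err = best_threshold(lightness, labels)
print(f"decide sea bass if lightness > {t:.2f} (training error {err:.0%})")
```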

Cost of Misclassification
There are two possible classification errors: (1) classifying a sea bass as a salmon, and (2) classifying a salmon as a sea bass. Which error is more important? This is generalized as a loss function; we then look for the decision with minimum risk, where risk = expected loss.

Loss function (truth vs. decision):

                      decide salmon    decide sea bass
  truth: salmon            0                -10
  truth: sea bass        -20                  0
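As a worked illustration of "risk = expected loss", the sketch below computes the conditional risk R(decide j | x) = Σ_i λ(decide j | truth i) P(truth i | x) for each decision and picks the minimum. The posterior values are made up, and the loss magnitudes follow the table above but are treated as positive costs (the slide writes them with minus signs) so that minimizing risk reads naturally.

```python
import numpy as np

# Rows: truth (salmon, sea bass); columns: decision (salmon, sea bass).
# Magnitudes from the slide's table, used here as positive costs.
loss = np.array([[0.0, 10.0],
                 [20.0, 0.0]])
posterior = np.array([0.3, 0.7])   # made-up P(salmon | x), P(sea bass | x)

risk = posterior @ loss            # conditional risk of each decision
best = int(np.argmin(risk))
print("risk per decision:", risk, "-> decide", ["salmon", "sea bass"][best])
```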

Classification with More Features (by length and lightness)
It is possibly better. Really??

How Many Features, and Which?
• The choice of features determines the success or failure of the classification task.
• For a given feature, we may compute the best decision strategy from the (training) data. This is called training, parameter adaptation, or learning: a machine learning issue.

Issues with Feature Extraction
• Correlated features do not improve performance.
• It might be difficult to extract certain features.
• It might be computationally expensive to extract many features.
• The "curse" of dimensionality.
• …

Feature and Feature Vector
A pattern is represented by a feature vector x = (x1, x2, …, xd), e.g., for the fish: length, lightness, width, number and shape of fins, position of the mouth, …

Goodness of Features and Separability

Developing a PR System
Pattern → sensors and preprocessing → feature extraction → classifier → class assignment, with a teacher and a learning algorithm.
• Sensors and preprocessing.
• Feature extraction aims to create discriminative features that are good for classification.
• A classifier.
• A teacher provides information about the hidden state (supervised learning).
• A learning algorithm builds the PR system from training examples.

PR Approaches
• Template matching: the pattern to be recognized is matched against a stored template.
• Statistical PR: based on an underlying statistical model of the patterns (features) and pattern classes.
• Structural (syntactic) PR: pattern classes are represented by formal structures such as grammars, automata, and strings; used not only for classification but also for description.
• Neural networks: the classifier is represented as a network of cells modeling the neurons of the human brain (connectionist approach); knowledge is stored in the connectivity and strength of the synaptic weights.
• Statistical structure analysis: combining structural and statistical analysis, using probabilistic frameworks such as Bayesian networks and MRFs.
(Modified from Vojtěch Franc)

PR Approaches: Template Matching (a stored template is matched against the input scene)
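As a concrete illustration of matching a stored template against an input scene, here is a minimal normalized cross-correlation sketch over small grayscale arrays. The array-based setup and the function name match_template are illustrative assumptions; the slide's actual demo is an image.

```python
import numpy as np

def match_template(scene, template):
    """Slide the template over the scene; return the top-left position with
    the highest normalized cross-correlation score."""
    th, tw = template.shape
    t = template - template.mean()
    best_score, best_pos = -np.inf, None
    for i in range(scene.shape[0] - th + 1):
        for j in range(scene.shape[1] - tw + 1):
            patch = scene[i:i+th, j:j+tw]
            p = patch - patch.mean()
            denom = np.sqrt((p**2).sum() * (t**2).sum())
            score = (p * t).sum() / denom if denom > 0 else 0.0
            if score > best_score:
                best_score, best_pos = score, (i, j)
    return best_pos, best_score
```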

PR Approaches: Deformable Template Matching (Snake)
Example: corpus callosum segmentation
• Shape training set
• Prototype and variation learning
• Prototype registration to the low-level segmented image
• Prototype warping

PR Approaches (from Ricardo Gutierrez-Osuna, Texas A&M Univ.)

Classifier
• The task of a classifier is to partition the feature space into class-labeled decision regions.
• The borders between decision regions are the decision boundaries.
• Classification amounts to determining which decision region a feature vector x falls into.

Representation of a Classifier (from Vojtěch Franc)
A classifier is typically represented as a set of discriminant functions f_i(x), i = 1, …, C: the feature vector x is fed through the discriminant functions, and the class identifier is chosen by the rule

    assign x to class i  if  f_i(x) > f_j(x) for all j ≠ i.
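In code, this representation is just an argmax over the discriminant values. The sketch below uses made-up linear discriminants f_i(x) = w_i·x + b_i; the specific weights are illustrative assumptions.

```python
import numpy as np

W = np.array([[1.0, -0.5],     # w_1
              [-0.3, 0.8]])    # w_2
b = np.array([0.1, -0.2])      # b_1, b_2

def classify(x):
    scores = W @ x + b             # one discriminant value f_i(x) per class
    return int(np.argmax(scores))  # assign x to the class with the largest f_i(x)

print(classify(np.array([2.0, 1.0])))  # -> index of the winning class
```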

Classification of Classifiers by the Form of the Discriminant Function
• A posteriori probability P(y_i | x) → Bayesian classifier
• Linear function → Linear Discriminant Analysis, Support Vector Machine
• Non-linear function → Non-linear Discriminant Analysis
• Output of an artificial neuron → Artificial Neural Network

Bayesian Decision Making
• A statistical approach: the optimal classifier with minimum error.
• Assumes that the complete statistical model is known.
• Decision rule given the posterior probabilities for an observation x:
  – if P(ω1 | x) > P(ω2 | x), decide state of nature = ω1
  – if P(ω1 | x) < P(ω2 | x), decide state of nature = ω2

Searching for the Decision Boundary

Bayes Rule: from the class-conditional likelihood P(x | ω1) to the posterior P(ω1 | x):

    P(ω1 | x) = P(x | ω1) P(ω1) / P(x),   where P(x) = Σ_i P(x | ωi) P(ωi)

(a numeric sketch follows below).
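A worked numeric version of Bayes rule, assuming 1-D Gaussian class-conditional densities; the priors, means, and standard deviations are made up for illustration.

```python
import numpy as np

priors = np.array([0.6, 0.4])                    # P(w1), P(w2)
means, stds = np.array([2.0, 5.0]), np.array([1.0, 1.5])

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def posterior(x):
    likelihood = gaussian_pdf(x, means, stds)    # P(x | wi)
    joint = likelihood * priors                  # P(x | wi) P(wi)
    return joint / joint.sum()                   # divide by P(x) = sum of joints

print(posterior(3.0))  # posteriors P(w1|x), P(w2|x); they sum to 1
```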

Limitations of the Bayesian Approach (from Vojtěch Franc)
• The statistical model p(x, y) is mostly not known; learning must estimate p(x, y) from training examples {(x1, y1), …, (xℓ, yℓ)}.
• Usually p(x, y) is assumed to take a parametric form, e.g., a multivariate normal distribution.
• Non-parametric estimation of p(x, y) requires a large set of training samples.
• Non-Bayesian methods offer equally good (??) alternatives.

Polynomial Discriminant Function Approaches (from Vojtěch Franc)
• Assume that the discriminant g(x) is a polynomial function:
  – linear function: Linear Discriminant Analysis (LDA)
  – quadratic function
• Classifier design then amounts to determining the separating hyperplane.

LDA Example: Separating Jockeys (J) from Basketball Players (H) (from Vojtěch Franc)
• Task: separate jockeys (J) from basketball players (H).
• The set of hidden states is {J, H}; the feature space is the (height, weight) plane.
• Linear classifier: a hyperplane w·x + b = 0 in the (height, weight) plane separates the two classes; its parameters are fitted to training examples (see the sketch below).
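A minimal sketch of the example, assuming made-up (height, weight) measurements and using Fisher's linear discriminant to fit the separating hyperplane; the slide does not specify the training method, so that choice is an assumption.

```python
import numpy as np

X = np.array([[158, 50], [160, 52], [165, 55],                   # jockeys
              [198, 95], [203, 100], [210, 105]], dtype=float)   # players
y = np.array([0, 0, 0, 1, 1, 1])  # 0 = J, 1 = H

# Fisher's linear discriminant: w = S_w^{-1} (m1 - m0)
m0, m1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
Sw = np.cov(X[y == 0].T) + np.cov(X[y == 1].T)   # pooled within-class scatter
w = np.linalg.solve(Sw, m1 - m0)
b = -w @ (m0 + m1) / 2                           # threshold halfway between class means

def classify(x):
    return "H (player)" if w @ x + b > 0 else "J (jockey)"

print(classify(np.array([172.0, 60.0])))   # -> J (jockey)
print(classify(np.array([200.0, 98.0])))   # -> H (player)
```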

Artificial Neural Network Design
For a given network structure, find the weight set w that minimizes the sum-of-squares error J(w) = Σℓ ||yℓ − f(xℓ; w)||² over the training examples {(x1, y1), …, (xℓ, yℓ)} (a gradient-descent sketch follows below).
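A minimal sketch of minimizing a squared-error criterion J(w) by gradient descent, reduced to a single linear neuron on synthetic data. The data, target weights, and learning rate are illustrative assumptions; a real network would have hidden layers and use backpropagation.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 1.5 * X[:, 0] - 0.7 * X[:, 1] + 0.1 * rng.normal(size=100)  # noisy target

w = np.zeros(2)
lr = 0.1
for epoch in range(200):
    pred = X @ w
    grad = 2 * X.T @ (pred - y) / len(y)   # gradient of the mean squared error
    w -= lr * grad                          # step downhill in J(w)

print("learned weights:", w)  # approaches [1.5, -0.7]
```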

PR Design Cycle
• Data collection: probably the most time-intensive component of a project. How many examples are enough?
• Feature choice: critical to the success of the PR project; requires basic prior knowledge and engineering sense.
• Model choice and design: statistical, neural, or structural; parameter settings.
• Training: given a feature set and a 'blank' model, adapt the model to explain the training data (supervised, unsupervised, or reinforcement learning).
• Evaluation: how well does the trained model do? Overfitting vs. generalization.

Learning for a PR System
Pattern → sensors and preprocessing → feature extraction → classifier → class assignment, with a teacher and a learning algorithm.
• Which features are good for classifying the given classes? (feature analysis)
• Can we obtain the required probabilities or boundaries? (learning from training data)

Learning
• A change in the contents and organization of a system's knowledge, enabling it to improve its performance on a task (Simon); it occurs when the system acquires new knowledge from its environment.
• Learning from observation: from trivial memorization to the creation of scientific theories.
• Inductive inference: a new consistent interpretation of the data (observations); general conclusions from examples; inferring the association between input and output with some confidence.
• Data mining: learning rules from large data sets; the availability of large databases allows the application of machine learning to real problems.

Learning Algorithm Categorization, Depending on Available Feedback
• Supervised learning: examples of correct input/output pairs are available (induction).
• Unsupervised learning: no hint at all about the correct outputs; clustering or consistent interpretation.
• Reinforcement learning: receives no examples, but rewards or punishments at the end.
• Semi-supervised learning: training with both labeled and unlabeled examples.

Issues in Learning Algorithms
• Prior knowledge can help in learning, e.g., assumptions on parametric forms and ranges of values.
• Incremental learning: update old knowledge whenever a new example arrives. Batch learning: apply the learning algorithm to the entire set of examples.
• Analytic approach: find the optimal parameter values by analysis. Iterative adaptation: improve the parameter values from an initial guess.

Learning Algorithms: General Idea
Tweak the parameters so as to optimize a performance criterion. In the course of learning, the parameter vector traces a path that (hopefully) ends at the best parameter vector.

Inductive Learning
• Given training examples (correct input-output pairs), recover the unknown underlying function from which the training data were generated.
• Generalization ability on unseen data is required.
• Forms of the function: logical sentences, polynomials, sets of weights (neural networks), …
• Given the form of the function, adjust its parameters to minimize error.

Theory of Inductive Inference
• A concept C ⊆ X. Examples are given as pairs (x, y), where x ∈ X and y = 1 if x ∈ C, y = 0 if x ∉ C.
• Find F such that F(x) = 1 if x ∈ C and F(x) = 0 if x ∉ C.
• Inductive bias: constraints on the hypothesis space (a table of all observations is not a choice).
  – Restricted hypothesis-space biases
  – Preference biases: Occam's razor (Ockham): the simplest hypothesis is best.

Consistent Hypotheses
William of Ockham (also Occam), 1285-1349, English scholastic philosopher.
• Prefer the simplest hypothesis consistent with the data.
• Defining 'simple' is not easy: there is a tradeoff between the complexity of the hypothesis and the degree of fit.

Model Complexity
Decision boundaries for salmon vs. sea bass: which is better, A or B?

Model Complexity
We can get perfect classification performance on the training data by choosing a sufficiently complex model, but this raises the issue of generalization.

Generalization
• The main goal of a pattern classification system is to suggest the class of objects yet unseen: generalization.
• Some complex decision boundaries are not good at generalization; some overly simple boundaries are not good either.
• The tradeoff between performance and simplicity is the core of statistical pattern recognition.

Generalization Strategy
How can we improve generalization performance?
• More training examples (i.e., better pdf estimates).
• Simpler models (i.e., simpler classification boundaries) usually yield better performance: simplify the decision boundary!

Overfitting and Underfitting (from Vojtěch Franc)
(Figure: a good fit vs. overfitting.)
The problem of generalization: a small empirical risk R_emp does not imply a small true expected risk R.

Curse of Dimensionality
• Increasing the number of functions reduces the error: classifier performance on the training data improves.
• But when training on a limited amount of data, increasing the number of features reduces generalization ability.
• The amount of training data required to maintain adequate generalization grows rapidly with the feature dimension.
• For a finite set of training data, finding the optimal set of features is a difficult problem.

Two Slot Machine Problem
Maximize the payout from two slot machines with unknown return rates. How many coins should be spent to find the better machine? (A sketch follows below.)
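This is the classic exploration/exploitation tradeoff. A minimal sketch, assuming made-up return rates and an epsilon-greedy strategy; the strategy is a standard choice for illustration, not one the slide prescribes.

```python
import numpy as np

rng = np.random.default_rng(1)
true_rates = [0.4, 0.6]            # unknown to the player
counts, payouts = np.zeros(2), np.zeros(2)

for t in range(1000):
    if rng.random() < 0.1:                     # explore: try a random machine
        arm = int(rng.integers(2))
    else:                                      # exploit: play the best-looking one
        est = np.divide(payouts, counts, out=np.zeros(2), where=counts > 0)
        arm = int(np.argmax(est))
    reward = rng.random() < true_rates[arm]    # the coin pays off or not
    counts[arm] += 1
    payouts[arm] += reward

print("plays per machine:", counts, "estimated rates:", payouts / counts)
```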

Optimal Number of Cells (example)

Implications of the Curse of Dimensionality for PR System Design
• With finite training samples, be cautious about adding features: take the features with the highest discrimination power first; feature analysis is mandatory.
• Simpler neural networks are generally better: a small number of hidden nodes and links.
• Tips for structure simplification: parameter tying; eliminating links during learning.

Cross-Validation
• Validate the learned model on a different set to assess generalization performance, guarding against overfitting.
• Partition the training set into an estimation subset (for learning parameters) and a validation subset (for selecting the best model and deciding when to stop training).
• Leave-one-out validation: N−1 samples for training, 1 for validation, taking turns; helps overcome a small training set (see the sketch below).
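A minimal sketch of leave-one-out validation, assuming a 1-nearest-neighbor classifier on toy data; the classifier choice and the data are illustrative assumptions.

```python
import numpy as np

def nn_predict(X_train, y_train, x):
    """Classify x by the label of its nearest training point."""
    d = np.linalg.norm(X_train - x, axis=1)
    return y_train[np.argmin(d)]

def leave_one_out_accuracy(X, y):
    hits = 0
    for i in range(len(X)):                 # each sample takes a turn as validation
        mask = np.arange(len(X)) != i       # the other N-1 samples train the model
        hits += nn_predict(X[mask], y[mask], X[i]) == y[i]
    return hits / len(X)

X = np.array([[0, 0], [0.2, 0.1], [1, 1], [1.1, 0.9]])
y = np.array([0, 0, 1, 1])
print("LOO accuracy:", leave_one_out_accuracy(X, y))  # -> 1.0 on this toy set
```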

Unsupervised Learning (from Vojtěch Franc)
• Input: training examples {x1, …, xℓ} without information about the hidden state.
• Clustering: the goal is to find clusters of data sharing similar properties.
• A broad class of unsupervised learning algorithms feed the unlabeled examples through a learning algorithm that outputs a classifier.

Example of an Unsupervised Learning Algorithm: k-Means Clustering (from Vojtěch Franc)
The goal is to minimize the within-cluster sum of squared distances

    J = Σ_{i=1..k} Σ_{x ∈ cluster i} ||x − m_i||²,

where m_i is the mean (center) of cluster i (a sketch follows below).
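A minimal sketch of k-means (Lloyd's algorithm) on made-up two-cluster data; the data and parameter values are illustrative assumptions.

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]  # initialize from data points
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Update step: each center moves to the mean of its points.
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

X = np.vstack([np.random.randn(30, 2), np.random.randn(30, 2) + 5])
labels, centers = kmeans(X, k=2)
print("cluster centers:\n", centers)
```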

Other Issues in Pattern Recognition

Difficulty of Class Modeling

Context Processing Is Essential for Recognition

Global Consistency: local decisions are not enough.

Combining Multiple Classifiers
• Approaches for improving the performance of a group of experts: the best single classifier vs. a combination of multiple classifiers; two heads (experts, classifiers) are better than one.
• A classifier's output can be the best (single) class, a ranking, or a score for each class.
• Methods for generating multiple classifiers: correlated classifiers would not help.
• Methods for combining multiple classifiers: majority rule, Borda count, decorrelated combination, etc. (sketches below).
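Minimal sketches of two combination rules named above, majority vote and Borda count; the three classifiers' outputs are made up for illustration.

```python
from collections import Counter

def majority_vote(predictions):
    """Each classifier casts one vote for its best class."""
    return Counter(predictions).most_common(1)[0][0]

def borda_count(rankings):
    """Each classifier ranks all classes; rank i earns (n_classes - 1 - i) points."""
    scores = Counter()
    for ranking in rankings:
        for i, cls in enumerate(ranking):
            scores[cls] += len(ranking) - 1 - i
    return scores.most_common(1)[0][0]

print(majority_vote(["A", "B", "A"]))                                     # -> "A"
print(borda_count([["A", "B", "C"], ["B", "A", "C"], ["B", "C", "A"]]))   # -> "B"
```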

Evaluating Pattern Recognition Performance
Confusion counts (recognition result vs. ground truth for class A):

                      truth A    truth not-A
  recognized a           p            s
  recognized not-a       r            q

• Recognition rate = (p+q)/(p+q+r+s)
• Misrecognition rate = (r+s)/(p+q+r+s)
• Miss detection = r/(p+r)
• False alarm = s/(p+s)
• Recall = p/(p+r)
• Precision = p/(p+s)
Also consider the rejection rate (refusing to make a decision) and throughput:
• Case A: 20% rejected, but only 0.5% error on the accepted results.
• Case B: 10% rejected, but 1.0% error on the accepted results.
Which is better? (A numeric sketch follows below.)
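The measures above computed from made-up counts, with p, s, r, q laid out as in the table.

```python
# Made-up confusion counts for illustration.
p, s = 90, 5     # recognized as "a": truth A (p), truth not-A (s)
r, q = 10, 95    # recognized as "not a": truth A (r), truth not-A (q)

total = p + q + r + s
print("recognition rate :", (p + q) / total)
print("error rate       :", (r + s) / total)
print("miss detection   :", r / (p + r))
print("false alarm      :", s / (p + s))   # as defined on the slide
print("recall           :", p / (p + r))
print("precision        :", p / (p + s))
```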

Resources
• Professional association: International Association for Pattern Recognition (IAPR)
• Textbook: Pattern Classification by Richard O. Duda, Peter E. Hart, and David G. Stork
• Journals: IEEE Transactions on Pattern Analysis and Machine Intelligence; Pattern Recognition Letters; Artificial Intelligence and Pattern Recognition; …
• Conferences and workshops: International Conference on Pattern Recognition; Int'l Conference on Document Analysis and Recognition; Int'l Workshop on Frontiers in Handwriting Recognition; IEEE Computer Vision and Pattern Recognition; …