Introduction to Pattern Recognition and Machine Learning
Charles Tappert and LTC Avery Leider, Seidenberg School of CSIS, Pace University

Pattern Classification
Most of the material in these slides was taken from the figures in Pattern Classification (2nd ed.) by R. O. Duda, P. E. Hart, and D. G. Stork, John Wiley & Sons, 2001.

What is pattern recognition?
• Definition from Duda et al.: the act of taking in raw data and taking an action based on the "category" of the pattern
• We gain an understanding and appreciation for pattern recognition in the real world: visual scenes, noises, etc.
  • Human senses: sight, hearing, taste, smell, touch
  • Recognition is not an exact match like a password

An Introductory Example
• "Sorting incoming fish on a conveyor according to species using optical sensing"
• Species: sea bass or salmon

Problem Analysis
• Set up a camera and take some sample images to extract features (a measurement sketch follows this list):
  • Length
  • Lightness
  • Width
  • Number and shape of fins
  • Position of the mouth, etc.
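As a rough illustration of how such measurements might be computed, here is a minimal sketch assuming a binary segmentation mask and a grayscale image are already available; the synthetic rectangular "fish", the pixel units, and the function name extract_features are my assumptions, not anything specified in the slides.

```python
# Minimal sketch: measure [length, width, mean lightness] of a segmented fish
# from a binary mask and a grayscale image, using only NumPy. The data below
# are synthetic stand-ins for a real camera frame.
import numpy as np

def extract_features(mask: np.ndarray, gray: np.ndarray) -> np.ndarray:
    """Return [length, width, mean lightness] measured over the pixels where mask is True."""
    rows, cols = np.nonzero(mask)
    length = cols.max() - cols.min() + 1          # horizontal extent in pixels
    width = rows.max() - rows.min() + 1           # vertical extent in pixels
    lightness = gray[mask].mean()                 # average gray level of the fish region
    return np.array([length, width, lightness])

# Synthetic example: a bright rectangular "fish" on a dark background.
gray = np.zeros((100, 200))
gray[40:60, 50:150] = 0.8
mask = gray > 0.5
print(extract_features(mask, gray))   # -> [100.  20.   0.8]
```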

Pattern Classification System
• Preprocessing: segment (isolate) fishes from one another and from the background
• Feature extraction: reduce the data by measuring certain features
• Classification: divide the feature space into decision regions

Classification
• Initially use the length of the fish as a possible feature for discrimination (a threshold rule is sketched below)
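A single-feature classifier of this kind reduces to comparing the length against a threshold. The sketch below is illustrative only; the threshold value, units, and function name are my assumptions, not values from the slides.

```python
# Minimal sketch: classify a fish as "sea bass" or "salmon" from its length alone,
# using a hand-picked threshold. The threshold value and units are assumptions.
THRESHOLD_LENGTH = 11.0   # assumed threshold, in assumed units

def classify_by_length(length: float) -> str:
    """Sea bass tend to be longer than salmon, so call anything above the threshold a sea bass."""
    return "sea bass" if length > THRESHOLD_LENGTH else "salmon"

print(classify_by_length(14.2))  # -> sea bass
print(classify_by_length(8.5))   # -> salmon
```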

Feature Selection
• The length is a poor feature alone!
• Select the lightness as a possible feature

Threshold decision boundary and cost relationship
• Move the decision boundary toward smaller values of lightness in order to minimize the cost (reduce the number of sea bass that are classified as salmon!)
• This is the task of decision theory (a cost-weighted threshold search is sketched below)
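One concrete way to see the cost trade-off is to score each candidate threshold by a weighted sum of the two error types and keep the cheapest one. This is a minimal sketch with assumed costs and made-up lightness values, not a procedure taken from the slides.

```python
# Minimal sketch: choose a lightness threshold that minimizes a weighted cost,
# where misclassifying a sea bass as salmon is assumed to be the costlier error.
# All numbers below are made up for illustration.
import numpy as np

salmon_lightness = np.array([2.0, 2.5, 3.0, 3.5, 4.0])    # salmon tend to be darker
seabass_lightness = np.array([4.5, 5.0, 5.5, 6.0, 6.5])   # sea bass tend to be lighter
COST_BASS_AS_SALMON = 2.0   # assumed: the costlier error
COST_SALMON_AS_BASS = 1.0

def total_cost(threshold: float) -> float:
    """Classify lightness > threshold as sea bass; sum the weighted misclassification counts."""
    bass_as_salmon = np.sum(seabass_lightness <= threshold)   # sea bass called salmon
    salmon_as_bass = np.sum(salmon_lightness > threshold)     # salmon called sea bass
    return COST_BASS_AS_SALMON * bass_as_salmon + COST_SALMON_AS_BASS * salmon_as_bass

candidates = np.linspace(1.0, 7.0, 61)
best = min(candidates, key=total_cost)
print(f"best threshold = {best:.1f}, cost = {total_cost(best)}")
```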

Feature Vector
• Adopt the lightness and add the width of the fish to the feature vector: x^T = [x1, x2], where x1 is the lightness and x2 is the width

Straight line decision boundary
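As an illustration of a straight-line boundary in the (lightness, width) feature space, here is a minimal sketch; the weight vector and bias are arbitrary assumed values, not parameters fitted from the book's data.

```python
# Minimal sketch: a straight-line decision boundary w.x + b = 0 in the
# two-dimensional feature space x = [lightness, width]. The weights and bias
# are arbitrary illustrative values, not fitted parameters.
import numpy as np

w = np.array([1.0, -0.5])   # assumed weight vector
b = -3.0                    # assumed bias

def classify(x: np.ndarray) -> str:
    """Points on one side of the line are called sea bass, the other side salmon."""
    return "sea bass" if np.dot(w, x) + b > 0 else "salmon"

print(classify(np.array([6.0, 4.0])))  # lightness=6, width=4 -> sea bass
print(classify(np.array([2.0, 3.0])))  # lightness=2, width=3 -> salmon
```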

Features
• We might add other features that are not highly correlated with the ones we already have. Be sure not to reduce the performance by adding "noisy features"
• Ideally, you might think the best decision boundary is the one that provides optimal performance on the training data (see the following figure)

Is this a good decision boundary?

Decision Boundary Choice
• Our satisfaction is premature because the central aim of designing a classifier is to correctly classify new (test) input
• Issue of generalization! (a held-out evaluation is sketched below)
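One way to make the generalization point concrete (my illustration, not the slides' example) is to measure a decision rule's error on held-out data rather than on the data used to build it; only the held-out error estimates performance on new input. All data below are synthetic.

```python
# Minimal sketch: fit a simple threshold rule on training data, then estimate how
# it generalizes by measuring its error on held-out (test) data. Data are made up.
import numpy as np

rng = np.random.default_rng(0)

def sample(n):
    """Synthetic 1-D lightness samples: salmon around 3, sea bass around 5."""
    salmon = rng.normal(3.0, 0.8, n)
    bass = rng.normal(5.0, 0.8, n)
    x = np.concatenate([salmon, bass])
    y = np.concatenate([np.zeros(n), np.ones(n)])   # 0 = salmon, 1 = sea bass
    return x, y

x_train, y_train = sample(20)
x_test, y_test = sample(200)

# Simple rule learned from training data: threshold at the midpoint of the class means.
threshold = (x_train[y_train == 0].mean() + x_train[y_train == 1].mean()) / 2

def error_rate(x, y):
    return float(np.mean((x > threshold).astype(float) != y))

print("training error:", error_rate(x_train, y_train))
print("test (generalization) error:", error_rate(x_test, y_test))
```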

Better decision boundary

Bayesian Decision Theory
• Pure statistical approach (parametric)
• Assumes the underlying probability structures are known perfectly
• Makes theoretically optimal decisions (a Bayes decision rule is sketched below)
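As a concrete illustration of making decisions when the probability structure is assumed known, the sketch below uses one-dimensional Gaussian class-conditional densities for lightness with made-up means, standard deviations, and priors, and picks the class with the larger prior-times-likelihood (proportional to the posterior). None of the numbers come from the slides.

```python
# Minimal sketch of a Bayes decision rule with (assumed) known Gaussian
# class-conditional densities for lightness and known priors. All parameters are made up.
import math

PRIORS = {"salmon": 0.4, "sea bass": 0.6}
PARAMS = {"salmon": (3.0, 0.8), "sea bass": (5.5, 0.9)}   # (mean, std) of lightness

def gaussian_pdf(x: float, mean: float, std: float) -> float:
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def bayes_decide(lightness: float) -> str:
    """Choose the class with the largest prior * likelihood (proportional to the posterior)."""
    scores = {c: PRIORS[c] * gaussian_pdf(lightness, *PARAMS[c]) for c in PRIORS}
    return max(scores, key=scores.get)

print(bayes_decide(2.8))   # -> salmon
print(bayes_decide(5.9))   # -> sea bass
```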

Non-parametric algorithm
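One common example of a non-parametric method (the specific algorithm here is my illustration; the slide does not name one) is a nearest-neighbor classifier, which assumes no parametric form for the underlying densities and lets the training samples themselves define the decision.

```python
# Minimal sketch of a 1-nearest-neighbor classifier: no assumed probability model;
# the stored training samples define the decision. Feature values are made up.
import numpy as np

train_x = np.array([[2.0, 3.0], [2.5, 3.5], [5.5, 4.5], [6.0, 5.0]])  # [lightness, width]
train_y = ["salmon", "salmon", "sea bass", "sea bass"]

def nearest_neighbor(x: np.ndarray) -> str:
    """Label the query with the class of its closest training point (Euclidean distance)."""
    distances = np.linalg.norm(train_x - x, axis=1)
    return train_y[int(np.argmin(distances))]

print(nearest_neighbor(np.array([2.2, 3.1])))  # -> salmon
print(nearest_neighbor(np.array([5.8, 4.8])))  # -> sea bass
```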

Pattern Recognition Stages
• Sensing
  • Use of a transducer (camera or microphone)
  • The PR system depends on the bandwidth, resolution, sensitivity, and distortion of the transducer
• Preprocessing
  • Segmentation and grouping: patterns should be well separated and not overlap

Pattern Recognition Stages (cont)
• Feature extraction
  • Discriminative features
  • Invariant features with respect to translation, rotation, and scale
• Classification
  • Use the feature vector provided by the feature extractor to assign the object to a category
• Post-processing
  • Exploit context-dependent information to improve performance

The Design Cycle
• Data collection
• Feature choice
• Model choice
• Training
• Evaluation
• Computational complexity

Data Collection
• How do we know when we have collected an adequately large and representative set of examples for training and testing the system?

Choice of Features
• Depends on the characteristics of the problem domain
• Simple to extract, invariant to irrelevant transformations, insensitive to noise (a small invariance example follows)
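As a small illustration of invariance to an irrelevant transformation (my example, not from the slides): a raw pixel length changes when the camera zoom changes, but the width-to-length ratio does not.

```python
# Minimal sketch: the width/length ratio is invariant to uniform image scaling,
# while the raw pixel measurements are not. Values are made up.
def width_length_ratio(width_px: float, length_px: float) -> float:
    return width_px / length_px

# The same fish imaged at two different zoom levels (scale factor 2).
print(width_length_ratio(20.0, 100.0))   # -> 0.2
print(width_length_ratio(40.0, 200.0))   # -> 0.2 (unchanged under scaling)
```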

Model Choice
• Unsatisfied with the performance of our fish classifier and want to jump to another class of model

Training
• Use data to determine the classifier
• (Many different procedures for training classifiers and choosing models)

Evaluation
• Measure the error rate (or performance), as sketched below
• Possibly switch from one set of features to another one
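A minimal sketch of computing an error rate on a labeled test set; the labels and predictions below are made-up placeholders.

```python
# Minimal sketch: error rate = fraction of test samples whose predicted label
# differs from the true label. Labels below are made-up placeholders.
true_labels = ["salmon", "sea bass", "salmon", "sea bass", "salmon"]
predictions = ["salmon", "salmon", "salmon", "sea bass", "sea bass"]

errors = sum(p != t for p, t in zip(predictions, true_labels))
error_rate = errors / len(true_labels)
print(f"error rate = {error_rate:.2f}")   # -> error rate = 0.40
```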

Computational Complexity
• What is the trade-off between computational ease and performance?
• How does an algorithm scale as a function of the number of features, patterns, or categories?

Learning and Adaptation
• Supervised learning
  • A teacher provides a category label for each pattern in the training set
• Unsupervised learning
  • The system forms clusters or "natural groupings" of the unlabeled input patterns (both settings are sketched below)
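To contrast the two settings concretely, the sketch below is my illustration on made-up one-dimensional lightness data: the supervised case uses the given labels directly, while the unsupervised case must discover the two groupings on its own (here with a tiny k-means loop).

```python
# Minimal sketch contrasting supervised and unsupervised learning on made-up
# 1-D lightness data. Supervised: class means come straight from the labels.
# Unsupervised: a tiny k-means loop finds two clusters without any labels.
import numpy as np

x = np.array([2.0, 2.4, 3.1, 5.2, 5.8, 6.1])   # lightness values (made up)
labels = np.array([0, 0, 0, 1, 1, 1])          # 0 = salmon, 1 = sea bass

# Supervised: use the labels.
means_supervised = np.array([x[labels == c].mean() for c in (0, 1)])
print("supervised class means:", means_supervised)

# Unsupervised: k-means with k = 2, ignoring the labels entirely.
centers = np.array([x.min(), x.max()])         # simple initialization
for _ in range(10):
    assignments = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
    centers = np.array([x[assignments == c].mean() for c in (0, 1)])
print("clusters found without labels:", centers)
```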

Introductory example conclusion
• Reader may be overwhelmed by the number, complexity, and magnitude of the sub-problems of pattern recognition
• Many of these sub-problems can indeed be solved
• Many fascinating unsolved problems still remain

DPS Pattern Recognition Dissertations
• Completed
  • Visual systems: Rick Bassett, Sheb Bishop, Tom Lombardi, John Casarella
  • Speech recognition: Jonathan Law
  • Handwriting: Mary Manfredi
  • Natural Language Processing: Bashir Ahmed
  • Keystroke Biometric: Mary Curtin, Mary Villani, Mark Ritzmann, Robert Zack, John Stewart, Ned Bakelman
  • Fundamental research areas: Kwang Lee, Carl Abrams, Ted Markowitz, Dmitry Nikelshpur
• In progress
  • Jonathan Leet, Amir Schur