Discrimination and Classification

Discrimination & Classification
• Discrimination: Goal is to separate individual cases into known population groups by their measurements on several variables.
  § Makes use of graphical and algebraic methods.
  § Goal is to separate groups as much as possible based on numeric values.
  § Referred to as “Separation”.
• Classification: Observing new cases along with numeric values and assigning them to groups based on their numeric values.
  § Makes use of an algorithm generated on known cases and applies it to new cases whose population is unknown.
  § Referred to as “Allocation”.

Notation and Concepts
• Notation: Populations ≡ π1, π2. Measured Variables: X
• Conceptual Settings of Unknown Population:
  § Incomplete Knowledge of Outcome: The outcome is in the future and cannot be observed when X is measured.
  § Destruction Necessary to Observe Outcome: A product must be destroyed to observe its quality status.
  § Unavailable or Expensive Assessments of Outcome: Authorship is unknown, or assessment by an expensive gold standard may be needed.

Setting up a Discriminant Function
• Prior Probabilities for the 2 Populations – Assumes knowledge of relative population sizes. Will tend to classify individual cases into the “larger” population unless there is strong evidence in favor of the “smaller” population.
• Misclassification Cost – Is the cost of misclassification the same for objects from each of the populations?
• Probability Density Functions – The distributions of the numeric variables for the elements of the 2 populations. Population 1: f1(x). Population 2: f2(x).
• Classification Regions – Given an observation’s x values, it will be assigned to Population 1 or 2. R1 ≡ {x} such that an observation is classified to Population 1; R2 ≡ Ω − R1 is the set of x where it is classified to Population 2.

Mathematical Notation

Regions that Minimize Expected Cost of Misclassification
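As an illustration of how priors, costs, and densities combine, the usual textbook form of the minimum expected cost rule places x in R1 when f1(x)/f2(x) ≥ (c(1|2)/c(2|1))·(p2/p1), and in R2 otherwise. The sketch below is a minimal Python version under stated assumptions: the symbols p1, p2, c(2|1), c(1|2), the function names, and the univariate normal densities are illustrative choices, not taken from the slides.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution at x (illustrative choice of f1, f2)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def allocate_min_ecm(f1_x, f2_x, p1, p2, cost_2_given_1, cost_1_given_2):
    """Return 1 if x falls in R1 under the minimum expected cost rule, else 2.

    f1_x, f2_x      -- the two population densities evaluated at x
    p1, p2          -- prior probabilities of Population 1 and Population 2
    cost_2_given_1  -- cost of sending a Population 1 case to Population 2
    cost_1_given_2  -- cost of sending a Population 2 case to Population 1
    """
    # R1: f1(x)/f2(x) >= (c(1|2)/c(2|1)) * (p2/p1), cross-multiplied so that
    # a zero density never causes a division by zero.
    lhs = f1_x * cost_2_given_1 * p1
    rhs = f2_x * cost_1_given_2 * p2
    return 1 if lhs >= rhs else 2


# Hypothetical setting: Population 1 ~ N(0, 1), Population 2 ~ N(2, 1), new case x = 0.5.
f1_x = normal_pdf(0.5, 0.0, 1.0)
f2_x = normal_pdf(0.5, 2.0, 1.0)
equal_case = allocate_min_ecm(f1_x, f2_x, 0.5, 0.5, 1.0, 1.0)
rare_pop1 = allocate_min_ecm(f1_x, f2_x, 0.1, 0.9, 1.0, 1.0)
costly_miss = allocate_min_ecm(f1_x, f2_x, 0.1, 0.9, 10.0, 1.0)
```

With equal priors and costs the case goes to the population with the higher density. Shrinking the prior of Population 1 pulls the same case toward Population 2, and raising the cost of missing a Population 1 case pulls it back.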

Allocation of New Observation x 0 to Population

Normal Populations with Equal Σ

Sample Based Discrimination

Fisher’s Method for 2 Populations
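A minimal sketch of Fisher's rule in its usual textbook form, assuming two measured variables, equal priors, and equal misclassification costs: compute the coefficient vector a = S_pooled⁻¹(x̄1 − x̄2), then allocate a new case x0 to Population 1 when a′x0 is at least the midpoint m = ½·a′(x̄1 + x̄2). The function names and the toy data are hypothetical, and the 2×2 inverse is written out by hand to keep the example free of external libraries.

```python
def mean_vector(rows):
    """Column means of a list of (x1, x2) observations."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def scatter_matrix(rows, mean):
    """2 x 2 sum of outer products of deviations from the mean."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mean[0], r[1] - mean[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_rule(sample1, sample2):
    """Return (a, m): discriminant coefficients and the midpoint cutoff."""
    n1, n2 = len(sample1), len(sample2)
    xbar1, xbar2 = mean_vector(sample1), mean_vector(sample2)
    w1 = scatter_matrix(sample1, xbar1)
    w2 = scatter_matrix(sample2, xbar2)
    # Pooled covariance estimate with n1 + n2 - 2 degrees of freedom.
    sp = [[(w1[i][j] + w2[i][j]) / (n1 + n2 - 2) for j in range(2)]
          for i in range(2)]
    det = sp[0][0] * sp[1][1] - sp[0][1] * sp[1][0]
    inv = [[sp[1][1] / det, -sp[0][1] / det],
           [-sp[1][0] / det, sp[0][0] / det]]
    diff = [xbar1[0] - xbar2[0], xbar1[1] - xbar2[1]]
    a = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    m = 0.5 * (a[0] * (xbar1[0] + xbar2[0]) + a[1] * (xbar1[1] + xbar2[1]))
    return a, m


def allocate(x0, a, m):
    """Allocate x0 to Population 1 if its discriminant score is at least m."""
    score = a[0] * x0[0] + a[1] * x0[1]
    return 1 if score >= m else 2


# Made-up training samples, for illustration only.
sample1 = [(1.0, 2.0), (2.0, 1.0), (2.0, 3.0), (3.0, 2.0)]
sample2 = [(5.0, 6.0), (6.0, 5.0), (6.0, 7.0), (7.0, 6.0)]
a, m = fisher_rule(sample1, sample2)
near_group1 = allocate((2.5, 2.5), a, m)
near_group2 = allocate((5.0, 5.5), a, m)
```

The rule projects each case onto the single direction a that best separates the two sample means relative to the pooled within-group spread, so classification reduces to comparing one number with a cutoff.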

Classification of Multivariate Normal Populations when Σ1 ≠ Σ2

Evaluation of Classification Functions

Jackknife Cross-Validation (Lachenbruch’s Holdout Method)
• For Population 1, remove each observation one at a time and fit the classifier based on all (n1 − 1) + n2 remaining cases. Classify the hold-out case.
• Repeat for all n1 cases from Population 1. n1M(H) ≡ # of held-out Population 1 cases misclassified as π2
• Repeat for all n2 cases from Population 2. n2M(H) ≡ # of held-out Population 2 cases misclassified as π1
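The holdout steps above can be sketched as follows. To keep the example short it assumes a single measured variable and a simple midpoint classifier (allocate a case to the population whose sample mean lies on its side of the midpoint between the two means); any other classifier could be refit in its place. The function names and data are hypothetical. The final line combines the two counts into the usual holdout error-rate estimate, (n1M(H) + n2M(H)) / (n1 + n2).

```python
def fit_midpoint_rule(sample1, sample2):
    """Fit the illustrative classifier: just the two sample means."""
    xbar1 = sum(sample1) / len(sample1)
    xbar2 = sum(sample2) / len(sample2)
    return xbar1, xbar2


def classify(x, rule):
    """Allocate x to the population whose mean is on its side of the midpoint."""
    xbar1, xbar2 = rule
    midpoint = 0.5 * (xbar1 + xbar2)
    if xbar1 >= xbar2:
        return 1 if x >= midpoint else 2
    return 1 if x <= midpoint else 2


def holdout_counts(sample1, sample2):
    """Return (n1m, n2m): held-out cases misclassified from each population."""
    n1m = 0
    for i in range(len(sample1)):
        rest = sample1[:i] + sample1[i + 1:]      # drop case i from Population 1
        rule = fit_midpoint_rule(rest, sample2)   # refit on (n1 - 1) + n2 cases
        if classify(sample1[i], rule) == 2:
            n1m += 1
    n2m = 0
    for i in range(len(sample2)):
        rest = sample2[:i] + sample2[i + 1:]      # drop case i from Population 2
        rule = fit_midpoint_rule(sample1, rest)   # refit on n1 + (n2 - 1) cases
        if classify(sample2[i], rule) == 1:
            n2m += 1
    return n1m, n2m


# Made-up measurements, for illustration only.
sample1 = [1.0, 2.0, 3.0, 6.0]
sample2 = [5.0, 7.0, 8.0, 9.0]
n1m, n2m = holdout_counts(sample1, sample2)
error_rate = (n1m + n2m) / (len(sample1) + len(sample2))
```

Because each case is classified by a rule that never saw it, the resulting error rate is less optimistic than one computed by reclassifying the training cases with the rule fitted to all of them.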