
Lecture Slides for INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004

[email protected]
http://www.cmpe.boun.edu.tr/~ethem/i2ml

CHAPTER 5: Multivariate Methods

Multivariate Data
• Multiple measurements (sensors)
• d inputs/features/attributes: d-variate
• N instances/observations/examples

Multivariate Parameters
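In the book's notation, the multivariate parameters summarized on this slide are the mean vector, the covariance matrix, and the correlations:

```latex
E[\boldsymbol{x}] = \boldsymbol{\mu} = [\mu_1, \ldots, \mu_d]^T,
\qquad
\Sigma \equiv \mathrm{Cov}(\boldsymbol{x}) = E\!\left[(\boldsymbol{x}-\boldsymbol{\mu})(\boldsymbol{x}-\boldsymbol{\mu})^T\right],
\qquad
\sigma_{ij} = \mathrm{Cov}(x_i, x_j),
\qquad
\rho_{ij} = \frac{\sigma_{ij}}{\sigma_i \sigma_j}
```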

Parameter Estimation
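Given a sample of N instances x^t, the maximum likelihood estimators referred to here are the sample mean, sample covariance, and sample correlation:

```latex
\boldsymbol{m} = \frac{1}{N}\sum_{t=1}^{N} \boldsymbol{x}^t,
\qquad
s_{ij} = \frac{1}{N}\sum_{t=1}^{N} (x_i^t - m_i)(x_j^t - m_j),
\qquad
r_{ij} = \frac{s_{ij}}{s_i s_j}
```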

Estimation of Missing Values
• What to do if certain instances have missing attributes?
• Ignore those instances: not a good idea if the sample is small
• Use ‘missing’ as an attribute: may give information
• Imputation: fill in the missing value (see the sketch below)
  ◦ Mean imputation: use the most likely value (e.g., the mean)
  ◦ Imputation by regression: predict based on other attributes
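A minimal NumPy sketch of mean imputation; the toy matrix and variable names are illustrative, not from the slides. Regression imputation would instead fit a model of the missing attribute on the observed ones.

```python
import numpy as np

# Toy data: NaN marks a missing attribute value
X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [4.0, np.nan],
              [5.0, 6.0]])

col_means = np.nanmean(X, axis=0)                 # per-attribute mean, ignoring NaNs
X_imputed = np.where(np.isnan(X), col_means, X)   # fill each NaN with its column mean
```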

Multivariate Normal Distribution
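The d-variate normal density, x ~ N_d(μ, Σ), is

```latex
p(\boldsymbol{x}) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}}
\exp\!\left[-\tfrac{1}{2}\,(\boldsymbol{x}-\boldsymbol{\mu})^T \Sigma^{-1} (\boldsymbol{x}-\boldsymbol{\mu})\right]
```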

Multivariate Normal Distribution
• Mahalanobis distance: (x – μ)ᵀ Σ⁻¹ (x – μ) measures the distance from x to μ in terms of Σ (it normalizes for differences in variances and for correlations)
• Bivariate case: d = 2
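A minimal NumPy sketch of the squared Mahalanobis distance; the mean vector and covariance values are illustrative, not from the slides.

```python
import numpy as np

x = np.array([1.0, 2.0])
mu = np.array([0.0, 0.0])
cov = np.array([[2.0, 0.5],
                [0.5, 1.0]])

diff = x - mu
d2 = diff @ np.linalg.solve(cov, diff)   # (x - mu)^T Sigma^{-1} (x - mu)
```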

Bivariate Normal
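For d = 2, writing z_i = (x_i − μ_i)/σ_i and letting ρ denote the correlation, the bivariate density takes the form

```latex
p(x_1, x_2) = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1-\rho^2}}
\exp\!\left[-\frac{1}{2(1-\rho^2)}\left(z_1^2 - 2\rho z_1 z_2 + z_2^2\right)\right]
```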


Independent Inputs: Naive Bayes
• If the xi are independent, the off-diagonal entries of Σ are 0, and the Mahalanobis distance reduces to a weighted (by 1/σi) Euclidean distance
• If the variances are also equal, it reduces to the Euclidean distance
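With a diagonal Σ the density factorizes over the inputs and the exponent becomes a weighted Euclidean distance:

```latex
p(\boldsymbol{x}) = \prod_{j=1}^{d} p_j(x_j)
= \frac{1}{(2\pi)^{d/2}\prod_j \sigma_j}
\exp\!\left[-\frac{1}{2}\sum_{j=1}^{d}\left(\frac{x_j-\mu_j}{\sigma_j}\right)^{2}\right]
```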

Parametric Classification
• If p(x | Ci) ~ N(μi, Σi)
• The discriminant functions are gi(x) = log p(x | Ci) + log P(Ci), expanded below
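For normal class-conditional densities the discriminant functions become

```latex
g_i(\boldsymbol{x}) = \log p(\boldsymbol{x}\,|\,C_i) + \log P(C_i)
= -\frac{d}{2}\log 2\pi - \frac{1}{2}\log |\Sigma_i|
  - \frac{1}{2}(\boldsymbol{x}-\boldsymbol{\mu}_i)^T \Sigma_i^{-1} (\boldsymbol{x}-\boldsymbol{\mu}_i)
  + \log P(C_i)
```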

Estimation of Parameters
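With r_i^t = 1 if x^t belongs to C_i and 0 otherwise, the estimates plugged into the discriminant are

```latex
\hat{P}(C_i) = \frac{\sum_t r_i^t}{N},
\qquad
\boldsymbol{m}_i = \frac{\sum_t r_i^t \boldsymbol{x}^t}{\sum_t r_i^t},
\qquad
\boldsymbol{S}_i = \frac{\sum_t r_i^t (\boldsymbol{x}^t-\boldsymbol{m}_i)(\boldsymbol{x}^t-\boldsymbol{m}_i)^T}{\sum_t r_i^t}
```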

Different Si
• Quadratic discriminant (see the expansion below)
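When each class keeps its own covariance S_i, expanding the quadratic form gives a quadratic discriminant g_i(x) = xᵀ W_i x + w_iᵀ x + w_{i0} with

```latex
\boldsymbol{W}_i = -\tfrac{1}{2}\boldsymbol{S}_i^{-1},
\qquad
\boldsymbol{w}_i = \boldsymbol{S}_i^{-1}\boldsymbol{m}_i,
\qquad
w_{i0} = -\tfrac{1}{2}\boldsymbol{m}_i^T \boldsymbol{S}_i^{-1}\boldsymbol{m}_i
         - \tfrac{1}{2}\log|\boldsymbol{S}_i| + \log \hat{P}(C_i)
```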

[Figure: class likelihoods, the discriminant where P(C1 | x) = 0.5, and the posterior for C1]

Common Covariance Matrix S
• Shared common sample covariance S
• The discriminant reduces to a linear discriminant, given below
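The shared covariance estimate and the resulting linear discriminant are

```latex
\boldsymbol{S} = \sum_i \hat{P}(C_i)\,\boldsymbol{S}_i,
\qquad
g_i(\boldsymbol{x}) = \boldsymbol{w}_i^T\boldsymbol{x} + w_{i0},
\quad
\boldsymbol{w}_i = \boldsymbol{S}^{-1}\boldsymbol{m}_i,
\quad
w_{i0} = -\tfrac{1}{2}\boldsymbol{m}_i^T\boldsymbol{S}^{-1}\boldsymbol{m}_i + \log\hat{P}(C_i)
```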


Diagonal S
• When the xj, j = 1, ..., d, are independent, Σ is diagonal: p(x | Ci) = ∏j p(xj | Ci) (the Naive Bayes assumption)
• Classify based on weighted Euclidean distance (in sj units) to the nearest mean, as shown below
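With a diagonal shared covariance the discriminant becomes a weighted Euclidean distance to each class mean:

```latex
g_i(\boldsymbol{x}) = -\frac{1}{2}\sum_{j=1}^{d}\left(\frac{x_j - m_{ij}}{s_j}\right)^2 + \log \hat{P}(C_i)
```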

[Figure: Diagonal S; variances may be different]

Diagonal S, equal variances
• Nearest mean classifier: classify based on Euclidean distance to the nearest mean (see the sketch below)
• Each mean can be considered a prototype or template, so this is template matching
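A minimal sketch of a nearest-mean (template-matching) classifier; the class means and the test point are illustrative, not from the slides.

```python
import numpy as np

means = np.array([[0.0, 0.0],    # m_1
                  [3.0, 3.0]])   # m_2

def nearest_mean(x, means):
    # squared Euclidean distance to each class mean; pick the closest
    d2 = np.sum((means - x) ** 2, axis=1)
    return np.argmin(d2)

print(nearest_mean(np.array([2.5, 2.0]), means))  # -> 1 (closer to m_2)
```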


Model Selection

Assumption                     Covariance matrix       No. of parameters
Shared, Hyperspheric           Si = S = s²I            1
Shared, Axis-aligned           Si = S, with sij = 0    d
Shared, Hyperellipsoidal       Si = S                  d(d+1)/2
Different, Hyperellipsoidal    Si                      K d(d+1)/2

• As we increase complexity (a less restricted S), bias decreases and variance increases
• Assume simple models (allow some bias) to control variance (regularization)

Discrete Features
• Binary features: if the xj are independent (Naive Bayes), the discriminant is linear
• The estimated parameters are given below
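With p_{ij} ≡ p(x_j = 1 | C_i) and independent binary features, the linear discriminant and its estimator are

```latex
g_i(\boldsymbol{x}) = \sum_j \left[x_j \log p_{ij} + (1-x_j)\log(1-p_{ij})\right] + \log P(C_i),
\qquad
\hat{p}_{ij} = \frac{\sum_t x_j^t r_i^t}{\sum_t r_i^t}
```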

Discrete Features
• Multinomial (1-of-nj) features: xj ∈ {v1, v2, ..., vnj}
• If the xj are independent, the discriminant and estimates are given below
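With indicator z_{jk} = 1 if x_j = v_k (and 0 otherwise) and p_{ijk} ≡ p(x_j = v_k | C_i), the discriminant and its estimator are

```latex
g_i(\boldsymbol{x}) = \sum_j \sum_k z_{jk} \log p_{ijk} + \log P(C_i),
\qquad
\hat{p}_{ijk} = \frac{\sum_t z_{jk}^t r_i^t}{\sum_t r_i^t}
```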

Multivariate Regression
• Multivariate linear model: g(x | w0, w1, ..., wd) = w0 + w1 x1 + ... + wd xd
• Multivariate polynomial model: define new higher-order variables z1 = x1, z2 = x2, z3 = x1², z4 = x2², z5 = x1 x2 and use the linear model in this new z space (basis functions, kernel trick, SVM: Chapter 10); a fitting sketch follows
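A minimal sketch of the quadratic basis expansion described above, fit by ordinary least squares; the toy data and variable names are illustrative, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(50, 2))                              # two inputs x1, x2
r = 1.0 + 2.0 * X[:, 0] - X[:, 1] ** 2 + 0.1 * rng.standard_normal(50)

# z-space: bias, z1=x1, z2=x2, z3=x1^2, z4=x2^2, z5=x1*x2
Z = np.column_stack([np.ones(len(X)), X[:, 0], X[:, 1],
                     X[:, 0] ** 2, X[:, 1] ** 2, X[:, 0] * X[:, 1]])

w, *_ = np.linalg.lstsq(Z, r, rcond=None)                         # linear model in z space
```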