ECE 5424 Introduction to Machine Learning Topics Finish
- Slides: 24
ECE 5424: Introduction to Machine Learning Topics: – (Finish) Nearest Neighbor Readings: Barber 14 (k. NN) Stefan Lee Virginia Tech
Administrative • HW 1 – – – – (C) Dhruv Batra Out now on Scholar Due on Wednesday 09/14, 11: 55 pm Please please start early Implement K-NN Kaggle Competition Bonus points for best performing entries. Bonus points for beating the instructor/TA. 2
Recap from last time (C) Dhruv Batra 3
Nearest Neighbours (C) Dhruv Batra Image Credit: Wikipedia 4
Instance/Memory-based Learning Four things make a memory based learner: • A distance metric • How many nearby neighbors to look at? • A weighting function (optional) • How to fit with the local points? (C) Dhruv Batra Slide Credit: Carlos Guestrin 5
1 -Nearest Neighbour Four things make a memory based learner: • A distance metric – Euclidean (and others) • How many nearby neighbors to look at? – 1 • A weighting function (optional) – unused • How to fit with the local points? – Just predict the same output as the nearest neighbour. (C) Dhruv Batra Slide Credit: Carlos Guestrin 6
1 -NN for Regression He the re, th da clos is is tap es oin t t H th ere, da e cl this ta ose is po s int t is s st i h , t ose int e er cl o H he tap t da y Here, this is the closest datapoint x (C) Dhruv Batra Figure Credit: Carlos Guestrin 7
Multivariate distance metrics Suppose the input vectors x 1, x 2, …x. N are two dimensional: x 1 = ( x 11 , x 12 ) , x 2 = ( x 21 , x 22 ) , …x. N = ( x. N 1 , x. N 2 ). One can draw the nearest-neighbor regions in input space. Dist(xi, xj) = (xi 1 – xj 1)2 + (xi 2 – xj 2)2 Dist(xi, xj) =(xi 1 – xj 1)2+(3 xi 2 – 3 xj 2)2 The relative scalings in the distance metric affect region shapes Slide Credit: Carlos Guestrin
Euclidean distance metric Or equivalently, where A Slide Credit: Carlos Guestrin
Notable distance metrics (and their level sets) Scaled Euclidian (L 2) Mahalanobis (non-diagonal A) Slide Credit: Carlos Guestrin
Minkowski distance (C) Dhruv Batra Image Credit: By Waldir (Based on File: Minkowski. Circles. svg) [CC BY-SA 3. 0 (http: //creativecommons. org/licenses/by-sa/3. 0)], via Wikimedia Commons 11
Notable distance metrics (and their level sets) Scaled Euclidian (L 2) L 1 norm (absolute) Mahalanobis (non-diagonal A) Slide Credit: Carlos Guestrin Linf (max) norm
Plan for today • (Finish) Nearest Neighbour – Kernel Classification/Regression – Curse of Dimensionality (C) Dhruv Batra 13
1 -NN for Regression • Often bumpy (overfits) (C) Dhruv Batra Figure Credit: Andrew Moore 14
9 -NN for Regression • Often bumpy (overfits) (C) Dhruv Batra Figure Credit: Andrew Moore 15
Weighted k-NNs • Neighbors are not all the same
Kernel Regression/Classification Four things make a memory based learner: • A distance metric – Euclidean (and others) • How many nearby neighbors to look at? – All of them • A weighting function (optional) – wi = exp(-d(xi, query)2 / σ2) – Nearby points to the query are weighted strongly, far points weakly. The σ parameter is the Kernel Width. Very important. • How to fit with the local points? – Predict the weighted average of the outputs predict = Σwiyi / Σwi (C) Dhruv Batra Slide Credit: Carlos Guestrin 17
Weighting/Kernel functions wi = exp(-d(xi, query)2 / σ2) (Our examples use Gaussian) (C) Dhruv Batra Slide Credit: Carlos Guestrin 18
Effect of Kernel Width • What happens as σ inf? • What happens as σ 0? (C) Dhruv Batra Image Credit: Ben Taskar 19
Problems with Instance-Based Learning • Expensive – No Learning: most real work done during testing – For every test sample, must search through all dataset – very slow! – Must use tricks like approximate nearest neighbour search • Doesn’t work well when large number of irrelevant features – Distances overwhelmed by noisy features • Curse of Dimensionality – Distances become meaningless in high dimensions – (See proof in next lecture) (C) Dhruv Batra 20
Curse of Dimensionality • Consider: Sphere of radius 1 in d-dims • Consider: an outer ε-shell in this sphere • What is (C) Dhruv Batra ? 21
Curse of Dimensionality (C) Dhruv Batra 22
What you need to know • k-NN – Simplest learning algorithm – With sufficient data, very hard to beat “strawman” approach – Picking k? • Kernel regression/classification – Set k to n (number of data points) and chose kernel width – Smoother than k-NN • Problems with k-NN – Curse of dimensionality – Irrelevant features often killers for instance-based approaches – Slow NN search: Must remember (very large) dataset for prediction (C) Dhruv Batra 23
What you need to know • Key Concepts (which we will meet again) – – – – (C) Dhruv Batra Supervised Learning Classification/Regression Loss Functions Statistical Estimation Training vs Testing Data Hypothesis Class Overfitting, Generalization 24
- Hima lakkaraju
- Concept learning task in machine learning
- Analytical learning in machine learning
- Pac learning model in machine learning
- Pac learning model in machine learning
- Inductive and analytical learning
- Analytical learning in machine learning
- Instance based learning in machine learning
- Inductive learning machine learning
- First order rule learning in machine learning
- Eager learning algorithm example
- Deep learning vs machine learning
- Ethem alpaydin
- What is unsupervised learning algorithm
- Andrew ng introduction to machine learning
- High bias low variance introduction to machine learning
- Machine learning ethem
- Cmu machine learning
- Introduction to machine learning slides
- Introduction to machine learning and data mining
- Lstm lecture
- Machine learning definition andrew ng
- Ethem alpaydin
- Machine learning ethem alpaydin
- Introduction to machine learning ethem alpaydin