Introduction Machine Learning 14/02/2017

Machine Learning
How can we design a computer system whose performance improves by learning from experience?

Spam filtering

Face/person recognition demo

Recommendation systems

Robotics

Natural Language Processing

Other application areas
  – Biometrics
  – Object recognition in images
  – DNA sequencing
  – Financial data mining/prediction
  – Process mining and optimisation
Pattern Classification, Chapter 1

Big Data

Rule-based systems vs. machine learning
  • A domain expert is needed for
    – writing rules, OR
    – giving training samples
  • Which one is better?
    – Can the expert design a rule-based system?
    – Is the problem specific or general?

http://www.ml-class.org/course

Most of the materials in these slides were taken from Pattern Classification (2nd ed.) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley & Sons, 2000, with the permission of the authors and the publisher.

Definition
Machine learning (Mitchell): "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."

Example: classifying fish
  • Classes: sea bass and salmon
  • Goal: to learn a model from training data that can categorise fish (e.g. salmon are shorter)

Classification (T)
  – Supervised learning: based on training examples (E), learn a model that works well on previously unseen examples.
  – Classification: a supervised learning task of categorising entities into a predefined set of classes.

Basic definitions
  • Feature (or attribute): a column of the table, e.g. Length (cm) or Lightness
  • Instance (or entity, sample): a row of the table
  • Class label: the value in the Type column

ID   Length (cm)   Lightness   Type
1    28            0.5         salmon
2    23            0.7         salmon
3    17            0.5         sea bass
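
The three instances listed above can be written down in code to make the terms concrete. This is only a minimal sketch: the `Instance` class and its field names are hypothetical names chosen for illustration, not something the slides define. Each instance becomes one feature vector in `X`, and its class label goes into `y`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """One row of the table: two features plus the class label."""
    length_cm: float
    lightness: float
    label: str


# The three instances from the slide's table.
instances = [
    Instance(28, 0.5, "salmon"),
    Instance(23, 0.7, "salmon"),
    Instance(17, 0.5, "sea bass"),
]

# Separate the features (inputs) from the class labels (targets).
X = [[inst.length_cm, inst.lightness] for inst in instances]
y = [inst.label for inst in instances]

print(X)
print(y)
```

Keeping features and labels in two parallel lists is a common convention, because a learning algorithm only sees the labels during training.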

Example: preprocessing
  – Image processing steps
    • E.g. segmentation of the fish contour and the background
  – Feature extraction
    • Extraction of features/attributes from the images, which are atomic variables
    • Typically numerical or categorical

Example features
  • length
  • lightness
  • width
  • number of fins
  • position of the mouth

Length is a weak discriminator of fish types.

Lightness is a better discriminator.

Performance evaluation (P)
  – Simplest measure: accuracy (rate of correct classifications)
  – False positive/negative errors
  – E.g. if the threshold is decreased, the number of sea bass falsely classified as salmon decreases
  – Choosing the threshold is a task of decision theory
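
The effect of moving the threshold can be sketched in a few lines. Everything here is an assumption made for illustration: the lightness values are invented, and the decision rule (lightness below the threshold means salmon) is one plausible reading of the example, not something the slides state explicitly.

```python
# Hypothetical (lightness, true label) pairs, invented for illustration only.
samples = [
    (0.20, "salmon"), (0.30, "salmon"), (0.35, "salmon"), (0.50, "salmon"),
    (0.40, "sea bass"), (0.55, "sea bass"), (0.60, "sea bass"), (0.70, "sea bass"),
]


def classify(lightness, threshold):
    """Assumed rule: lightness below the threshold means salmon."""
    return "salmon" if lightness < threshold else "sea bass"


def evaluate(samples, threshold):
    """Return (accuracy, sea bass classified as salmon, salmon classified as sea bass)."""
    correct = bass_as_salmon = salmon_as_bass = 0
    for lightness, truth in samples:
        predicted = classify(lightness, threshold)
        if predicted == truth:
            correct += 1
        elif truth == "sea bass":
            bass_as_salmon += 1
        else:
            salmon_as_bass += 1
    return correct / len(samples), bass_as_salmon, salmon_as_bass


for threshold in (0.45, 0.38):
    accuracy, bass_as_salmon, salmon_as_bass = evaluate(samples, threshold)
    print(f"threshold={threshold}: accuracy={accuracy}, "
          f"sea bass as salmon={bass_as_salmon}, salmon as sea bass={salmon_as_bass}")
```

On this toy data, lowering the threshold from 0.45 to 0.38 removes the single sea bass that was classified as salmon. Accuracy alone does not distinguish the two kinds of error, which is why they are counted separately.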

Feature vector
A vector of features describing a particular instance.
Instance A: x^T = [x1, x2], where x1 is lightness and x2 is width

Feature space
Be careful about adding too many features:
  – noisy features (e.g. measurement errors)
  – unnecessary features (e.g. the information content is similar to that of another feature)
We need features that are likely to have discriminative power.
Feature set engineering is highly task-specific!
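
One simple way to spot an unnecessary feature is to check how strongly it correlates with a feature that is already in the set. The sketch below is only one possible check, not a method taken from the slides: the measurements are invented, and the Pearson correlation only detects linear redundancy.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical measurements, invented for illustration only.
length_cm = [28.0, 23.0, 17.0, 30.0, 21.0]
length_inch = [v / 2.54 for v in length_cm]  # same information, different unit
lightness = [0.5, 0.7, 0.5, 0.4, 0.6]

redundant = pearson(length_cm, length_inch)
other = pearson(length_cm, lightness)

print(f"length (cm) vs length (inch): {redundant:.3f}")
print(f"length (cm) vs lightness:     {other:.3f}")
```

A correlation close to 1 or -1 suggests that the second feature adds little new information. A low correlation does not by itself prove that a feature is useful for discriminating the classes.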

This is not ideal. Remember the supervised learning principle: the model has to work well on previously unseen examples!

Model selection
  • Number of features?
  • Complexity of the task?
  • Classifier speed?
  • Task- and data-dependent!

The machine learning lifecycle
  • Data preparation
  • Feature engineering
  • Model selection
  • Model training
  • Performance evaluation

Data preparation
Do we know whether we have collected a large enough and representative sample for training a system?

Model selection and training
  – These topics are the focus of this course
  – Investigate the data for model selection! No free lunch!

Performance evaluation
  • There are various evaluation metrics
  • Simulation of supervised learning:
    1. split your data into two parts (a training set and a test set)
    2. train your model on the training set
    3. predict and evaluate your model on the test set (unknown during training)
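
The three steps can be sketched with a toy threshold classifier. All specifics are assumptions made for illustration: the data are invented, the 70/30 split ratio and the fixed random seed are arbitrary choices, and "training" here simply picks the lightness threshold with the best accuracy on the training set.

```python
import random

# Hypothetical (lightness, label) data, invented for illustration only.
data = [
    (0.20, "salmon"), (0.25, "salmon"), (0.30, "salmon"), (0.35, "salmon"),
    (0.45, "salmon"), (0.40, "sea bass"), (0.55, "sea bass"), (0.60, "sea bass"),
    (0.65, "sea bass"), (0.70, "sea bass"),
]


def predict(lightness, threshold):
    """Assumed rule: lightness below the threshold means salmon."""
    return "salmon" if lightness < threshold else "sea bass"


def accuracy(samples, threshold):
    """Fraction of samples whose predicted label matches the true label."""
    hits = sum(predict(x, threshold) == label for x, label in samples)
    return hits / len(samples)


def train(samples):
    """Pick the observed lightness value that works best as a threshold."""
    candidates = sorted(x for x, _ in samples)
    return max(candidates, key=lambda t: accuracy(samples, t))


# 1. Split the data into two parts (70% training, 30% test).
rng = random.Random(0)  # fixed seed so the split is reproducible
shuffled = data[:]
rng.shuffle(shuffled)
cut = len(shuffled) * 7 // 10
train_set, test_set = shuffled[:cut], shuffled[cut:]

# 2. Train the model on the training set only.
threshold = train(train_set)

# 3. Predict and evaluate on the test set, which was not seen during training.
test_accuracy = accuracy(test_set, threshold)
print(f"learned threshold: {threshold}, test accuracy: {test_accuracy:.2f}")
```

The test set plays the role of the previously unseen examples, so the accuracy measured on it is an estimate of how the model would perform on new data.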

Topics of the course
  • Classification
  • Regression
  • Clustering
  • Recommendation systems
  • Learning to rank
  • Structure prediction
  • Reinforcement learning

https://www.kaggle.com/competitions
http://www195.pair.com/mik3hall/weka_kaggle.html