CSI 5388 Topics in Machine Learning Performance Evaluation

CSI 5388: Topics in Machine Learning: Performance Evaluation for Classification

Objectives of the Course and Preliminaries

Course Webpage (including Syllabus): http://www.site.uottawa.ca/~nat/Courses/csi5388_2008.html

Some Information

• Instructor: Dr. Nathalie Japkowicz
• Office: STE 5-029
• Phone Number: 562-5800 x6693 (don’t rely on it!)
• E-mail: nat@site.uottawa.ca (best way to contact me!)
• Office Hours: Monday, Wednesday 1:00 pm-2:00 pm or by appointment
• Extra Seminars: TAMALE Seminars (invited talks on Machine Learning and Natural Language Processing)

Why a Course on Performance Evaluation for Machine Learning?

The Issue

• Designing any kind of algorithm without testing it properly is useless:
  - How do we know that what the algorithm is doing has anything to do with what we expect it to do?
• Performance evaluation for Machine Learning is particularly difficult:
  - Machine Learning is based on the principle of induction. Even if the algorithm does well on the data it was trained on, how do we know that it generalizes to data it hasn’t seen before?
  - Since data is scarce, we don’t know what kind of data the algorithm will see in the future.
• How can we, then, test any Machine Learning algorithm?

Current Typical Solution

• Select a large number of UCI domains.
• Train and test your new algorithm and those used for comparison using 10-fold cross-validation, on each domain.
• Use accuracy to evaluate performance.
• Average the results of the 10 folds for each algorithm on each domain.
• Run paired t-tests to make sure that the differences observed between the various algorithms are significant.
• If they are, claim victory, and publish happily! If not, play around with your algorithm, and run your experiments again, until you get satisfaction.
• Oh, and also, average the results obtained for each algorithm on every domain, to get a global idea of how your algorithm performs.
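As an illustration only, the core of this procedure can be sketched in Python for a single domain. Everything specific in the sketch is an assumption made for the example and not part of the course material: the synthetic one-dimensional data stands in for a UCI domain, `threshold_learner` and `majority_learner` are hypothetical toy learners, and 2.262 is the two-sided 5% critical value of the t distribution with 9 degrees of freedom.

```python
import math
import random
from statistics import mean, stdev


def make_folds(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def majority_learner(train_x, train_y):
    """Hypothetical baseline: always predict the most frequent training label."""
    label = max(sorted(set(train_y)), key=train_y.count)
    return lambda x: label


def threshold_learner(train_x, train_y):
    """Hypothetical toy learner: cut a 1-D feature midway between the class means."""
    m0 = mean(x for x, y in zip(train_x, train_y) if y == 0)
    m1 = mean(x for x, y in zip(train_x, train_y) if y == 1)
    cut = (m0 + m1) / 2
    if m1 >= m0:
        return lambda x: int(x > cut)
    return lambda x: int(x <= cut)


def cv_accuracies(learner, xs, ys, folds):
    """Train on all folds but one, test on the held-out fold, return per-fold accuracy."""
    accs = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_x = [xs[i] for i in range(len(xs)) if i not in held_out]
        train_y = [ys[i] for i in range(len(xs)) if i not in held_out]
        predict = learner(train_x, train_y)
        correct = sum(predict(xs[i]) == ys[i] for i in test_idx)
        accs.append(correct / len(test_idx))
    return accs


def paired_t_statistic(a, b):
    """t statistic of the paired differences a[i] - b[i] (len(a) - 1 degrees of freedom)."""
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    return mean(diffs) / (sd / math.sqrt(len(diffs)))


# Synthetic stand-in for one domain: two overlapping Gaussian classes (assumption).
rng = random.Random(42)
xs = [rng.gauss(0.0, 1.0) for _ in range(100)] + [rng.gauss(2.0, 1.0) for _ in range(100)]
ys = [0] * 100 + [1] * 100

folds = make_folds(len(xs), k=10, seed=1)  # same folds for both learners, so results are paired
acc_threshold = cv_accuracies(threshold_learner, xs, ys, folds)
acc_majority = cv_accuracies(majority_learner, xs, ys, folds)

t_stat = paired_t_statistic(acc_threshold, acc_majority)
T_CRITICAL_DF9 = 2.262  # two-sided, alpha = 0.05, 9 degrees of freedom

print("mean accuracy (threshold):", mean(acc_threshold))
print("mean accuracy (majority): ", mean(acc_majority))
print("paired t statistic:", t_stat, "significant:", abs(t_stat) > T_CRITICAL_DF9)
```

Both learners are evaluated on the same folds so that the per-fold accuracies can be paired. Note that the ten training sets overlap heavily, so the per-fold results are not independent samples; the sketch reproduces the procedure as described, not a recommendation to use it.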

Is this solution acceptable?

• I don’t know… It makes some sort of intuitive sense, but I never truly investigated it. I simply made sure to apply it rigorously after my professor told me it was important to do so!
• However, various parts of the procedure have been challenged by researchers and practitioners inside and outside of the field of Machine Learning.

What are some of the issues that came up?

• Basing our decisions on the metrics we typically use (e.g., accuracy) is problematic.
• The data sets we use are not necessarily representative of those our algorithms will be applied to.
• The statistics we rely on are misleading:
  - We may break the assumptions made by the statistical tests we use;
  - Our sampling strategies may be inappropriate.
• As a result of all these issues, our results may not mean what we believe they do.
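One common way accuracy can mislead is class imbalance. The following minimal sketch uses made-up labels (10 positives, 990 negatives) and a hypothetical classifier that always predicts the negative class; the numbers are illustrative assumptions, not results from the course.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, fp, tn, fn


# Made-up imbalanced data: 1% positives (assumption for illustration).
y_true = [1] * 10 + [0] * 990
# Hypothetical classifier that ignores its input and always answers "negative".
y_pred = [0] * 1000

tp, fp, tn, fn = confusion_counts(y_true, y_pred)

accuracy = (tp + tn) / len(y_true)      # fraction of all examples classified correctly
recall = tp / (tp + fn)                 # fraction of positives that were found
specificity = tn / (tn + fp)            # fraction of negatives that were found
balanced_accuracy = (recall + specificity) / 2

print("accuracy:", accuracy)
print("recall on the positive class:", recall)
print("balanced accuracy:", balanced_accuracy)
```

The classifier is correct on 99% of the examples while detecting none of the positives, which is why accuracy alone says little here. Balanced accuracy is used only as one example of an alternative summary.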

Purpose of the Course

• The purpose of the course is three-fold:
  - First, it consists of gathering enough background knowledge in statistics, on metrics, and on experimental evaluation, in general, to understand the nature of the criticisms.
  - Second, it consists of presenting some of the solutions that have been suggested to remedy the problems.
  - Third, it consists of training the students to think critically about evaluation issues so as to design their own solutions.

Useful Reading Material

Machine Learning References
• Machine Learning, Tom Mitchell, McGraw Hill, 1997.
• Data Mining, Witten, I. & Frank, E., Morgan Kaufmann, 2006.
• Introduction to Machine Learning, Nils J. Nilsson (available free from the Web).

Statistics Reference
• StatSoft, http://www.statsoft.com/textbook/stathome.html

Research Papers
• See the papers listed on the Web site.

Objectives of the Course:

• To introduce advanced topics in Machine Learning evaluation.
• To introduce the students to the careful reading, presenting and critiquing of individual research papers.
• To introduce the students to background research in a subfield of Machine Learning evaluation: finding appropriate sources (some giving broad overviews, others describing the most important approaches in the subfield), organizing the knowledge logically, presenting the knowledge to the class.
• To initiate the students to formulating a research problem and carrying this research through.

Format of the Course:

• A mixture of lectures and student presentations on various topics in the area of classifier evaluation.
• The student presentations will review and criticize recent specialized research papers written on the more general topic previously introduced by the lecturer.
• The last week of the term will be devoted to student project presentations.

Course Requirements (with Percent of the Final Grade):

• 6 paper critiques, in which the student will critically and comparatively discuss the content of 2 or 3 research papers on the weekly theme: 12%
• 3 critical and comparative in-class presentations of a research paper: 18%
• 3 assignments: 30%
• Final Project: 40%
  - Project Proposal
  - Project Report
  - Project Presentation

List of Topics

• Review of Machine Learning's main concepts
• Current approaches for the evaluation of Machine Learning and their shortcomings
• Evaluation Metrics I: ROC Analysis / Cost Curves
• Evaluation Metrics II: Non-Traditional Metrics, Evaluation through Projection, Combination of metrics
• Functional Elements of Statistics for Machine Learning 1 & 2

List of Topics (Cont’d)

• Sampling, Bootstrapping, Randomized Methods
• Putting it all together: Error Estimation of Machine Learning Algorithms
• Data Sets: Pros and cons of data repositories, simulated data sets
• Model selection, Theory, Practical Recommendations

Project (See Project Description on Course Web site)

• Research Project including a literature review and the design and implementation of a novel evaluation scheme, or the comparison of various evaluation schemes.
• Project Proposals (3-5 pages) are due the week after the break.
• Project Reports are due on the last day of classes.
• Project Presentations will take place in the last week of classes.
• Start thinking about the project early!!!!!