Memory-Based Reasoning
이재현, PASTA Lab., POSTECH
PASTA IE, POSTECH
1. Introduction

Memory-Based Reasoning (MBR) consists of:
- Identifying similar cases from experience
- Applying the information from these cases to the problem at hand

MBR finds neighbors similar to a new record and uses those neighbors for classification and prediction. It relies on two operations:
- Distance function: assigns a distance between any two records
- Combination function: combines the results from the neighbors to arrive at an answer

Applications of MBR span many areas:
- Fraud detection
- Customer response prediction
- Medical treatments
- Classifying responses
2. How does MBR work?

Example question: what is the most likely movie last seen by a respondent, based on the source of the record and the age of the individual?

MBR has two distinct phases:
- The learning phase generates the historical database
- The prediction phase applies MBR to new cases
2.1. The three main issues in solving a problem with MBR

Choosing the appropriate set of historical records
- The historical records, also known as the training set, are a subset of the available records.
- The training set needs to provide good coverage of the records so that the nearest neighbors of an unknown record are useful for predictive purposes.

Representing the historical records
- The performance of MBR in making predictions depends on how the training set is represented in the computer.

Determining the distance function, combination function, and number of neighbors
- The distance function, combination function, and number of neighbors are the key ingredients in determining how good MBR is at producing results.
3. Case study: Classifying News Stories

What are the codes?
- The news provider assigns codes to news stories in order to describe their content. These codes help users search for stories of interest.

Applying MBR
- Choosing the training set: the training set consisted of 49,652 news stories.
- Choosing the distance function: in this case, a distance function already existed, based on a notion called relevance feedback, which measures the similarity of two documents based on the words they contain.
3. Case study: Classifying News Stories (continued)

- The relevance feedback function served as the distance function.
- Choosing the combination function: the combination function used a weighted summation technique.
- Choosing the number of neighbors: the investigation varied the number of nearest neighbors between 1 and 11 inclusive.
3. Case study: Classifying News Stories (continued)

The result
- Recall and precision are two measurements that are useful for judging how well a set of codes gets assigned.
  - Recall: "How many of the correct codes did MBR assign to the story?"
  - Precision: "How many of the codes assigned by MBR were correct?"

Codes by MBR            Correct codes           Recall  Precision
A, B, C, D              A, B, C, D, E, F, G, H  50%     100%
A, B, C, D, E, F, G, H  A, B, C, D              100%    50%

Category       Recall  Precision
Government     85%     87%
Industry       91%     85%
Market Sector  93%     91%
Product        69%     89%
Region         86%     64%
Subject        72%     53%
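The two definitions above can be sketched directly in code; this is a minimal illustration of the arithmetic, not the news provider's actual system:

```python
# Recall and precision for a set of assigned codes, as defined on the slide:
#   recall    = |assigned ∩ correct| / |correct|
#   precision = |assigned ∩ correct| / |assigned|
def recall_precision(assigned, correct):
    assigned, correct = set(assigned), set(correct)
    hits = len(assigned & correct)
    return hits / len(correct), hits / len(assigned)

# The two rows of the example table:
print(recall_precision({"A", "B", "C", "D"},
                       {"A", "B", "C", "D", "E", "F", "G", "H"}))  # (0.5, 1.0)
print(recall_precision({"A", "B", "C", "D", "E", "F", "G", "H"},
                       {"A", "B", "C", "D"}))                      # (1.0, 0.5)
```

Assigning too few codes hurts recall; assigning too many hurts precision, exactly as the two rows show.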
4. Measuring Distance

Three most common distance functions:
- Absolute value of the difference: |A - B|
- Square of the difference: (A - B)^2
- Normalized absolute value: |A - B| / (maximum difference)

Example

Recnum  Gender  Age  Salary
1       Female  27   $19,000
2       Male    51   $64,000
3       Male    52   $105,000
4       Female  33   $55,000
5       Male    45   $48,000

Gender distance:
- dgender(female, female) = 0, dgender(male, male) = 0
- dgender(male, female) = 1, dgender(female, male) = 1
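A quick sketch of the three elementary distance functions applied to the Age field of the example records:

```python
# Ages of the five example records from the slide.
ages = [27, 51, 52, 33, 45]
max_diff = max(ages) - min(ages)  # 52 - 27 = 25

def d_abs(a, b):   # absolute value of the difference
    return abs(a - b)

def d_sq(a, b):    # square of the difference
    return (a - b) ** 2

def d_norm(a, b):  # normalized absolute value, scaled into [0, 1]
    return abs(a - b) / max_diff

print(d_abs(27, 51))   # 24
print(d_sq(27, 51))    # 576
print(d_norm(27, 51))  # 0.96
```

The normalized form is what produces the 0-to-1 age distance matrix on the next slide.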
4. Measuring Distance (continued)

Age distance matrix (normalized absolute value):

      27    51    52    33    45
27    0.00  0.96  1.00  0.24  0.72
51    0.96  0.00  0.04  0.72  0.24
52    1.00  0.04  0.00  0.76  0.28
33    0.24  0.72  0.76  0.00  0.48
45    0.72  0.24  0.28  0.48  0.00

Merging the field distances into a single record distance function:
- Summation: dsum(A, B) = dgender(A, B) + dage(A, B) + dsalary(A, B)
- Normalized summation: dnorm(A, B) = dsum(A, B) / max(dsum)
- Euclidean distance: deuclid(A, B) = sqrt(dgender(A, B)^2 + dage(A, B)^2 + dsalary(A, B)^2)
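The three record-level distance functions can be sketched as follows. This is an illustrative reconstruction: the 0/1 gender distance and the normalized age and salary distances are taken from the slides, and the resulting dsum values approximately reproduce the table on the next slide (small differences come from rounding):

```python
from math import sqrt

# The five example records from the slides: Recnum -> (gender, age, salary).
records = {
    1: ("Female", 27, 19_000),
    2: ("Male",   51, 64_000),
    3: ("Male",   52, 105_000),
    4: ("Female", 33, 55_000),
    5: ("Male",   45, 48_000),
}
new_customer = ("Female", 45, 100_000)

AGE_RANGE = 52 - 27               # largest age difference in the data
SALARY_RANGE = 105_000 - 19_000   # largest salary difference in the data

def field_distances(a, b):
    """Per-field distances: gender is 0/1; age and salary use the
    normalized absolute value of the difference."""
    d_gender = 0.0 if a[0] == b[0] else 1.0
    d_age = abs(a[1] - b[1]) / AGE_RANGE
    d_salary = abs(a[2] - b[2]) / SALARY_RANGE
    return d_gender, d_age, d_salary

def d_sum(a, b):     # summation of the field distances
    return sum(field_distances(a, b))

def d_norm(a, b):    # normalized summation (maximum possible dsum is 3)
    return d_sum(a, b) / 3

def d_euclid(a, b):  # Euclidean combination of the field distances
    return sqrt(sum(d * d for d in field_distances(a, b)))

for recnum, rec in records.items():
    print(recnum, round(d_sum(rec, new_customer), 3))
```

Note that dnorm divides by the maximum possible dsum, so it never changes the ordering of the neighbors, only the scale of the distances.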
4. Measuring Distance (continued)

Set of nearest neighbors for each record (nearest first; dsum, dnorm, and deuclid give the same ordering here):

Recnum  Neighbors
1       1, 4, 5, 2, 3
2       2, 5, 3, 4, 1
3       3, 2, 5, 4, 1
4       4, 1, 5, 2, 3
5       5, 2, 3, 4, 1

Insert a new customer:
- Gender: Female, Age: 45, Salary: $100,000

Distances from the new customer to each record, and the resulting neighbor orderings:

          1      2      3      4      5      Neighbors
dsum      1.662  1.659  1.338  1.003  1.640  4, 3, 5, 2, 1
dnorm     0.554  0.553  0.446  0.334  0.547  4, 3, 5, 2, 1
deuclid   0.781  1.052  1.251  0.494  1.000  4, 1, 5, 2, 3
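Finding the neighbor ordering is just a sort by distance; a minimal sketch using the dsum distances from the table:

```python
# dsum distances from the new customer to each record (from the table).
dsum_to_new = {1: 1.662, 2: 1.659, 3: 1.338, 4: 1.003, 5: 1.640}

# Sort record numbers by their distance, nearest first.
neighbors = sorted(dsum_to_new, key=dsum_to_new.get)
print(neighbors)  # [4, 3, 5, 2, 1]
```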
5. The combination function: Asking the neighbors for the answer

The basic approach: democracy
- The basic combination function used for MBR is to have the K nearest neighbors vote on the answer: "democracy" in data mining.

Customers with attrition history:

Recnum  Gender  Age  Salary    Attriter
1       Female  27   $19,000   No
2       Male    51   $64,000   Yes
3       Male    52   $105,000  Yes
4       Female  33   $55,000   Yes
5       Male    45   $48,000   No
new     Female  45   $100,000  ?
5. The combination function: Asking the neighbors for the answer (continued)

Using MBR to determine whether the new customer will attrite:

          Neighbors       Neighbor attrition  K=1  K=2  K=3  K=4  K=5
dsum      4, 3, 5, 2, 1   Y, Y, N, Y, N       yes  yes  yes  yes  yes
deuclid   4, 1, 5, 2, 3   Y, N, N, Y, Y       yes  ?    no   ?    yes

Attrition prediction with confidence (the share of votes won by the predicted class; ties give 50%):

          K=1        K=2        K=3       K=4       K=5
dsum      Yes, 100%  Yes, 100%  Yes, 67%  Yes, 75%  Yes, 60%
deuclid   Yes, 100%  Yes, 50%   No, 67%   Yes, 50%  Yes, 60%
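The democratic combination function above can be sketched in a few lines; this reproduces the deuclid row of the table (labels listed nearest-first):

```python
from collections import Counter

def vote(labels, k):
    """Majority vote among the k nearest neighbors.

    Returns (prediction, confidence); a tie between the top two classes
    yields ("?", 0.5), which assumes a two-class problem as on the slide.
    """
    top = Counter(labels[:k]).most_common()
    if len(top) > 1 and top[0][1] == top[1][1]:
        return "?", 0.5
    label, n = top[0]
    return label, n / k  # confidence = share of votes won

# deuclid neighbor ordering 4, 1, 5, 2, 3 -> attrition labels:
labels = ["Yes", "No", "No", "Yes", "Yes"]
for k in range(1, 6):
    print(k, vote(labels, k))
```

Running this gives yes / tie / no / tie / yes for K = 1..5, matching the prediction row: the answer can flip as K grows, which is why the choice of K matters.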
5. The combination function: Asking the neighbors for the answer (continued)

Weighted voting
- Weighted voting is similar to voting except that the neighbors are not all created equal.
- Closer neighbors have stronger votes than neighbors farther away.
- The size of the vote is inversely proportional to the distance from the new record.
- To prevent problems when the distance might be 0, it is common to add 1 to the distance before taking the inverse.

Attrition prediction with weighted voting (yes votes to no votes):

          K=1         K=2             K=3             K=4             K=5
dnorm     0.749 to 0  1.441 to 0      1.441 to 0.647  2.085 to 0.647  2.085 to 1.290
deuclid   0.669 to 0  0.669 to 0.562  0.669 to 1.062  1.157 to 1.062  1.601 to 1.062

Confidence with weighted voting:

          K=1        K=2        K=3       K=4       K=5
dnorm     Yes, 100%  Yes, 100%  Yes, 69%  Yes, 76%  Yes, 62%
deuclid   Yes, 100%  Yes, 54%   No, 61%   Yes, 52%  Yes, 60%
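A sketch of weighted voting with weight = 1 / (1 + distance), using the dnorm distances and labels from the earlier slides. The totals approximately reproduce the dnorm row of the table; small differences come from rounding the distances to three decimals:

```python
# Neighbors of the new customer, nearest first: records 4, 3, 5, 2, 1.
# Each entry is (dnorm distance, attrition label), taken from the slides.
neighbors = [
    (0.334, "Yes"),  # record 4
    (0.446, "Yes"),  # record 3
    (0.547, "No"),   # record 5
    (0.553, "Yes"),  # record 2
    (0.554, "No"),   # record 1
]

def weighted_vote(neighbors, k):
    """Each of the k nearest neighbors votes with weight 1 / (1 + d)."""
    votes = {"Yes": 0.0, "No": 0.0}
    for dist, label in neighbors[:k]:
        votes[label] += 1.0 / (1.0 + dist)  # +1 guards against d == 0
    winner = max(votes, key=votes.get)
    confidence = votes[winner] / sum(votes.values())
    return winner, votes, confidence

for k in range(1, 6):
    winner, votes, conf = weighted_vote(neighbors, k)
    print(k, winner,
          round(votes["Yes"], 3), "to", round(votes["No"], 3),
          f"{conf:.0%}")
```

Unlike plain voting, weighted voting rarely produces exact ties, which is one practical reason to prefer it.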
6. Conclusion

Strengths of Memory-Based Reasoning
- It produces results that are readily understandable.
- It is applicable to arbitrary data types, even non-relational data.
- It works efficiently on almost any number of fields.
- Maintaining the training set requires a minimal amount of effort.

Weaknesses of Memory-Based Reasoning
- It is computationally expensive when doing classification and prediction.
- It requires a large amount of storage for the training set.
- Results can depend on the choice of distance function, combination function, and number of neighbors.