Collaborative Filtering
Rong Jin
Department of Computer Science and Engineering
Michigan State University

Outline
o Brief introduction to information filtering
o Collaborative filtering
    n Major issues in collaborative filtering
    n Main methods for collaborative filtering
o Flexible mixture model for collaborative filtering
o Decoupling model for collaborative filtering

Short vs. Long Term Info. Need
o Short-term information need (Ad hoc retrieval)
    n “Temporary need”, e.g., info about used cars
    n Information source is relatively static
    n User “pulls” information
    n Application examples: library search, Web search
o Long-term information need (Filtering)
    n “Stable need”, e.g., new data mining algorithms
    n Information source is dynamic
    n System “pushes” information to user
    n Application example: news filter

Examples of Information Filtering
o News filtering
o Email filtering
o Movie/book/product recommenders
o Literature recommenders
o And many others …

Information Filtering
o Basic filtering question: Will user U like item X?
o Two different ways of answering it
    n Look at what U likes to characterize X: content-based filtering
    n Look at who likes X to characterize U: collaborative filtering
o Combine content-based filtering and collaborative filtering

Other Names for Information Filtering
o Content-based filtering is also called
    n “Adaptive Information Filtering” in TREC
    n “Selective Dissemination of Information” (SDI) in Library & Information Science
o Collaborative filtering is also called
    n Recommender systems

Example: Content-based Filtering

History
o Description: A homicide detective and a fire marshal must stop a pair of murderers who commit videotaped crimes to become media darlings.
  Rating: [not preserved]
o Description: A biography of sports legend Muhammad Ali, from his early days to his days in the ring.
  Rating: [not preserved]
o Description: Benjamin Martin is drawn into the American revolutionary war against his will when a brutal British commander kills his son.
  Rating: [not preserved]

What to Recommend?
o Description: A high-school boy is given the chance to write a story about an up-and-coming rock band as he accompanies it on its concert tour.
  Recommend: ? No
o Description: A young adventurer named Milo Thatch joins an intrepid group of explorers to find the mysterious lost continent of Atlantis.
  Recommend: ? Yes

Example: Collaborative Filtering

User 1    1    5    3    4    3
User 2    4    1    5    2    5
User 3    2    5?   3    5    4

o User 3 is more similar to user 1 than to user 2
o Predict 5 for the movie “15 minutes” (the unrated entry) for user 3

Collaborative Filtering (CF) vs. Content-based Filtering (CBF)
o CF does not need the content of items, while CBF relies on the content of items
o CF is useful when the content of items
    n is not available or is difficult to acquire
    n is brief and insufficient
o Example: movie recommendation
o A movie may be preferred because of
    n its actor
    n its director
    n its popularity

Application of Collaborative Filtering

Collaborative Filtering
o Goal: Making filtering decisions for an individual user based on the judgments of other users

[Figure: user-by-object rating matrix. Rows are the users U (u1, u2, …, um, and the test user utest); columns are the objects O (o1, o2, …, oj, oj+1, …, on). Entries are ratings, with “?” marking unknown ratings.]

Collaborative Filtering
o Goal: Making filtering decisions for an individual user based on the judgments of other users
o General idea
    n Given a user u, find similar users {u1, …, um}
    n Predict u’s rating based on the ratings of u1, …, um

Example: Collaborative Filtering

User 1    1    5    3    4    3
User 2    4    1    5    2    5
User 3    2    5?   3    5    4

o User 3 is more similar to user 1 than to user 2
o Predict 5 for the movie “15 minutes” (the unrated entry) for user 3

Memory-based Approaches for CF
o The key is to find users that are similar to the test user
o Traditional approach
    n Measure the similarity in rating patterns between different users
    n Example: Pearson Correlation Coefficient

Pearson Correlation Coefficient for CF
o Similarity between a training user y and a test user y0
o Remove the rating bias from each training user
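As a sketch, the Pearson similarity between a training user and the test user is commonly written as below. The notation is an assumption introduced here for illustration: R_y(x) is the rating of user y on item x, \bar{R}_y is the mean rating of user y, and X_{y,y_0} is the set of items rated by both users. Subtracting each user's mean is what removes the rating bias.

```latex
w_{y,y_0} =
\frac{\sum_{x \in X_{y,y_0}} \bigl(R_y(x) - \bar{R}_y\bigr)\bigl(R_{y_0}(x) - \bar{R}_{y_0}\bigr)}
     {\sqrt{\sum_{x \in X_{y,y_0}} \bigl(R_y(x) - \bar{R}_y\bigr)^2}\;
      \sqrt{\sum_{x \in X_{y,y_0}} \bigl(R_{y_0}(x) - \bar{R}_{y_0}\bigr)^2}}
```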

Pearson Correlation Coefficient for CF
o Estimate ratings for the test user
o Weighted vote of normalized rates
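A common way to write the weighted vote is sketched below. The notation is assumed for illustration: w_{y,y_0} is the Pearson similarity between training user y and test user y_0, R_y(x) is the rating of user y on item x, and \bar{R}_y is the mean rating of user y. The sum runs over training users who rated item x.

```latex
\hat{R}_{y_0}(x) = \bar{R}_{y_0} +
\frac{\sum_{y} w_{y,y_0}\,\bigl(R_y(x) - \bar{R}_y\bigr)}
     {\sum_{y} \lvert w_{y,y_0} \rvert}
```

Each training user votes with a mean-centered rating, so a generous rater and a strict rater contribute on the same scale.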

Example

User 1             1      5      3      4      3
Normalized Rate
User 2             4      1      5      2      5
Normalized Rate
User 3             2      ?      3      5      4
Normalized Rate

Example

User 1             1      5      3      4      3
Normalized Rate   -2.2    1.8   -0.2    0.8   -0.2
User 2             4      1      5      2      5
Normalized Rate    0.6   -2.4    1.6   -1.4    1.6
User 3             2      ?      3      5      4
Normalized Rate   -1.5          -0.5    1.5    0.5

Example

User 1             1      5      3      4      3
Normalized Rate   -2.2    1.8   -0.2    0.8   -0.2
User 2             4      1      5      2      5
Normalized Rate    0.6   -2.4    1.6   -1.4    1.6
User 3             2      ?      3      5      4
Normalized Rate   -1.5          -0.5    1.5    0.5

Pearson correlation with User 3
User 1             0.85
User 2            -0.49
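The numbers in this example can be checked with a short script. This is a minimal sketch, not the author's implementation: the helper names (`mean_rating`, `pearson`, `predict`) are hypothetical, each user is centered on the mean of all of that user's known ratings, the correlation is taken over co-rated items, and clipping the prediction to a 1-5 scale is an assumption.

```python
import math

# Ratings from the example; None marks the unknown rating.
ratings = {
    "User 1": [1, 5, 3, 4, 3],
    "User 2": [4, 1, 5, 2, 5],
    "User 3": [2, None, 3, 5, 4],
}

def mean_rating(r):
    """Mean over the ratings a user actually gave."""
    known = [v for v in r if v is not None]
    return sum(known) / len(known)

def pearson(a, b):
    """Pearson correlation over co-rated items, each user centered on their own mean."""
    ma, mb = mean_rating(a), mean_rating(b)
    pairs = [(x - ma, y - mb) for x, y in zip(a, b)
             if x is not None and y is not None]
    num = sum(x * y for x, y in pairs)
    den = (math.sqrt(sum(x * x for x, _ in pairs))
           * math.sqrt(sum(y * y for _, y in pairs)))
    return num / den if den else 0.0

def predict(test, train_users, item):
    """Test user's mean plus the weighted vote of normalized training ratings."""
    num = den = 0.0
    for r in train_users:
        if r[item] is None:
            continue
        w = pearson(r, test)
        num += w * (r[item] - mean_rating(r))
        den += abs(w)
    return mean_rating(test) + (num / den if den else 0.0)

w1 = pearson(ratings["User 1"], ratings["User 3"])
w2 = pearson(ratings["User 2"], ratings["User 3"])
raw = predict(ratings["User 3"], [ratings["User 1"], ratings["User 2"]], item=1)
clipped = min(5, max(1, round(raw)))  # assumed 1-5 rating scale

print(round(w1, 2), round(w2, 2), clipped)
```

The two weights round to 0.85 and -0.49, and the clipped prediction for the unrated item is 5.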

Problems with Memory-based Approaches

User 1    ?    5    3    4    2
User 2    4    1    5    ?    5
User 3    5    ?    4    2    5
User 4    1    5    3    5    ?

o Most users only rate a few items
    n Two similar users may not rate the same set of items
o Clustering users and items
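To make the sparsity issue concrete, the sketch below counts how many items each pair of users in the table above has rated in common. The helper name `co_rated` is hypothetical. A memory-based similarity can only use these shared items, so with real data, where users rate a tiny fraction of items, the overlap can shrink to almost nothing.

```python
from itertools import combinations

# Ratings from the table; None marks an unrated item.
ratings = {
    "User 1": [None, 5, 3, 4, 2],
    "User 2": [4, 1, 5, None, 5],
    "User 3": [5, None, 4, 2, 5],
    "User 4": [1, 5, 3, 5, None],
}

def co_rated(a, b):
    """Indices of items that both users have rated."""
    return [i for i, (x, y) in enumerate(zip(a, b))
            if x is not None and y is not None]

# Number of shared items for every pair of users.
overlap = {
    (u, v): len(co_rated(ratings[u], ratings[v]))
    for u, v in combinations(ratings, 2)
}

for pair, n in overlap.items():
    print(pair, n)
```

Every pair here shares only three of the five items, even though each user rated four.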

Flexible Mixture Model (FMM)
o Cluster both users and items simultaneously

User 1    ?    5    3    4    2
User 2    4    1    5    ?    5
User 3    5    ?    4    2    5
User 4    1    5    3    5    ?

o User clustering and item clustering are correlated!

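Flexible Mixture Model (FMM)
o Cluster both users and items simultaneously

[Table: rating distributions for each combination of user class (User Class I, User Class II) and movie type (Movie Type II, Movie Type III, …). Surviving entries include p(4)=1/4, p(5)=3/4; p(1)=1/2, p(2)=1/2; and p(4)=1/2, p(5)=1/2. The cell layout was not preserved.]

o Unknown ratings are gone!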

Flexible Mixture Model (FMM)

[Figure: graphical model. Hidden variables: Zu and Zo. Observed variables: U, O, and R. The model is parameterized by P(Zu), P(Zo), P(u|Zu), P(o|Zo), and P(r|Zo, Zu).]

Zu: user class
Zo: item class
U: user
O: item
R: rating
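Reading the five distributions on this slide as the factors of the model, the joint probability of an observed (item, user, rating) triple would be obtained by summing over the two hidden classes. The lowercase symbols o, u, r, z_o, z_u for particular values are notation assumed here.

```latex
P(o, u, r) = \sum_{z_o} \sum_{z_u}
  P(z_o)\, P(z_u)\, P(o \mid z_o)\, P(u \mid z_u)\, P(r \mid z_o, z_u)
```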

Flexible Mixture Model: Estimation
o Annealed Expectation Maximization (AEM) algorithm
o E-step: calculate posterior probability for hidden variables Zu and Zo
    n b: temperature for Annealed EM algorithm
o M-step: update parameters
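One plausible form of the two steps is sketched below, following the usual annealed EM recipe for a mixture with factors P(Zo), P(Zu), P(o|Zo), P(u|Zu), and P(r|Zo, Zu). The indexing of training data as L triples (o_{(l)}, u_{(l)}, r_{(l)}) is an assumption made for illustration. In the E-step the joint is raised to the power b before normalizing; b = 1 recovers ordinary EM.

```latex
% E-step: annealed posterior over the hidden classes
P(z_o, z_u \mid o, u, r) =
\frac{\bigl[P(z_o)\,P(z_u)\,P(o \mid z_o)\,P(u \mid z_u)\,P(r \mid z_o, z_u)\bigr]^{b}}
     {\sum_{z_o'} \sum_{z_u'}
      \bigl[P(z_o')\,P(z_u')\,P(o \mid z_o')\,P(u \mid z_u')\,P(r \mid z_o', z_u')\bigr]^{b}}

% M-step (two representative updates; the others follow the same pattern)
P(z_u) = \frac{1}{L} \sum_{l=1}^{L} \sum_{z_o}
         P\bigl(z_o, z_u \mid o_{(l)}, u_{(l)}, r_{(l)}\bigr)

P(r \mid z_o, z_u) =
\frac{\sum_{l :\, r_{(l)} = r} P\bigl(z_o, z_u \mid o_{(l)}, u_{(l)}, r_{(l)}\bigr)}
     {\sum_{l=1}^{L} P\bigl(z_o, z_u \mid o_{(l)}, u_{(l)}, r_{(l)}\bigr)}
```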

Flexible Mixture Model: Prediction
o Key issue: What user class does the test user belong to?
o Fold-in process
    n Repeat the EM algorithm including ratings from the test user
    n Fix all the parameters except for P(ut|Zu)
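Once the fold-in has estimated P(ut|Zu), one natural way to turn the model into a rating prediction is the expected rating under the model. This is a sketch under that assumption; the slide does not state the exact prediction rule, and the symbols \hat{r}, o, and u_t are notation introduced here.

```latex
P(r \mid o, u_t) = \frac{P(o, u_t, r)}{\sum_{r'} P(o, u_t, r')},
\qquad
P(o, u_t, r) = \sum_{z_o} \sum_{z_u}
  P(z_o)\, P(z_u)\, P(o \mid z_o)\, P(u_t \mid z_u)\, P(r \mid z_o, z_u)

\hat{r}(o, u_t) = \sum_{r} r \, P(r \mid o, u_t)
```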

Another Prob. with Memory-based Approaches

User 1    2    5    3    4    2
User 2    4    1    3
User 3    5    2    5
User 4    1    4    2    3    1

o Users with similar interests can have different rating patterns
o Decoupling preference patterns from rating patterns

Decoupling Model (DM)

[Figure: graphical model with hidden variables Zo and Zu and observed variables O and U.]

Zu: user class
Zo: item class
U: user
O: item
R: rating

Decoupling Model (DM)

[Figure: graphical model with hidden variables Zo, Zu, and Zpref and observed variables O and U.]

Zu: user class
Zo: item class
U: user
O: item
R: rating
Zpref: whether users like items

Decoupling Model (DM)

[Figure: graphical model with hidden variables Zo, ZR, Zu, and Zpref and observed variables O, R, and U.]

Zu: user class
Zo: item class
U: user
O: item
R: rating
Zpref: whether users like items
ZR: rating class

o Separating preference and rating patterns
o User class + Rating class → rating R
    n Zu → Zpref and ZR + Zpref → r

Experiment
o Datasets: EachMovie and MovieRating

                               MovieRating    EachMovie
Number of Users                    500           2000
Number of Items                   1000           1682
Avg. # of rated items/User        87.7          129.6
Number of ratings                    5              6

o Evaluation:
    n Mean Absolute Error (MAE): average absolute deviation of the predicted ratings from the actual ratings on items
    n The smaller the MAE, the better the performance
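The MAE metric described above can be sketched in a few lines. The function name and the sample ratings are hypothetical and chosen only to illustrate the arithmetic; they are not taken from either dataset.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute deviation of predicted ratings from actual ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)

# Hypothetical predictions and true ratings for four test items.
predicted = [4.0, 3.0, 5.0, 2.0]
actual = [5, 3, 4, 1]

mae = mean_absolute_error(predicted, actual)
print(mae)  # (1 + 0 + 1 + 1) / 4 = 0.75
```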

Experiment Protocol
o Test the sensitivity of the proposed model to the amount of training data
    n Vary the number of training users
    n MovieRating dataset: 100 and 200 training users
    n EachMovie dataset: 200 and 400 training users
o Test the sensitivity of the proposed model to the information needed for the test user
    n Vary the number of rated items provided by the test user
        o 5, 10, and 20 items are given with ratings

Experimental Results: FMM and other baseline algorithms
o A smaller MAE indicates better performance

[Figure: MAE versus number of given items (5, 10, 20) for FMM and the baseline algorithms. Panels: MovieRating with 200 training users; MovieRating with 100 training users; EachMovie with 400 training users; EachMovie with 200 training users.]

FMM vs. DM
o Smaller value indicates better performance

Results on MovieRating

Training Users Size   Algorithms   5 Items Given   10 Items Given   20 Items Given
100                   FMM          0.829           0.822            0.807
                      DM           0.791           0.774            0.751
200                   FMM          0.800           0.787            0.768
                      DM           0.770           0.753            0.730

Results on EachMovie

Training Users Size   Algorithms   5 Items Given   10 Items Given   20 Items Given
200                   FMM          1.07            1.04             1.02
                      DM           1.06            1.02             1.00
400                   FMM          1.05            1.03             1.01
                      DM           1.04            1.01             0.99