Hierarchical Probabilistic Relational Models for Collaborative Filtering
Jack Newton (newton@cs.ualberta.ca) and Russ Greiner (greiner@cs.ualberta.ca)

Introduction, Problem Set Up

Introduction
• Personalized recommender systems recommend specific products.
• For example: Amazon.com's book recommender; Yahoo!'s LAUNCHcast music recommender.
• Very popular!
• We designed and built a recommender system, "tadpole", using
  • Probabilistic Relational Models (PRMs) [KP 98]
  • Hierarchical PRMs (hPRMs) [Get 02]
  applied to the EachMovie dataset.

Recommender Systems
• Content-based recommenders
  • use only facts about products and the individual (potential) purchaser.
  • E.g., a movie recommender system: just a database of people and movies.
  • Each tuple, such as (25, Male, Calif, …, Action, Budget, …, 4), lists facts about a person, facts about a movie, and a vote in {1, …, 5}.
  • Use the dataset to learn a classifier that predicts the vote for novel person/movie pairs.
• Collaborative filtering-based recommenders
  • base recommendations on the ratings that other "similar" users have assigned to similar products.
  • If person P1 appears similar to person P2 (perhaps based on their previous "liked movies") and P2 liked X, perhaps P1 will like X.
• Our goal: a cohesive framework that combines all types of information to make accurate recommendations:
  • properties of the product,
  • properties of the user,
  • voting patterns of all users,
  • as well as the voting patterns of a given user.
• Probabilistic Relational Models, and an extension to PRMs called Hierarchical PRMs (hPRMs), offer a probabilistic framework we can apply to the recommender system problem domain [Get 02].
• Evaluation: applying the PRM framework to the EachMovie dataset.

Our Approach

PRMs
A PRM encodes
• class-level dependencies,
• which are used to make inferences about a particular instance of a class.

Figure 1: A standard PRM

Learning
Must first learn the PRM from the data [FGKP 99]:
• an algorithm for learning a legal structure for a PRM,
• estimating parameters for that PRM.

Inference
Given a PRM encoding the class-level dependencies:
• Generate a ground Bayesian network for each specific object.
• Use the same structure/parameters for each instance of a class.
• Use a standard Bayesian network inference algorithm.
• Child nodes get different results because their parents have different data, …

Figure 2: A ground Bayesian network (node labels: John.Age, John.Gender, John.Education, StarWars.TheaterStatus, StarWars.VideoStatus, Vote.JohnOnStarWars.Score)

Hierarchical PRMs
Two limitations of PRMs (which motivate hPRMs):
1. Vote.Score can depend on attributes of related objects, such as Person.Age, but Vote.Score can NOT depend on itself in any way. BAD: we want John's Vote on Star Wars to help predict
   • John's Vote on T3
   • Fred's Vote on Star Wars
   • …
   (Why? A PRM's class-level dependency structure must be a DAG.)
2. A PRM is restricted to one dependency graph for Vote.Score.
   • However, you may want one dependency graph for movies of the Comedy genre, and another for the Action genre.
hPRMs [Get 02] address both problems: hPRMs use a class hierarchy, such as that in Figure 3a, to learn the hPRM in Figure 3b.

Figure 3a: A class hierarchy (labels recovered from the figure include Vote and the genres Action, Comedy, Thriller, Romantic Comedy, Slapstick Comedy)

Figure 3b: An hPRM learned on the EachMovie dataset (labels recovered from the figure include Action-Vote, RomanticComedy-Vote, SlapstickComedy-Vote and Thriller-Vote, each with Score; Action-Movie, RomanticComedy-Movie, SlapstickComedy-Movie and Thriller-Movie, each with Theater Status and Video Status; Person attributes Age, Gender, Education; AVG aggregation nodes)

Experiments and Results

The EachMovie Dataset
• Often used to test recommender systems
• 72,916 users, 1,628 movies, 2,811,983 votes
Composed of three tables:
• Person: describes people; fields: age, gender, zip code, …
• Movie: describes movies; fields: genre, …
• Vote: a user's rating of a movie; {0, 1, 2, 3, 4, 5}

Results
• Compared to Correlation (CR), Bayesian Clustering (BC), Bayesian Network (BN), and Vector Similarity (VSIM), as presented in [BHK 98]
• Metric: Mean Absolute Error (MAE) [BHK 98]
• 5-fold cross-validation

Standard PRM:
  Algorithm   Absolute Deviation
  CR          1.257
  BC          1.127
  BN          1.143
  VSIM        2.113
  PRM         1.26

hPRM:
  Algorithm   Absolute Deviation
  CR          0.994
  BC          1.103
  BN          1.066
  VSIM        2.136
  hPRM        1.060

Contributions
• Built PRM and hPRM models, with learning and inference algorithms
• Showed that (h)PRMs can be applied to recommender systems in general
• Evaluated them on the EachMovie database and demonstrated competitive results against existing algorithms
• Demonstrated the superiority of hPRMs over standard PRMs

Acknowledgements
Lise Getoor, for useful discussion, encouragement to pursue this line of work, and access to software and data that aided us in building our tadpole system. Alberta Ingenuity, NSERC, and iCORE for funding.

References
[BHK 98] John S. Breese, David Heckerman, and Carl Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In UAI-98, pages 43–52, 1998.
[FGKP 99] Nir Friedman, Lise Getoor, Daphne Koller, and Avi Pfeffer. Learning probabilistic relational models. In IJCAI-99, pages 1300–1309, 1999.
[Get 02] L. Getoor. Learning Statistical Models from Relational Data. PhD thesis, Stanford University, 2002.
[KP 98] D. Koller and A. Pfeffer. Probabilistic frame-based systems. In AAAI-98, pages 580–587, Madison, WI, 1998.
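
Note on the evaluation metric
The Results section reports Mean Absolute Error. As a minimal sketch of how that metric is computed, assuming predictions and true votes are available as parallel lists: the function name and the toy votes below are hypothetical and are not taken from the tadpole system or from EachMovie.

```python
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Average of |predicted vote - actual vote| over all held-out votes."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if len(actual) == 0:
        raise ValueError("need at least one vote to compute MAE")
    total = sum(abs(p - a) for p, a in zip(predicted, actual))
    return total / len(actual)


# Hypothetical toy votes on the 0..5 scale (illustration only).
toy_actual = [4, 0, 5, 2]
toy_predicted = [3, 1, 5, 4]
toy_mae = mean_absolute_error(toy_predicted, toy_actual)
print(f"MAE on toy votes: {toy_mae}")
```

Under 5-fold cross-validation, one would typically compute this value on each held-out fold and report the average; lower values mean predictions are closer to the true votes.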