Matching Users and Items Across Domains to Improve the Recommendation Quality
Chung-Yi Li, Shou-De Lin
r00922051@csie.ntu.edu.tw, sdlin@csie.ntu.edu.tw
Department of Computer Science and Information Engineering, National Taiwan University

Motivation
Lack of data is a serious concern in building a recommender system, particularly for newly established services. Can we leverage information from other domains to improve the quality of a recommender system?

Problem Definition
Given: two homogeneous rating matrices (a target rating matrix and a source rating matrix).
• They model the same type of preference.
• A decent portion of the users and of the items overlap.
Challenge: the mapping of users is unknown, and so is the mapping of items.
Goals:
1. Identify the user mapping and the item mapping.
2. Use the identified mappings to boost recommendation performance.

Why This Problem Is Challenging
When item correspondence is known, the problem is much easier:
• Define a user similarity. If the similarity between two users is large, they are likely to be the same user. [Narayanan 2008]
In our case, both sides are unknown, and there is no clear solution yet.

Basic Idea
Low-rank assumption and factorization models.
[Figure: R1 and R2 shown with their factorizations; the user rows (m1 to m5) and item columns (n1 to n4) appear in different orders, and the correspondence between them is unknown.]
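One way to see why a low-rank view is useful here: permuting the rows and columns of a rating matrix leaves its singular values unchanged, so two matrices that differ only by unknown user and item orderings share the same spectrum. The sketch below assumes a fully observed toy matrix; the names `R1`, `R2`, `user_perm`, and `item_perm` are illustrative and not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy rank-2 rating matrix: 5 users x 4 items (fully observed for simplicity).
R2 = rng.normal(size=(5, 2)) @ rng.normal(size=(2, 4))

# Hypothetical unknown orderings of users (rows) and items (columns).
user_perm = rng.permutation(5)
item_perm = rng.permutation(4)
R1 = R2[user_perm][:, item_perm]

# Singular values are invariant to row and column permutations.
s1 = np.linalg.svd(R1, compute_uv=False)
s2 = np.linalg.svd(R2, compute_uv=False)
same_spectrum = bool(np.allclose(s1, s2))
print(same_spectrum)
```

Real rating matrices are only partially observed and only partly overlapping, so this invariance holds approximately at best; the toy case is meant only to motivate working in the latent space.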

A Two-Stage Model to Find the Matching
(M1×N1) ≈ (M1×M2) (M2×N2) (N2×N1), i.e. R1 ≈ [user matching] · R2 · [item matching]
[Figure: R1 and R2 drawn with entries marked O and ?.]
1. Latent Space Matching → rough matching result
2. Matching Refinement → final matching result

Stage 1: Latent Space Matching

How Can We Perform SVD on a Partially Observed Matrix?
[Equations on this slide were not preserved.]
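One common way to approximate an SVD when only some ratings are observed is to fit a low-rank factorization on the observed entries and then take the SVD of the completed low-rank matrix. The sketch below follows that idea. It is an assumption about the general approach, not a reproduction of the slide's formulas, and `factorize_observed`, `svd_from_factors`, `lr`, `reg`, and `epochs` are hypothetical names and settings.

```python
import numpy as np

def factorize_observed(R, mask, k, lr=0.01, reg=0.01, epochs=2000, seed=0):
    """Fit R approx P @ Q.T on observed entries (mask == True) by gradient descent."""
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.normal(size=(R.shape[0], k))
    Q = 0.1 * rng.normal(size=(R.shape[1], k))
    for _ in range(epochs):
        # Prediction error, zeroed out on unobserved entries.
        E = np.where(mask, P @ Q.T - R, 0.0)
        # Simultaneous update: both right-hand sides use the old P and Q.
        P, Q = P - lr * (E @ Q + reg * P), Q - lr * (E.T @ P + reg * Q)
    return P, Q

def svd_from_factors(P, Q):
    """SVD of the completed matrix P @ Q.T, computed from the thin factors."""
    Qp, Rp = np.linalg.qr(P)                 # P = Qp @ Rp
    Qq, Rq = np.linalg.qr(Q)                 # Q = Qq @ Rq
    Us, s, Vts = np.linalg.svd(Rp @ Rq.T)    # small k x k SVD
    return Qp @ Us, s, Qq @ Vts.T            # U, singular values, V

rng = np.random.default_rng(0)
R_true = rng.normal(size=(6, 2)) @ rng.normal(size=(2, 5))
mask = rng.random(R_true.shape) < 0.7        # roughly 70% of entries observed
P, Q = factorize_observed(R_true, mask, k=2)
U, s, V = svd_from_factors(P, Q)
```

Going through QR keeps the SVD at size K×K, which matters when the number of users or items is large.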

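Matching in Latent Space
We want to solve for the matching G. Given the SVDs obtained in the previous step, and since the SVD is unique up to the signs of the singular vectors, the user side and the item side can be separated; both lead to the same subproblem.
S: sign matrix (K by K, diagonal, entries -1 or 1)
[Equations on this slide were not preserved.]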

Solving for G
(M1×K) ≈ (M1×M2) (M2×K), where the M1×M2 factor is the matching matrix G, drawn with 0/1 entries.
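A minimal sketch of this subproblem, assuming the relation has the form (M1×K latent matrix) · S ≈ G · (M2×K latent matrix) with S a diagonal sign matrix: enumerate the 2^K sign patterns and, for each one, assign every row on the left to its nearest row on the right. Nearest-neighbor assignment is a stand-in chosen for brevity, since the slides do not state which solver is used, and `match_latent`, `A`, and `B` are hypothetical names.

```python
import itertools
import numpy as np

def match_latent(A, B):
    """Return (match, signs): match[i] is the row of B paired with row i of A."""
    K = A.shape[1]
    best_cost, best_match, best_signs = np.inf, None, None
    for signs in itertools.product([1.0, -1.0], repeat=K):
        S = np.array(signs)
        # Squared distances between sign-flipped rows of A and rows of B.
        D = (((A * S)[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
        match = D.argmin(axis=1)
        cost = D[np.arange(len(A)), match].sum()
        if cost < best_cost:
            best_cost, best_match, best_signs = cost, match, S
    return best_match, best_signs

rng = np.random.default_rng(1)
B = rng.normal(size=(6, 3))
true_match = rng.permutation(6)
true_signs = np.array([1.0, -1.0, 1.0])
A = B[true_match] * true_signs      # rows permuted, one latent axis flipped

match, signs = match_latent(A, B)
G = np.eye(len(B))[match]           # 0/1 matrix with G @ B == B[match]
```

Enumerating sign patterns is only practical for small K, and nearest-neighbor assignment can map several rows to the same target, so this is better read as a rough first pass than as a final matching.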

Stage 2: Matching Refinement
• More accurate but harder to solve.
• Obtain a good initialization and a reduced search space from latent space matching.
• Solve Guser and Gitem alternately.
• The objective value always decreases and converges.
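The alternating scheme can be sketched as follows, assuming fully observed matrices and relaxing each matching to a selection (each row of R1 picks one row of R2, and each column picks one column) rather than a strict one-to-one matching. Under that relaxation each half-step minimizes the squared error exactly over its own block, so the objective cannot increase. `objective`, `refine`, `u`, and `v` are hypothetical names; the slides' actual objective and solver may differ.

```python
import numpy as np

def objective(R1, R2, u, v):
    """Squared error of R1 against R2 with rows picked by u and columns by v."""
    return float(((R1 - R2[u][:, v]) ** 2).sum())

def refine(R1, R2, u, v, iters=10):
    history = [objective(R1, R2, u, v)]
    for _ in range(iters):
        # User step: with the item mapping v fixed, pick the best R2 row per R1 row.
        cols = R2[:, v]
        u = ((R1[:, None, :] - cols[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        # Item step: with the user mapping u fixed, pick the best R2 column per R1 column.
        rows = R2[u]
        v = ((R1.T[:, None, :] - rows.T[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        history.append(objective(R1, R2, u, v))
    return u, v, history

rng = np.random.default_rng(2)
R2 = rng.normal(size=(6, 5))
pu, pv = rng.permutation(6), rng.permutation(5)
R1 = R2[pu][:, pv] + 0.01 * rng.normal(size=(6, 5))

# Rough initial matching: the true mapping with two users and two items swapped.
u0, v0 = pu.copy(), pv.copy()
u0[[0, 1]] = u0[[1, 0]]
v0[[0, 1]] = v0[[1, 0]]
u, v, history = refine(R1, R2, u0, v0)
```

The quality of the result depends heavily on the starting point, which is consistent with the slide's emphasis on taking the initialization from latent space matching.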

Goals
1. Identify the user mapping and the item mapping (latent space matching → rough matching result → matching refinement → final matching result).
2. Then, use the identified mappings to boost recommendation performance.

Transferring Imperfect Matching to Predict Ratings
Matched latent factors are constrained to be similar.
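One way to express "matched latent factors are constrained to be similar" is a soft penalty on the distance between the factors of each matched pair, so that a wrong match pulls two users together without forcing them to be identical. The sketch below only evaluates such an objective for the user side. The form of the penalty, the weight `lam`, and the function name `joint_loss` are assumptions for illustration, not the slides' model.

```python
import numpy as np

def joint_loss(R1, M1, R2, M2, P1, Q1, P2, Q2, pairs, lam=1.0):
    """Squared error on observed ratings of both domains plus a matching penalty.

    M1, M2: boolean masks of observed entries.
    pairs: list of (i, j), meaning user i in domain 1 is matched to user j in domain 2.
    """
    fit1 = (np.where(M1, R1 - P1 @ Q1.T, 0.0) ** 2).sum()
    fit2 = (np.where(M2, R2 - P2 @ Q2.T, 0.0) ** 2).sum()
    penalty = sum(((P1[i] - P2[j]) ** 2).sum() for i, j in pairs)
    return float(fit1 + fit2 + lam * penalty)

rng = np.random.default_rng(3)
P1, Q1 = rng.normal(size=(3, 2)), rng.normal(size=(4, 2))
order = [2, 0, 1]                        # user j in domain 2 is user order[j] in domain 1
P2, Q2 = P1[order], rng.normal(size=(5, 2))
R1, R2 = P1 @ Q1.T, P2 @ Q2.T
M1, M2 = np.ones_like(R1, dtype=bool), np.ones_like(R2, dtype=bool)

correct_pairs = [(order[j], j) for j in range(3)]
wrong_pairs = [(i, i) for i in range(3)]
loss_correct = joint_loss(R1, M1, R2, M2, P1, Q1, P2, Q2, correct_pairs)
loss_wrong = joint_loss(R1, M1, R2, M2, P1, Q1, P2, Q2, wrong_pairs)
```

In this toy setup the correctly matched factors coincide, so the penalty vanishes for `correct_pairs` and is positive for `wrong_pairs`.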

Experiment Setup
• Yahoo! Music Dataset
• Splits of users and items between the training set of R1 and the training set of R2: Disjoint Split, Overlap Split, Contained Split, Subset Split, Partial Split

Accuracy and Mean Average Precision: the higher the better
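For a matching task where each user has exactly one true counterpart, accuracy and mean average precision can be computed as in the sketch below. It assumes the matcher returns a ranked candidate list per user, in which case average precision reduces to the reciprocal rank of the true match. This is an assumption about the evaluation protocol, not a detail stated on the slide, and the toy data is made up.

```python
def accuracy(ranked, truth):
    """Fraction of users whose top-ranked candidate is the true match."""
    hits = sum(1 for u, cands in ranked.items() if cands and cands[0] == truth[u])
    return hits / len(ranked)

def mean_average_precision(ranked, truth):
    """With a single relevant candidate per user, AP is 1 / rank of the true match."""
    total = 0.0
    for u, cands in ranked.items():
        if truth[u] in cands:
            total += 1.0 / (cands.index(truth[u]) + 1)
    return total / len(ranked)

# Toy example: the true match is ranked 1st, 2nd, and 3rd respectively.
ranked = {"a": ["x", "y", "z"], "b": ["x", "y", "z"], "c": ["z", "x", "y"]}
truth = {"a": "x", "b": "y", "c": "y"}
acc = accuracy(ranked, truth)                     # 1 of 3 users correct at rank 1
mean_ap = mean_average_precision(ranked, truth)   # (1 + 1/2 + 1/3) / 3
```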

Rating Prediction (Root Mean Square Error)
RMSE: the lower the better
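For reference, RMSE over a set of held-out ratings can be computed as in this short sketch; the function name `rmse` and the sample values are illustrative only.

```python
import math

def rmse(predicted, actual):
    """Root mean square error between two equal-length rating sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))

# Errors are 0, -1, and 2, so the mean squared error is 5/3.
score = rmse([3.0, 4.0, 5.0], [3.0, 5.0, 3.0])
```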

Conclusion
• It is possible to identify user or item correspondence in an unsupervised manner based on homogeneous rating data.
• Even with imperfect matching, our model can still improve recommendation accuracy.
Questions?