Application of Dimensionality Reduction in Recommender Systems--A Case Study
- Slides: 15
Application of Dimensionality Reduction in Recommender Systems--A Case Study
Badrul M. Sarwar, George Karypis, Joseph A. Konstan, and John T. Riedl
GroupLens Research Group
Department of Computer Science and Engineering
University of Minnesota

Talk Outline
- Introduction to Recommender Systems (RS)
- Challenges
- Dimensionality Reduction as a Solution
- Experimental Setup and Results
- Conclusion

Recommender Systems
- Problem
  - Information Overload
  - Too Many Product Choices
- Solution
  - Recommender Systems (RS)
- Collaborative Filtering

Collaborative Filtering
- Representation of input data
- Neighborhood formation
- Prediction/Top-N recommendation
(Figure label: Target Customer)

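The three steps on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slide does not name a similarity measure or a prediction formula, so cosine similarity and a similarity-weighted average are assumed here, and all function names and the toy data are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two customers (dicts product -> rating); missing = 0."""
    dot = sum(a[p] * b[p] for p in set(a) & set(b))
    norm_a = math.sqrt(sum(r * r for r in a.values()))
    norm_b = math.sqrt(sum(r * r for r in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def form_neighborhood(ratings, target, size):
    """Step 2: the `size` customers most similar to the target customer."""
    sims = [(cosine_similarity(ratings[target], r), c)
            for c, r in ratings.items() if c != target]
    sims.sort(reverse=True)
    return [(c, s) for s, c in sims[:size] if s > 0]

def predict_rating(ratings, neighbors, product):
    """Step 3a: similarity-weighted average of the neighbors' ratings (assumed formula)."""
    rated = [(c, s) for c, s in neighbors if product in ratings[c]]
    total = sum(s for _, s in rated)
    if not total:
        return None  # no neighbor rated this product
    return sum(s * ratings[c][product] for c, s in rated) / total

def top_n(ratings, target, neighbors, n):
    """Step 3b: the n unseen products with the highest predicted rating."""
    candidates = {p for c, _ in neighbors for p in ratings[c]} - set(ratings[target])
    scored = sorted(((predict_rating(ratings, neighbors, p), p) for p in candidates),
                    reverse=True)
    return [p for _, p in scored[:n]]

# Step 1: input data represented as customer -> {product: rating} (toy data).
ratings = {
    "alice": {"A": 5, "B": 4},
    "bob": {"A": 5, "B": 4, "C": 5},
    "carol": {"A": 1, "B": 5, "D": 4},
}
neighbors = form_neighborhood(ratings, "alice", size=1)
recommendations = top_n(ratings, "alice", neighbors, n=1)
```

The neighborhood search compares the target customer against every other customer, which is the step the scalability challenge refers to.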
Challenges of RS
- Scalability
  - Enormous size of customer-product matrix
  - Slow neighborhood search
  - Slow prediction generation
- Sparsity
  - May hide good neighbors
  - Results in poor quality and reduced coverage
(Figure labels: products, Customers)

Challenges of RS
- Synonymy
  - Similar products treated differently
  - Increases sparsity, loss of transitivity
  - Results in poor quality
- Example
  - C1 rates recycled letter pads High
  - C2 rates recycled memo pads High
  => Both of them like recycled office products

Idea: Dimensionality Reduction
- Latent Semantic Indexing
  - Used by the IR community for document similarity
  - Works well with similar vector space model
  - Uses Singular Value Decomposition (SVD)
- Main Idea
  - Term-document matching in feature space
  - Captures latent association
  - Reduced space is less noisy

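SVD: Mathematical Background
$R_{m \times n} = U_{m \times r} \, S_{r \times r} \, V'_{r \times n}$
$R_k = U_k \, S_k \, V_k'$, where $U_k$ is $m \times k$, $S_k$ is $k \times k$, and $V_k'$ is $k \times n$
The reconstructed matrix $R_k = U_k S_k V_k'$ is the closest rank-k matrix to the original matrix R.
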
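The slide does not say in which sense $R_k$ is "closest". Assuming the Frobenius norm (the usual reading, via the Eckart-Young theorem), and writing $\sigma_1 \ge \dots \ge \sigma_r$ for the diagonal entries of $S$ with $u_i$, $v_i$ the matching columns of $U$ and $V$ (symbols introduced here for illustration), the statement can be written out as:

```latex
\[
R = U S V' = \sum_{i=1}^{r} \sigma_i \, u_i v_i' ,
\qquad
R_k = U_k S_k V_k' = \sum_{i=1}^{k} \sigma_i \, u_i v_i'
\]
\[
\| R - R_k \|_F
  = \min_{\operatorname{rank}(X) \le k} \| R - X \|_F
  = \sqrt{\sum_{i=k+1}^{r} \sigma_i^2}
\]
```

For example, if the singular values of $R$ are 3, 2, and 1, then keeping $k = 2$ dimensions leaves an approximation error of $\sqrt{1^2} = 1$.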
SVD for Collaborative Filtering
1. Low dimensional representation
   - O(m+n) storage requirement
2. Direct Prediction
3. Neighborhood Formation
   - Top-N Recommendation
   - Prediction (CF algorithm)
(Figure labels: m x n, m x k, k x n, m x m similarity)

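A minimal sketch of the three uses listed on this slide, under several assumptions: the ratings matrix is taken to be dense (the slide does not say how missing ratings are handled), the factors are obtained with a simple power iteration so the example stays self-contained (in practice a library routine such as `numpy.linalg.svd` would normally be used), and every function name below is hypothetical.

```python
import math

def matvec(A, x):
    """Multiply matrix A (list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def top_singular_triplet(A, iters=300):
    """Power iteration on A'A: (sigma, u, v) for the largest singular value."""
    At = [list(col) for col in zip(*A)]
    start = max(A, key=lambda row: sum(x * x for x in row))  # largest-norm row
    norm = math.sqrt(sum(x * x for x in start))
    if norm < 1e-12:                       # A is (numerically) zero
        return 0.0, [0.0] * len(A), [0.0] * len(At)
    v = [x / norm for x in start]
    for _ in range(iters):
        w = matvec(At, matvec(A, v))
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:
            break
        v = [x / norm for x in w]
    Av = matvec(A, v)
    sigma = math.sqrt(sum(x * x for x in Av))
    u = [x / sigma for x in Av] if sigma > 1e-12 else [0.0] * len(A)
    return sigma, u, v

def truncated_svd(R, k):
    """Return (U_k S_k as m x k rows, V_k' as k x n rows): (m + n) * k numbers."""
    m, n = len(R), len(R[0])
    residual = [row[:] for row in R]
    user_factors = [[] for _ in range(m)]
    item_factors = []
    for _ in range(k):
        sigma, u, v = top_singular_triplet(residual)
        for i in range(m):
            user_factors[i].append(sigma * u[i])
            for j in range(n):
                residual[i][j] -= sigma * u[i] * v[j]  # deflate the found component
        item_factors.append(v)
    return user_factors, item_factors

def predict_from_factors(user_factors, item_factors, c, p):
    """Direct prediction: entry (c, p) of the rank-k matrix R_k."""
    return sum(user_factors[c][f] * item_factors[f][p]
               for f in range(len(item_factors)))

def cosine(a, b):
    """Customer-customer similarity in the reduced k-dimensional space."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

# Toy 4 x 4 ratings matrix (made-up values), reduced to k = 2 dimensions.
ratings = [[5.0, 4.0, 1.0, 1.0],
           [4.0, 5.0, 1.0, 2.0],
           [1.0, 1.0, 5.0, 4.0],
           [2.0, 1.0, 4.0, 5.0]]
user_factors, item_factors = truncated_svd(ratings, k=2)
estimate = predict_from_factors(user_factors, item_factors, 0, 2)
similarity = cosine(user_factors[0], user_factors[1])
```

Only the two factor matrices need to be kept online, which is where the O(m+n) storage figure comes from when k is a fixed constant; neighborhood formation then compares k-dimensional rows instead of n-dimensional ones.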
Experimental Setup
- Data Sets
  - MovieLens data (www.movielens.umn.edu)
    - 943 users, 1,682 items
    - 100,000 ratings on 1-5 Likert scale
    - Used for prediction and neighborhood experiments
  - E-commerce data
    - 6,502 users, 23,554 items
    - 97,045 purchases
    - Used for neighborhood experiment
  - Train and test portions
    - Percentage of training data, x

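The slide only states that x is the percentage of data used for training. One way such a split could look, assuming a uniformly random split of (customer, product, rating) triples with a fixed seed (both assumptions, as is the function name):

```python
import random

def split_ratings(ratings, x, seed=0):
    """Split rating triples into train and test portions; x is the training fraction."""
    if not 0.0 < x < 1.0:
        raise ValueError("x must be strictly between 0 and 1")
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)   # seeded so the split is repeatable
    cut = round(x * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

# Toy data: 5 customers x 2 products, every rating set to 3.
toy_ratings = [(c, p, 3) for c in range(5) for p in range(2)]
train, test = split_ratings(toy_ratings, x=0.8)
```

Varying x changes how sparse the training matrix is, which is the quantity the later results are reported against.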
Experimental Setup
- Benchmark Systems
  - CF-Predict
  - CF-Recommend
- Metrics
  - Prediction
    - Mean Absolute Error (MAE)
  - Top-N Recommendation
    - Recall and Precision
    - Combined score F1

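The metrics named above have standard definitions, sketched here for reference. The function names are hypothetical, and "relevant" is assumed to mean the products that appear in the customer's test portion.

```python
def mean_absolute_error(predicted, actual):
    """MAE: average absolute difference between predicted and actual ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)

def precision_recall_f1(recommended, relevant):
    """Precision, recall and F1 for one top-N recommendation list."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

mae = mean_absolute_error([4, 3], [5, 3])
precision, recall, f1 = precision_recall_f1(["a", "b", "c", "d"], {"a", "b", "e"})
```

Lower MAE is better, while higher F1 is better; F1 is used because precision and recall pull in opposite directions as N grows.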
Results: Prediction Experiment
- Movie data
- Used SVD for prediction generation based on the training data
- Computed MAE
- Obtained similar numbers from CF-Predict

Results: Neighborhood Formation
- Movie Dataset (converted to binary)
- Used SVD for dimensionality reduction
- Formed neighborhood in the reduced space
- Used neighbors to produce recommendations
- Computed F1
- Obtained similar numbers from CF-Recommend

Results: Neighborhood Formation
- E-Commerce Dataset
- Used SVD for dimensionality reduction
- Formed neighborhood in the reduced space
- Used neighbors to produce recommendations
- Computed F1
- Obtained similar numbers from CF-Recommend

Conclusion
- SVD results are promising
  - Provides better Recommendations for Movie data
    - Provides better Predictions for x < 0.5
  - Not as good for the E-Commerce data
    - Even up to 700 dimensions!
- SVD provides better online performance
- SVD is capable of meeting RS challenges
  - Sparsity
  - Scalability
  - Synonymy
- A follow-up paper appears at EC'00 conference