Learning Recommender Systems with Adaptive Regularization

Steffen Rendle, WSDM 2012
Presenter: Haiqin Yang
Date: Mar. 21, 2012

Outline
• Introduction
• Factorization Machine with Adaptive Regularization
• Evaluation
• Conclusion
• More stories

Collaborative Filtering
• Predict unobserved entries from a partially observed matrix

Overfitting
• Most state-of-the-art recommender methods have a large number of model parameters and are therefore prone to overfitting.
[Figure: Low Rank Approximation]
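To make "a large number of model parameters" concrete: a rank-k matrix factorization learns one k-dimensional latent vector per user and one per item. The sketch below counts them. The function name and the user, item, and k values are hypothetical and chosen only for illustration, and bias terms are ignored.

```python
def mf_parameter_count(num_users: int, num_items: int, k: int) -> int:
    """Number of latent-factor parameters in a plain rank-k matrix factorization.

    Every user and every item owns a k-dimensional factor vector, so the
    total is (num_users + num_items) * k. Bias terms are ignored here.
    """
    return (num_users + num_items) * k


# Hypothetical sizes, for illustration only.
users, items, k = 10_000, 5_000, 100
print(mf_parameter_count(users, items, k))  # 1500000
```

Because the count grows linearly in both the number of users and items and in k, a modest increase in k quickly adds millions of free parameters. These must all be fitted from the (sparse) observed entries.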

Solution to Overfitting
• Typically, L2 regularization is applied to prevent overfitting, e.g., in:
  - Maximum margin matrix factorization
  - Probabilistic matrix factorization
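A minimal sketch of an L2-regularized factorization objective: squared error on the observed cells plus λ times the squared L2 norm of all user and item factors. The squared loss, the function names, and the toy data are assumptions for illustration. For example, MMMF uses a different (margin-based) loss, so only the "data term + λ·‖factors‖²" structure carries over.

```python
from typing import Dict, List, Tuple

Ratings = Dict[Tuple[int, int], float]  # (user, item) -> observed rating


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def regularized_mf_loss(ratings: Ratings,
                        user_factors: List[List[float]],
                        item_factors: List[List[float]],
                        lam: float) -> float:
    """Squared error on observed entries plus an L2 penalty on all factors."""
    # Data term: only the observed (user, item) cells contribute.
    sq_err = sum((r - dot(user_factors[u], item_factors[i])) ** 2
                 for (u, i), r in ratings.items())
    # L2 penalty (squared Frobenius norm) on user and item factors.
    penalty = sum(v * v for row in user_factors for v in row)
    penalty += sum(v * v for row in item_factors for v in row)
    return sq_err + lam * penalty


# Tiny hypothetical example: 2 users, 2 items, rank 1, three observed cells.
ratings = {(0, 0): 5.0, (0, 1): 3.0, (1, 0): 4.0}
U = [[2.0], [2.0]]
V = [[2.0], [1.5]]
print(regularized_mf_loss(ratings, U, V, lam=0.25))  # 4.5625
```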

Regularization Parameters
• A generalized formulation
• The success depends largely on the choice of the regularization value(s) λ
  - If λ is chosen too small, the model overfits.
  - If λ is chosen too large, the model underfits.
• Question: how can λ be chosen efficiently?
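For reference, the following shows one common way to write a generalized L2-regularized objective with a separate regularization value λ_θ for each model parameter (or parameter group) θ. The notation is assumed here rather than recovered from the slide: S is the training data, Θ the parameter set, ŷ the model's prediction, and l the loss.

```latex
\operatorname*{argmin}_{\Theta} \;
  \sum_{(x, y) \in S} l\bigl(\hat{y}(x \mid \Theta), y\bigr)
  \; + \; \sum_{\theta \in \Theta} \lambda_{\theta}\, \theta^{2}
```

Setting every λ_θ to the same value λ recovers ordinary single-value L2 regularization. A very small λ barely constrains the parameters, so the model can overfit. A very large λ shrinks all parameters toward zero, so the model underfits.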

How to select parameters?
• Validation-set-based methods: search for optimal values using a withheld validation set
  - Grid search by cross-validation
  - Informed search: too complicated
  - Regularization path: not applicable in all cases
• Hierarchical Bayesian methods: use a hierarchical model with hyperpriors on the prior distribution
  - Typically optimized with Markov Chain Monte Carlo (MCMC)
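The grid-search option can be sketched as follows: train once per candidate λ, score each trained model on the withheld validation split, and keep the best. To keep the sketch short, a one-feature ridge regression with a closed-form fit stands in for a recommender, and a single holdout split replaces full k-fold cross-validation. The data and the grid are hypothetical.

```python
from typing import List, Sequence, Tuple

Data = List[Tuple[float, float]]  # (feature, target) pairs


def fit_ridge_1d(train: Data, lam: float) -> float:
    """Closed-form ridge regression for y ~ w * x (no intercept)."""
    sxy = sum(x * y for x, y in train)
    sxx = sum(x * x for x, _ in train)
    return sxy / (sxx + lam)


def mse(data: Data, w: float) -> float:
    return sum((y - w * x) ** 2 for x, y in data) / len(data)


def grid_search(train: Data, valid: Data,
                grid: Sequence[float]) -> Tuple[float, float]:
    """Return (best lambda, its validation MSE) over the candidate grid."""
    best_lam, best_err = grid[0], float("inf")
    for lam in grid:
        w = fit_ridge_1d(train, lam)  # fit on the training split only
        err = mse(valid, w)           # score on the withheld validation split
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam, best_err


# Hypothetical toy data.
train = [(1.0, 2.5), (2.0, 3.5), (3.0, 7.0)]
valid = [(1.5, 3.0), (2.5, 5.0)]
grid = [0.0, 0.1, 1.0, 10.0, 100.0]
print(grid_search(train, valid, grid)[0])  # 1.0
```

Each candidate requires a full retraining. For large factorization models with several regularization values, the grid grows combinatorially, which is the cost that motivates the adaptive approach.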

Factorization Machine (FM)
• Matrix Factorization (MF)
• Factorization Machine
  - Model:
  - Parameters:
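For reference, the degree-2 factorization machine from the ICDM 2010 "Factorization Machines" paper is ŷ(x) = w0 + Σ_i w_i x_i + Σ_{i<j} ⟨v_i, v_j⟩ x_i x_j, with parameters w0 ∈ ℝ, w ∈ ℝ^n and V ∈ ℝ^{n×k} (row v_i is feature i's latent vector). The sketch below evaluates it in two ways. The first sums over all pairs directly, in O(k·n²). The second uses the standard rewriting of the pairwise term as ½ Σ_f [(Σ_i v_{i,f} x_i)² − Σ_i v_{i,f}² x_i²], in O(k·n). Function names and numbers are illustrative.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # n rows (one per feature), k columns


def fm_predict_naive(x: Vector, w0: float, w: Vector, V: Matrix) -> float:
    """Degree-2 FM, evaluating every pair i < j explicitly: O(k * n^2)."""
    n = len(x)
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            y += sum(a * b for a, b in zip(V[i], V[j])) * x[i] * x[j]
    return y


def fm_predict(x: Vector, w0: float, w: Vector, V: Matrix) -> float:
    """Same model with the pairwise term rewritten to run in O(k * n)."""
    n, k = len(x), len(V[0])
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))          # sum_i v_if x_i
        s2 = sum((V[i][f] * x[i]) ** 2 for i in range(n))  # sum_i v_if^2 x_i^2
        y += 0.5 * (s * s - s2)
    return y


x = [1.0, 0.0, 2.0]
w0, w = 0.5, [0.1, -0.2, 0.3]
V = [[0.1, 0.2], [0.3, -0.1], [-0.2, 0.4]]
print(round(fm_predict_naive(x, w0, w, V), 6), round(fm_predict(x, w0, w, V), 6))  # 1.32 1.32
```

Both functions compute the same prediction. The linear-time form matters in practice because recommender feature vectors are long and sparse.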

FM vs. Other Factorization Models
• FM is a generalization of:
  - MF
  - SVD++
  - Pairwise Interaction Tensor Factorization (PITF)
  - Factorization Personalized Markov Chains (FPMC)
• See "Factorization Machines", ICDM 2010
• An example: MF
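To see how FM covers MF, encode each (user, item) pair as a feature vector with exactly two nonzero entries: a one-hot indicator for the user followed by one for the item. The only active pair is then (user, item), and the FM prediction collapses to w0 + w_u + w_i + ⟨v_u, v_i⟩, which is biased matrix factorization. The encoding helper, parameter values, and sizes below are illustrative assumptions.

```python
from typing import List

Matrix = List[List[float]]


def one_hot_user_item(u: int, i: int, num_users: int, num_items: int) -> List[float]:
    """Feature vector: user indicator block followed by item indicator block."""
    x = [0.0] * (num_users + num_items)
    x[u] = 1.0
    x[num_users + i] = 1.0
    return x


def fm_predict(x: List[float], w0: float, w: List[float], V: Matrix) -> float:
    """Degree-2 FM in its O(k * n) form."""
    n, k = len(x), len(V[0])
    y = w0 + sum(w[j] * x[j] for j in range(n))
    for f in range(k):
        s = sum(V[j][f] * x[j] for j in range(n))
        s2 = sum((V[j][f] * x[j]) ** 2 for j in range(n))
        y += 0.5 * (s * s - s2)
    return y


def biased_mf_predict(u: int, i: int, w0: float, w: List[float],
                      V: Matrix, num_users: int) -> float:
    """Biased MF with the same parameters: global + user + item bias + dot product."""
    vu, vi = V[u], V[num_users + i]
    return w0 + w[u] + w[num_users + i] + sum(a * b for a, b in zip(vu, vi))


num_users, num_items = 2, 3
w0 = 3.0
w = [0.5, -0.5, 0.25, 0.0, -0.25]          # user biases, then item biases
V = [[1.0, 0.0], [0.0, 1.0],                # user factors
     [0.5, 0.5], [1.0, -1.0], [0.0, 2.0]]   # item factors

x = one_hot_user_item(1, 2, num_users, num_items)
print(fm_predict(x, w0, w, V), biased_mf_predict(1, 2, w0, w, V, num_users))  # 4.25 4.25
```

Other models in the list arise in the same way from richer feature vectors (for example, adding implicit-feedback indicators for SVD++), which is the sense in which FM "generalizes" them.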

Optimization and Algorithm
• Optimization target
• Square loss
• Gradient descent
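The following is a minimal sketch of stochastic gradient descent for an FM under square loss with L2 regularization. It uses the standard FM partial derivatives: ∂ŷ/∂w0 = 1, ∂ŷ/∂w_i = x_i, and ∂ŷ/∂v_{i,f} = x_i Σ_j v_{j,f} x_j − v_{i,f} x_i². Several details are assumptions of this sketch rather than the slides' exact algorithm: the learning rate, a single shared λ, leaving w0 unregularized, touching only the parameters of nonzero features, and the toy data.

```python
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def fm_predict(x: Vector, w0: float, w: Vector, V: Matrix) -> float:
    """Degree-2 FM in its O(k * n) form."""
    n, k = len(x), len(V[0])
    y = w0 + sum(w[j] * x[j] for j in range(n))
    for f in range(k):
        s = sum(V[j][f] * x[j] for j in range(n))
        s2 = sum((V[j][f] * x[j]) ** 2 for j in range(n))
        y += 0.5 * (s * s - s2)
    return y


def sgd_step(x: Vector, y: float, w0: float, w: Vector, V: Matrix,
             lr: float, lam: float) -> float:
    """One SGD step on (y_hat - y)^2 plus lam * theta^2 for each touched parameter.

    Updates w and V in place and returns the new w0 (left unregularized).
    """
    n, k = len(x), len(V[0])
    sums = [sum(V[j][f] * x[j] for j in range(n)) for f in range(k)]
    err = fm_predict(x, w0, w, V) - y            # d(loss)/d(y_hat) = 2 * err
    w0 -= lr * 2 * err
    for i in range(n):
        if x[i] == 0.0:
            continue  # sparse update: only parameters of active features change
        w[i] -= lr * (2 * err * x[i] + 2 * lam * w[i])
        for f in range(k):
            grad_v = x[i] * sums[f] - V[i][f] * x[i] ** 2  # d(y_hat)/d(v_if)
            V[i][f] -= lr * (2 * err * grad_v + 2 * lam * V[i][f])
    return w0


def train_mse(data: List[Tuple[Vector, float]], w0: float, w: Vector, V: Matrix) -> float:
    return sum((fm_predict(x, w0, w, V) - y) ** 2 for x, y in data) / len(data)


# Hypothetical toy data: 2 users + 2 items, one-hot encoded, 4 ratings.
data = [([1.0, 0.0, 1.0, 0.0], 5.0), ([1.0, 0.0, 0.0, 1.0], 3.0),
        ([0.0, 1.0, 1.0, 0.0], 4.0), ([0.0, 1.0, 0.0, 1.0], 2.0)]
rng = random.Random(0)
n, k = 4, 2
w0, w = 0.0, [0.0] * n
V = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n)]

before = train_mse(data, w0, w, V)
for _ in range(200):
    for x, y in data:
        w0 = sgd_step(x, y, w0, w, V, lr=0.05, lam=0.01)
after = train_mse(data, w0, w, V)
print(f"training MSE before: {before:.3f}, after: {after:.3f}")
```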

Adaptive Regularization
• Split the data into two sets: a training set and a validation set
• Find the regularization values λ* that lead to the lowest error on the validation set
• Alternating optimization
• Problem: the right-hand side is independent of λ
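Written out, the goal on this slide is a nested problem: λ should minimize the validation error of the parameters obtained by training with λ. The notation is assumed for illustration. S_T is the training set, S_V is the validation set, and Θ*(λ) denotes the parameters trained with regularization values λ.

```latex
\lambda^{*} = \operatorname*{argmin}_{\lambda}
  \sum_{(x, y) \in S_V} l\bigl(\hat{y}(x \mid \Theta^{*}(\lambda)), y\bigr),
\qquad
\Theta^{*}(\lambda) = \operatorname*{argmin}_{\Theta}
  \sum_{(x, y) \in S_T} l\bigl(\hat{y}(x \mid \Theta), y\bigr)
  + \sum_{\theta \in \Theta} \lambda_{\theta}\, \theta^{2}
```

An alternating scheme updates Θ on S_T with λ fixed, then updates λ on S_V with Θ fixed. The difficulty is visible in the outer sum: for a fixed Θ, the validation loss Σ_{(x,y)∈S_V} l(ŷ(x|Θ), y) contains no λ at all, so its gradient with respect to λ is zero.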

Adaptive Regularization
• Hint: the next parameters depend on λ
• Recall:
  - Objective
  - Update rule
• Expansion
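The hint can be made concrete by assuming that Θ is trained with plain SGD on the objective with penalty λ_θ θ² and learning rate α. Θ^t is treated as fixed (a one-step look-ahead), so the next parameters depend on λ only through the regularization term. By the chain rule, the loss on a validation case (x', y') ∈ S_V then does depend on λ:

```latex
\begin{aligned}
\theta^{t+1} &= \theta^{t} - \alpha \Bigl(
    \frac{\partial}{\partial \theta} l\bigl(\hat{y}(x \mid \Theta^{t}), y\bigr)
    + 2 \lambda_{\theta}\, \theta^{t} \Bigr),
\qquad
\frac{\partial \theta^{t+1}}{\partial \lambda_{\theta}} = -2 \alpha\, \theta^{t}, \\
\frac{\partial}{\partial \lambda_{\theta}}
  l\bigl(\hat{y}(x' \mid \Theta^{t+1}), y'\bigr)
&= \frac{\partial l}{\partial \hat{y}}\,
   \frac{\partial \hat{y}(x' \mid \Theta^{t+1})}{\partial \theta^{t+1}}
   \bigl(-2 \alpha\, \theta^{t}\bigr).
\end{aligned}
```

Here (x, y) ∈ S_T is the training case used for the update. If one λ is shared by a group of parameters (for example, one λ per latent dimension), the last expression is summed over all θ in that group.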

Adaptive Regularization
• Update rule
• Gradients
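Combining the update rule and the gradient yields an alternating loop. Each iteration takes an SGD step on a training case, then a gradient step on λ using a validation case and the look-ahead parameters. To keep the sketch short and checkable, it uses a plain linear model ŷ = Σ_i w_i x_i with one λ per weight instead of an FM. The learning rates, clipping λ at zero, and the toy data are assumptions, so this should not be read as the paper's SGDA implementation.

```python
import random
from typing import List, Tuple

Example = Tuple[List[float], float]  # (feature vector, target)


def predict(w: List[float], x: List[float]) -> float:
    """Linear model y_hat = sum_i w_i * x_i (stand-in for an FM)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def adaptive_sgd(train: List[Example], valid: List[Example], epochs: int,
                 alpha: float, alpha_lam: float,
                 seed: int = 0) -> Tuple[List[float], List[float]]:
    """Alternate an SGD step on w (training case) with a gradient step on the
    per-weight regularization values lam (random validation case)."""
    rng = random.Random(seed)
    n = len(train[0][0])
    w = [0.0] * n
    lam = [0.0] * n
    for _ in range(epochs):
        for x, y in train:
            # 1) SGD step on (y_hat - y)^2 + sum_i lam_i * w_i^2.
            err = predict(w, x) - y
            w_old = w
            w = [wi - alpha * (2 * err * xi + 2 * li * wi)
                 for wi, xi, li in zip(w_old, x, lam)]
            # 2) The new w depends on lam via d w_i / d lam_i = -2 * alpha * w_old_i,
            #    so the validation loss (y_hat_v - y_v)^2 now has a gradient in lam.
            xv, yv = rng.choice(valid)
            err_v = predict(w, xv) - yv
            grad_lam = [2 * err_v * xvi * (-2 * alpha * woi)
                        for xvi, woi in zip(xv, w_old)]
            # Gradient step on lam, clipped so regularization stays non-negative.
            lam = [max(0.0, li - alpha_lam * g) for li, g in zip(lam, grad_lam)]
    return w, lam


# Hypothetical data: y depends on x[0] only; x[1] is an irrelevant feature.
data_rng = random.Random(42)


def make_examples(count: int) -> List[Example]:
    out = []
    for _ in range(count):
        x = [data_rng.uniform(-1.0, 1.0), data_rng.uniform(-1.0, 1.0)]
        out.append((x, 2.0 * x[0] + data_rng.gauss(0.0, 0.1)))
    return out


train, valid = make_examples(200), make_examples(50)
w, lam = adaptive_sgd(train, valid, epochs=20, alpha=0.05, alpha_lam=0.1)
baseline = sum(y * y for _, y in valid) / len(valid)  # error of w = 0
val_mse = sum((predict(w, x) - y) ** 2 for x, y in valid) / len(valid)
print(f"w = {w}, lam = {lam}, validation MSE {val_mse:.4f} vs {baseline:.4f}")
```

Using one λ per weight, rather than a single shared value, mirrors the "flexible regularization" idea. The separate learning rate for λ and the clipping at zero are design choices of this sketch.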

Evaluation
• Datasets
  - MovieLens 1M
  - Netflix
• Methods
  - Stochastic Gradient Descent (SGD)
  - SGD with adaptive regularization (SGDA)

Accuracy vs. Latent Dimensions

Convergence

Evolution of λ
• Flexible regularization, with a separate value per dimension, works better than a single regularization value shared by all dimensions

Size of Validation Set S_V
• The larger the validation set, the closer it is to the test set
• A validation set that is too large reduces the training set size, yielding poor performance

Conclusion
• An adaptive regularization method based on the Factorization Machine
• Systematic experiments demonstrating the model's performance

More Stories
• Reformulate the problem to create a new model: Factorization Machine
  - Factorization machines, ICDM 2010
  - Fast context-aware recommendations with factorization machines, SIGIR 2011
  - Learning recommender systems with adaptive regularization, WSDM 2012
  - Bayesian factorization machines, NIPS 2011 Workshop

More Stories
• Modify existing techniques for new models
• Predictor-Corrector

Q&A