Collaborative Filtering with Temporal Dynamics
Yehuda Koren, Yahoo! Research

We Know What You Ought To Be Watching This Summer

Collaborative filtering
• Recommend items based on past transactions of users
• Specific data characteristics are irrelevant
  – Domain-free
  – Can identify elusive aspects
• Two popular approaches:
  – Matrix factorization
  – Neighborhood

Movie rating data

Training data
user  movie  date      score
1     21     5/7/02    1
1     213    8/2/04    5
2     345    3/6/01    4
2     123    5/1/05    4
2     768    7/15/02   3
3     76     1/22/01   5
4     45     8/3/00    4
5     568    9/10/05   1
5     342    3/5/03    2
5     234    12/28/00  2
6     76     8/11/02   5
6     56     6/15/03   4

Test data
user  movie  date      score
1     62     1/6/05    ?
1     96     9/13/04   ?
2     7      8/18/05   ?
2     3      11/22/05  ?
3     47     6/13/02   ?
3     15     8/12/01   ?
4     41     9/1/00    ?
4     28     8/27/05   ?
5     93     4/4/05    ?
5     74     7/16/03   ?
6     69     2/14/04   ?
6     83     10/3/03   ?

Achievable RMSEs on the Netflix data
• Global average: 1.1296
• User average: 1.0651
• Movie average: 1.0533
• Cinematch: 0.9514 (baseline)
• Static neighborhood: 0.9002
• Static factorization: 0.8911
• Leader: 0.8558 (10.05% improvement)
• Inherent noise: ??
(The scale runs from erroneous at the top to accurate at the bottom; its annotations read: find better items, personalization, "algorithmics", time effects.)
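The improvement percentage on this slide is a relative RMSE reduction measured against the Cinematch baseline. Here is a minimal Python sketch of that arithmetic, using the numbers listed above. The helper names `rmse` and `improvement_over` are ours and serve only as illustration.

```python
import math

CINEMATCH_RMSE = 0.9514  # baseline RMSE from the slide


def rmse(predictions, targets):
    """Root mean squared error between two equal-length, non-empty sequences."""
    if len(predictions) != len(targets) or not predictions:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return math.sqrt(sum(squared) / len(squared))


def improvement_over(baseline, score):
    """Relative RMSE reduction of `score` versus `baseline`, in percent."""
    return 100.0 * (baseline - score) / baseline


if __name__ == "__main__":
    print(f"Leader: {improvement_over(CINEMATCH_RMSE, 0.8558):.2f}%")
```

Running it prints `Leader: 10.05%`, which matches the slide.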

Something Happened in Early 2004…

Are movies getting better with time?

Multiple sources of temporal dynamics
• Item-side effects:
  – Product perception and popularity are constantly changing
  – Seasonal patterns influence items’ popularity
• User-side effects:
  – Customers continually redefine their taste
  – Transient, short-term bias; anchoring
  – Drifting rating scale
  – Change of rater within household

Temporal dynamics - challenges
• Multiple sources: both items and users are changing over time
• Multiple targets: each user and each item forms a unique time series
  – Scarce data per target
• Inter-related targets: signal needs to be shared among users, which is the foundation of collaborative filtering
  – Cannot isolate the multiple problems
• Common “concept drift” methodologies won’t hold; e.g., underweighting older instances is unappealing

Basic matrix factorization model
[Figure: a sparse items × users rating matrix approximated (~) by the product of an items × 3 factor matrix and a 3 × users factor matrix; caption: "A rank-3 SVD approximation"]

Estimate unknown ratings as inner-products of factors:
[Figure: the same rank-3 approximation; an unknown entry "?" of the rating matrix is estimated as the inner product of the corresponding item and user factor vectors, giving 2.4]
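The following small Python sketch makes the inner product concrete. The factor values are an assumption. They are our reading of the second item row and the fifth user column of the slide's factor matrices, which did not survive extraction intact.

```python
def predict(item_factors, user_factors):
    """Estimate a rating as the inner product q_i . p_u of two factor vectors."""
    if len(item_factors) != len(user_factors):
        raise ValueError("factor vectors must have the same length")
    return sum(q * p for q, p in zip(item_factors, user_factors))


# Hypothetical reading of the figure: second item row, fifth user column.
q_item = [-0.5, 0.6, 0.5]
p_user = [-2.0, 0.3, 2.4]

estimate = predict(q_item, p_user)
print(round(estimate, 2))  # 1.0 + 0.18 + 1.2 = 2.38, shown rounded as 2.4
```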

Matrix factorization model
[Figure: the rating matrix approximated by the product of item and user factor matrices]
Properties:
• SVD isn’t defined when entries are unknown; use specialized methods
• Can easily overfit; sensitive to regularization
• Need to separate main effects…

Baseline predictors
• Mean rating: 3.7 stars
• The Sixth Sense is 0.5 stars above average
• Joe rates 0.2 stars below average
Baseline prediction: Joe will rate The Sixth Sense 4 stars (3.7 + 0.5 - 0.2 = 4.0)
• No user-item interaction

Factor model correction
• Both The Sixth Sense and Joe are placed high on the “Supernatural Thrillers” scale
Adjusted estimate: Joe will rate The Sixth Sense 4.5 stars
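The arithmetic of the two slides above can be written as a short Python sketch. The baseline is μ + b_u + b_i, and the factor model adds an interaction term q_i·p_u on top of it. The slides do not give Joe's or the movie's factor vectors. For that reason the interaction term below is simply assumed to be the +0.5 implied by the adjusted estimate, and it is passed in as a single number.

```python
def baseline(mu, b_u, b_i):
    """Baseline predictor mu + b_u + b_i: no user-item interaction."""
    return mu + b_u + b_i


def biased_mf(mu, b_u, b_i, interaction):
    """Baseline plus the factor term q_i . p_u (here given as one number)."""
    return baseline(mu, b_u, b_i) + interaction


mu = 3.7             # mean rating
b_joe = -0.2         # Joe rates 0.2 stars below average
b_sixth_sense = 0.5  # The Sixth Sense is 0.5 stars above average

print(f"{baseline(mu, b_joe, b_sixth_sense):.1f}")        # 4.0
# Assumed interaction of +0.5, implied by the 4.5-star adjusted estimate.
print(f"{biased_mf(mu, b_joe, b_sixth_sense, 0.5):.1f}")  # 4.5
```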

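Matrix factorization with biases
$$\hat{r}_{ui} = \mu + b_u + b_i + q_i^T p_u$$
• Baseline predictors: μ is the global average, b_u the bias of user u, b_i the bias of item i
• User-item interaction: p_u holds user u’s factors, q_i holds item i’s factors
• Minimization problem over the set K of observed (u, i) pairs, with regularization:
$$\min_{b_*, q_*, p_*} \sum_{(u,i) \in K} \left(r_{ui} - \mu - b_u - b_i - q_i^T p_u\right)^2 + \lambda \left(b_u^2 + b_i^2 + \|q_i\|^2 + \|p_u\|^2\right)$$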
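SVD is undefined when entries are missing, so the biased model r̂_ui = μ + b_u + b_i + q_i^T p_u is fitted directly on the observed ratings. Below is a minimal sketch that assumes stochastic gradient descent as the solver. The function names, the learning rate `lr`, the regularization weight `lam`, and the factor count `k=3` are illustrative choices and do not come from the talk.

```python
import random


def train_biased_mf(ratings, n_users, n_items, k=3, lr=0.01, lam=0.02,
                    epochs=50, seed=0):
    """Fit r_hat = mu + b_u + b_i + q_i . p_u by SGD over observed ratings only.

    ratings: list of (user, item, rating) triples with 0-based integer ids.
    Returns the model tuple (mu, b_u, b_i, p, q).
    """
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global average, kept fixed
    b_u = [0.0] * n_users
    b_i = [0.0] * n_items
    p = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    q = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]

    order = list(ratings)
    for _ in range(epochs):
        rng.shuffle(order)
        for u, i, r in order:
            e = r - predict_mf((mu, b_u, b_i, p, q), u, i)
            # One gradient step on error^2 + lam * regularization for this
            # rating; the constant factor 2 is folded into lr.
            b_u[u] += lr * (e - lam * b_u[u])
            b_i[i] += lr * (e - lam * b_i[i])
            for f in range(k):
                pu_f, qi_f = p[u][f], q[i][f]
                p[u][f] += lr * (e * qi_f - lam * pu_f)
                q[i][f] += lr * (e * pu_f - lam * qi_f)
    return mu, b_u, b_i, p, q


def predict_mf(model, u, i):
    """Prediction mu + b_u + b_i + q_i . p_u from a fitted model tuple."""
    mu, b_u, b_i, p, q = model
    return mu + b_u[u] + b_i[i] + sum(a * b for a, b in zip(q[i], p[u]))


if __name__ == "__main__":
    toy = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]
    model = train_biased_mf(toy, n_users=3, n_items=3, epochs=200)
    print(round(predict_mf(model, 0, 2), 2))  # user 0's unrated item 2
```

Only observed (u, i) pairs ever produce a gradient. This is what sidesteps the undefined-SVD problem. The `lam` terms are the regularization that keeps the model from overfitting the sparse data.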

Addressing temporal dynamics
• The factor model conveniently allows treating different aspects separately
• We observe changes in:
  1. Rating scale of individual users (baseline predictors)
  2. Popularity of individual items (baseline predictors)
  3. User preferences (user factors)

Parameterizing the model
• Use functional forms: b_u(t) = f(u, t), b_i(t) = g(i, t), p_u(t) = h(u, t)
• Need to find adequate f(), g(), h()
• General guidelines:
  – Items show slower temporal changes
  – Users exhibit frequent and sudden changes
  – Factors, p_u(t), are expensive to model
  – Gain flexibility by heavily parameterizing the functions
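The following Python sketch shows one way to follow these guidelines, under our own assumptions. The item bias receives a coarse per-time-bin offset, since items change slowly. The user bias receives a smooth drift around the user's mean rating day, plus a per-day offset that captures sudden changes. The helper names, the drift exponent `beta=0.4`, and the equal-width bin layout are hypothetical illustrations. This slide does not give these forms.

```python
import math


def time_bin(day, first_day, last_day, n_bins):
    """Map an integer day to one of n_bins equal-width bins spanning the data."""
    span = last_day - first_day + 1
    return min(n_bins - 1, (day - first_day) * n_bins // span)


def item_bias(b_i, bin_offsets, day, first_day, last_day):
    """Slowly varying item bias: static b_i plus an offset for the day's bin."""
    n_bins = len(bin_offsets)
    return b_i + bin_offsets[time_bin(day, first_day, last_day, n_bins)]


def drift(day, mean_day, beta=0.4):
    """Signed, damped distance sign(d) * |d|**beta from the user's mean day."""
    d = day - mean_day
    return math.copysign(abs(d) ** beta, d)


def user_bias(b_u, alpha_u, day, mean_day, day_offsets, beta=0.4):
    """Fast-changing user bias: static b_u, a gradual drift scaled by alpha_u,
    and a per-day offset (dict day -> value) for sudden single-day effects."""
    return b_u + alpha_u * drift(day, mean_day, beta) + day_offsets.get(day, 0.0)
```

The item side uses few parameters per item (one per bin), while the user side can react to individual days. This mirrors the "items slower, users sudden" guideline.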

Achievable RMSEs on the Netflix data
• Global average: 1.1296
• User average: 1.0651
• Movie average: 1.0533
• Cinematch: 0.9514 (baseline)
• Static neighborhood: 0.9002
• Static factorization: 0.8911
• Dynamic factorization: 0.8794
• Grand Prize: 0.8563 (10% improvement)
• Inherent noise: ??
(The scale runs from erroneous at the top to accurate at the bottom; its annotations read: find better items, personalization, "algorithmics", time effects.)

Neighborhood-based CF
• Earliest and most common collaborative filtering method
• Derive unknown ratings from those of “similar” items (item-item variant)

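Neighborhood modeling
Use item-item weights, w_ij, to relate items. To estimate the rating of user u for item i:
$$\hat{r}_{ui} = b_{ui} + |R(u)|^{-1/2} \sum_{j \in R(u)} (r_{uj} - b_{uj})\, w_{ij}$$
• b_ui: baseline predictor
• R(u): set of items rated by u
• r_uj - b_uj: deviation from the baseline estimate for item j
• w_ij: weight from j to i, learned from the data through optimization
The deviations and the normalization |R(u)|^{-1/2} are constants; only the weights are learned.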
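Here is a Python sketch of the item-item prediction r̂_ui = b_ui + |R(u)|^{-1/2} Σ_{j∈R(u)} (r_uj - b_uj) w_ij. The dictionary-based inputs and the function name are our assumptions about how R(u), the baselines, and the weights might be stored.

```python
import math


def predict_neighborhood(b_ui, rated, baselines, weights):
    """r_hat_ui = b_ui + |R(u)|^(-1/2) * sum_{j in R(u)} (r_uj - b_uj) * w_ij.

    rated:     dict item j -> rating r_uj (the set R(u) with its ratings)
    baselines: dict item j -> baseline estimate b_uj
    weights:   dict item j -> learned weight w_ij toward the target item i
    """
    if not rated:
        return b_ui  # nothing to borrow from: fall back to the baseline
    total = sum((r_uj - baselines[j]) * weights.get(j, 0.0)
                for j, r_uj in rated.items())
    return b_ui + total / math.sqrt(len(rated))
```

Items without a learned weight contribute nothing (`weights.get(j, 0.0)`). The square-root normalization tempers the sum for users who rated many items.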

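Optimizing the model
Minimize the regularized squared error function over the observed pairs K:
$$\min_{w} \sum_{(u,i) \in K} \Big[\Big(r_{ui} - b_{ui} - |R(u)|^{-1/2} \sum_{j \in R(u)} (r_{uj} - b_{uj})\, w_{ij}\Big)^2 + \lambda \sum_{j \in R(u)} w_{ij}^2\Big]$$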
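The next sketch minimizes Σ_{(u,i)∈K} (r_ui - r̂_ui)^2 + λ Σ w_ij^2 over the item-item weights with stochastic gradient descent. SGD is an assumed solver here, and `lr`, `lam`, `epochs`, and the function name are illustrative. The baselines stay fixed throughout. As a choice of this sketch, j = i is skipped during training, so that an item cannot explain its own rating.

```python
import math
from collections import defaultdict


def train_item_weights(ratings_by_user, baseline, lr=0.005, lam=0.002, epochs=20):
    """Learn item-item weights w[i][j] by SGD on the regularized squared error.

    ratings_by_user: dict user -> dict item -> rating (R(u) with its ratings)
    baseline:        function (user, item) -> b_ui, held fixed during training
    """
    w = defaultdict(lambda: defaultdict(float))
    for _ in range(epochs):
        for u, rated in ratings_by_user.items():
            norm = 1.0 / math.sqrt(len(rated))                      # |R(u)|^(-1/2)
            dev = {j: r - baseline(u, j) for j, r in rated.items()}  # r_uj - b_uj
            for i, r_ui in rated.items():
                # Predict a known rating from the user's other ratings (j != i).
                pred = baseline(u, i) + norm * sum(
                    d * w[i][j] for j, d in dev.items() if j != i)
                e = r_ui - pred
                for j, d in dev.items():
                    if j != i:
                        # Gradient step; the constant factor 2 is folded into lr.
                        w[i][j] += lr * (e * norm * d - lam * w[i][j])
    return w


if __name__ == "__main__":
    data = {0: {0: 5, 1: 5}, 1: {0: 1, 1: 1}, 2: {0: 5, 1: 4}}
    weights = train_item_weights(data, baseline=lambda u, i: 3.0)
    print(weights[0][1], weights[1][0])
```

In the toy data, items 0 and 1 deviate from the baseline in the same direction for every user. Both learned weights therefore come out positive.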

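Making the model time-aware
• A popular scheme, instance weighting: decay the significance of outdated events within the cost function, e.g. by weighting each squared error with a time decay:
$$\min \sum_{(u,i) \in K} e^{-\beta (t_{\text{now}} - t_{ui})} \left(r_{ui} - \hat{r}_{ui}\right)^2$$
• Don’t do this!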
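To make the rejected instance-weighting scheme concrete, here is a Python sketch. It assumes an exponential decay exp(-β·age), where age is how long ago the rating was made. The function name and the decay form are our illustrative assumptions.

```python
import math


def decayed_squared_error(residuals_and_ages, beta):
    """Instance-weighted cost: sum of exp(-beta * age) * residual^2.

    residuals_and_ages: iterable of (r_ui - r_hat_ui, age of the rating in days)
    beta:               decay rate; beta = 0 recovers the plain squared error
    """
    return sum(math.exp(-beta * age) * e * e for e, age in residuals_and_ages)
```

With β = 0.1, a 100-day-old rating enters the cost with weight e^-10, about 4.5e-5. In effect it is discarded, along with whatever it says about item-item relations.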

Why isn’t instance weighting suitable?
• Not enough data per user: we need to exploit all the signal, including old signal
• The learned parameters, w_ij, represent time-invariant item-item relations, which can also be deduced from older actions
• Two items are related when users rated them similarly within a short time frame, even if this happened long ago
• How to do it right?

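Time-aware neighborhood model
• Decay item-item relations based on time distance, where t is the time of the rating being predicted and t_uj the time at which u rated j:
$$\hat{r}_{ui}(t) = b_{ui} + |R(u)|^{-1/2} \sum_{j \in R(u)} e^{-\beta_u \cdot |t - t_{uj}|} (r_{uj} - b_{uj})\, w_{ij}$$
• User-specific decay rate, controlled by β_u
• All past user behavior is equally considered, through the cost function:
$$\min_{w, \beta} \sum_{(u,i) \in K} \Big[\Big(r_{ui} - \hat{r}_{ui}(t_{ui})\Big)^2 + \lambda \sum_{j \in R(u)} w_{ij}^2\Big]$$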
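This Python sketch implements the time-decayed prediction. It assumes the form r̂_ui(t) = b_ui + |R(u)|^{-1/2} Σ_{j∈R(u)} e^{-β_u·|t - t_uj|} (r_uj - b_uj) w_ij, where t_uj is the time at which u rated j. The function name and the input layout are our assumptions.

```python
import math


def predict_time_aware(b_ui, t, history, weights, beta_u):
    """Time-decayed item-item prediction for user u and item i at time t.

    history: list of (item j, deviation r_uj - b_uj, time t_uj) for j in R(u)
    weights: dict item j -> w_ij toward the target item i
    beta_u:  user-specific decay rate; beta_u = 0 gives the static model
    """
    if not history:
        return b_ui
    total = sum(math.exp(-beta_u * abs(t - t_uj)) * dev * weights.get(j, 0.0)
                for j, dev, t_uj in history)
    return b_ui + total / math.sqrt(len(history))  # |R(u)|^(-1/2) normalization
```

Setting `beta_u = 0` gives every j a decay factor of 1, which recovers the static neighborhood prediction. The decay acts only at prediction time. The weights w_ij can still be learned from every past rating, old or new.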

Achievable RMSEs on the Netflix data
• Global average: 1.1296
• User average: 1.0651
• Movie average: 1.0533
• Cinematch: 0.9514 (baseline)
• Static neighborhood: 0.9002
• Static factorization: 0.8911
• Dynamic neighborhood: 0.8885
• Dynamic factorization: 0.8794
• Grand Prize: 0.8563 (10% improvement)
• Inherent noise: ??
(The scale runs from erroneous at the top to accurate at the bottom; its annotations read: find better items, personalization, "algorithmics", time effects.)
The temporal neighborhood model delivers the same RMSE improvement over its static counterpart (0.0117) as the temporal factor model (!)
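The 0.0117 figure can be checked from the listed RMSEs by subtracting each dynamic model's RMSE from that of its static counterpart. Here is a trivial Python sketch. The dictionary name is ours.

```python
# RMSEs from the slide: (static, dynamic) for each model family.
rmse_pairs = {
    "neighborhood": (0.9002, 0.8885),
    "factorization": (0.8911, 0.8794),
}

for family, (static, dynamic) in rmse_pairs.items():
    print(f"{family}: {static - dynamic:.4f}")
```

Both lines print an improvement of 0.0117.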

Lessons
• Modeling temporal effects significantly improves recommender accuracy
• Allow multiple time-drifting patterns across users and items
• Integrate all users within a single model to allow crucial cross-user collaboration
• Model user behavior along its full history; do not over-emphasize recent actions
• Separate long-term values, while excluding transient fluctuations from the model
• Sudden, single-day effects are significant
• Modeling past temporal fluctuations helps in predicting future behavior, even though we do not extrapolate future temporal dynamics

Yehuda Koren
Yahoo! Research
yehuda@yahoo-inc.com