MRI: Meaningful Interpretations of Collaborative Ratings
Mahashweta Das, Gautam Das, Sihem Amer-Yahia, Cong Yu
37th International Conference on Very Large Data Bases (VLDB 2011), Seattle
Roadmap
- Introduction
  - Motivation
  - Problem: MRI
    - Sub-problem: DEM
    - Sub-problem: DIM
- Data Model
- Algorithms
- Experiments
  - Quantitative
  - Qualitative
- Conclusion & Future Work
Motivation
- Examining reviews vs. trusting the overall aggregate rating
- IMDB's demographic breakdown of ratings is not meaningful enough
MRI Problem
- Examining reviews vs. trusting the overall aggregate rating
- IMDB's demographic breakdown of ratings is not meaningful enough
  - Novel and powerful third option: Meaningful Rating Interpretation (MRI)
    - Explain ratings by leveraging user and item attribute information
MRI Sub-problem
- DEM: Meaningful Description Mining
  - Identify groups of reviewers who consistently share similar ratings on items
MRI Sub-problem
- DIM: Meaningful Difference Mining
  - Identify groups of reviewers who consistently disagree on item ratings
Data Model
- Collaborative rating site: <Set of Items, Set of Users, Ratings>
  - Rating tuple: <item attributes, user attributes, rating>

ID | Title            | Genre | Director         | Name | Gender | Location | Rating
1  | Titanic          | Drama | James Cameron    | Amy  | Female | New York | 8.5
2  | Schindler’s List | Drama | Steven Spielberg | John | Male   | New York | 7.0

- Group: set of ratings describable by a set of attribute values
- Notion of group based on data cube
  - OLAP literature for mining multidimensional data
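The rating tuple and the notion of a group can be sketched in a few lines of Python. This is only an illustration of the data model on the two example rows; the `Rating` class and the `select_group` helper are hypothetical names, not part of the original work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rating:
    """One rating tuple: item attributes, user attributes, rating score."""
    title: str
    genre: str
    director: str
    name: str
    gender: str
    location: str
    score: float


# The two example rows from the slide.
RATINGS = [
    Rating("Titanic", "Drama", "James Cameron", "Amy", "Female", "New York", 8.5),
    Rating("Schindler's List", "Drama", "Steven Spielberg", "John", "Male", "New York", 7.0),
]


def select_group(ratings, **conditions):
    """Return the ratings that match every attribute=value condition (a group)."""
    return [
        r for r in ratings
        if all(getattr(r, attr) == value for attr, value in conditions.items())
    ]


new_yorkers = select_group(RATINGS, location="New York")
female_new_yorkers = select_group(RATINGS, gender="Female", location="New York")
```

A group is thus nothing more than the result of a conjunctive selection over attribute values, which is what ties it to data cubes.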
Data Model
- Notion of group based on data cube lattice
  - Each node in the lattice is a data cube/cuboid
  - Each node corresponds to a query condition on the database
  - Dimensions: A = Gender, B = Age, C = Location, D = Occupation
Figure: 4-Dimensional Data Cube Lattice
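As a rough illustration of the lattice structure, the sketch below enumerates the nodes of a 4-dimensional lattice by level, treating each node as a subset of the four dimensions. The function name `lattice_levels` is an assumption made for this example.

```python
from itertools import combinations

# The four user dimensions named on the slide.
DIMENSIONS = ("Gender", "Age", "Location", "Occupation")


def lattice_levels(dimensions):
    """Map each lattice level (number of dimensions used) to its nodes."""
    return {
        size: list(combinations(dimensions, size))
        for size in range(len(dimensions) + 1)
    }


levels = lattice_levels(DIMENSIONS)
node_count = sum(len(nodes) for nodes in levels.values())
```

With four dimensions the lattice has 2^4 = 16 nodes. Once concrete attribute values are attached to the dimensions (Male, Young, CA, Student), the number of candidate groups grows further.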
Data Model
- Each node (data cube/cuboid) in the lattice is a group
- Each group corresponds to a selection query condition:
  - A = Gender: Male
  - B = Age: Young
  - C = Location: CA
  - D = Occupation: Student
Figure: Partial Rating Lattice for a Movie (M: Male, Y: Young, CA: California, S: Student)
Data Model
- Task: quickly identify “good” groups in the lattice that help users understand ratings effectively
Figure: Partial Rating Lattice for a Movie (M: Male, Y: Young, CA: California, S: Student)
DEM: Meaningful Description Mining
- For an input item covering R_I ratings, return a set C of cuboids such that:
  - description error is minimized, subject to:
    - |C| ≤ k
    - coverage ≥ α
- Description Error: measures how well a cuboid's average rating approximates the numerical score of each individual rating belonging to it
- Coverage: measures the percentage of ratings covered by the returned cuboids
- DEM is NP-Hard: proof details in paper
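The two measures can be made concrete with a small sketch. The slide defines them only in words, so the exact formula here is an assumption: a rating's error is its absolute deviation from the average of each cuboid that covers it, averaged over those cuboids, and the description error is the sum over all ratings. The sample ratings are invented for illustration.

```python
# Hypothetical ratings for one movie: user attributes plus a score.
RATINGS = [
    {"gender": "Male", "occupation": "Student", "score": 4.0},
    {"gender": "Male", "occupation": "Student", "score": 5.0},
    {"gender": "Female", "occupation": "Student", "score": 2.0},
    {"gender": "Female", "occupation": "Engineer", "score": 1.0},
]


def covers(cuboid, rating):
    """A cuboid covers a rating when every attribute condition matches."""
    return all(rating[attr] == value for attr, value in cuboid.items())


def coverage(cuboids, ratings):
    """Fraction of ratings covered by at least one cuboid."""
    covered = sum(1 for r in ratings if any(covers(c, r) for c in cuboids))
    return covered / len(ratings)


def description_error(cuboids, ratings):
    """Sum over ratings of the mean absolute deviation from covering cuboid averages.

    Assumed formula; uncovered ratings contribute nothing.
    """
    averages = []
    for c in cuboids:
        scores = [r["score"] for r in ratings if covers(c, r)]
        averages.append(sum(scores) / len(scores) if scores else None)
    total = 0.0
    for r in ratings:
        deviations = [
            abs(r["score"] - avg)
            for c, avg in zip(cuboids, averages)
            if avg is not None and covers(c, r)
        ]
        if deviations:
            total += sum(deviations) / len(deviations)
    return total


C = [{"gender": "Male"}, {"occupation": "Student"}]
cov = coverage(C, RATINGS)
err = description_error(C, RATINGS)
```

Here the two cuboids cover three of the four ratings, so a coverage threshold of 75% would be met while 80% would not.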
DEM Algorithms
- Exact Algorithm (E-DEM)
  - Brute-force enumeration of all possible combinations of cuboids in the lattice to return the exact (i.e., optimal) set as rating descriptions
- Random Restart Hill Climbing Algorithm
  - Often fails to satisfy the Coverage constraint; a large number of restarts is required
  - Need an algorithm that optimizes both Coverage and Description Error constraints simultaneously
- Randomized Hill Exploration Algorithm (RHE-DEM)
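A brute-force search in the spirit of E-DEM might look like the sketch below. It is not the authors' implementation: the error formula (mean absolute deviation from covering cuboid averages, summed over ratings), the candidate generation, and all names are assumptions, and the data is invented. Cuboids are written as tuples of (attribute, value) pairs.

```python
from itertools import combinations

ATTRS = ("gender", "occupation")
RATINGS = [
    {"gender": "Male", "occupation": "Student", "score": 4.0},
    {"gender": "Male", "occupation": "Student", "score": 5.0},
    {"gender": "Female", "occupation": "Student", "score": 2.0},
    {"gender": "Female", "occupation": "Engineer", "score": 1.0},
]


def covers(cuboid, rating):
    return all(rating[attr] == value for attr, value in cuboid)


def coverage(cuboids, ratings):
    return sum(any(covers(c, r) for c in cuboids) for r in ratings) / len(ratings)


def description_error(cuboids, ratings):
    """Assumed error: per-rating mean deviation from covering cuboid averages."""
    averages = {}
    for c in cuboids:
        scores = [r["score"] for r in ratings if covers(c, r)]
        averages[c] = sum(scores) / len(scores)
    total = 0.0
    for r in ratings:
        devs = [abs(r["score"] - averages[c]) for c in cuboids if covers(c, r)]
        if devs:
            total += sum(devs) / len(devs)
    return total


def candidate_cuboids(ratings, attrs):
    """Every non-empty attribute-value combination that occurs in the data."""
    found = set()
    for r in ratings:
        for size in range(1, len(attrs) + 1):
            for subset in combinations(attrs, size):
                found.add(tuple((a, r[a]) for a in subset))
    return sorted(found)


def exact_dem(ratings, attrs, k, alpha):
    """Try every set of at most k cuboids; keep the feasible one with least error."""
    best, best_err = None, float("inf")
    candidates = candidate_cuboids(ratings, attrs)
    for size in range(1, k + 1):
        for combo in combinations(candidates, size):
            if coverage(combo, ratings) < alpha:
                continue
            err = description_error(combo, ratings)
            if err < best_err:
                best, best_err = combo, err
    return best, best_err


best, best_err = exact_dem(RATINGS, ATTRS, k=2, alpha=0.75)
```

The number of combinations grows rapidly with the number of candidate cuboids and with k, which is why a heuristic is needed for realistic lattices.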
RHE-DEM Algorithm
Two goals, handled in turn: Satisfy Coverage, then Minimize Error. Example with k = 2, α = 80%:
1. Start with C = { {Male, Student}, {California, Student} }.
2. Say C does not satisfy the Coverage constraint. Candidate moves: C = { {Male}, {California, Student} } or C = { {Student}, {California, Student} }.
3. Say C = { {Male}, {California, Student} } satisfies the Coverage constraint (Satisfy Coverage √).
4. Minimize Error while keeping Coverage satisfied, ending at C = { {Male}, {Student} } (Satisfy Coverage √, Minimize Error √).
Figure: Partial Rating Lattice for a Movie; k = 2, α = 80% (M: Male, Y: Young, CA: California, S: Student)
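The two-phase walk can be sketched as follows. This is a simplified reading of the slides, not the published algorithm: the neighbor definition (drop or add one condition), the error formula, the invented ratings, and every function name are assumptions, and the random choice of starting set and the restarts are left to the caller so the example stays deterministic.

```python
ATTRS = ("gender", "location", "occupation")
ROWS = [("Male", "CA", "Student", 4.0), ("Male", "CA", "Student", 5.0),
        ("Male", "NY", "Engineer", 4.0), ("Female", "CA", "Student", 3.0),
        ("Female", "NY", "Student", 4.0), ("Female", "NY", "Engineer", 1.0),
        ("Male", "NY", "Student", 5.0), ("Female", "CA", "Engineer", 2.0),
        ("Male", "CA", "Engineer", 4.0), ("Female", "NY", "Student", 3.0)]
RATINGS = [dict(zip(ATTRS + ("score",), row)) for row in ROWS]


def covers(cuboid, rating):
    return all(rating[a] == v for a, v in cuboid)


def coverage(cuboids, ratings):
    return sum(any(covers(c, r) for c in cuboids) for r in ratings) / len(ratings)


def error(cuboids, ratings):
    """Assumed error: per-rating mean deviation from covering cuboid averages."""
    avgs = {}
    for c in cuboids:
        scores = [r["score"] for r in ratings if covers(c, r)]
        if scores:
            avgs[c] = sum(scores) / len(scores)
    total = 0.0
    for r in ratings:
        devs = [abs(r["score"] - avgs[c]) for c in cuboids if c in avgs and covers(c, r)]
        total += sum(devs) / len(devs) if devs else 0.0
    return total


def lattice_neighbors(cuboid, ratings):
    """Parents drop one condition; children add one attribute value seen in the data."""
    out = {cuboid[:i] + cuboid[i + 1:] for i in range(len(cuboid))} - {()}
    used = {a for a, _ in cuboid}
    for r in ratings:
        if covers(cuboid, r):
            for a in ATTRS:
                if a not in used:
                    out.add(tuple(sorted(cuboid + ((a, r[a]),))))
    return sorted(out)  # sorted so that ties are broken deterministically


def neighbor_sets(cuboids, ratings):
    """All sets obtained by moving exactly one cuboid to a lattice neighbor."""
    for i, c in enumerate(cuboids):
        for n in lattice_neighbors(c, ratings):
            if n not in cuboids:
                yield cuboids[:i] + (n,) + cuboids[i + 1:]


def rhe_dem(ratings, start, alpha):
    current = tuple(start)
    # Phase 1: climb on coverage until the constraint is satisfied.
    while coverage(current, ratings) < alpha:
        best = max(neighbor_sets(current, ratings),
                   key=lambda s: coverage(s, ratings), default=current)
        if coverage(best, ratings) <= coverage(current, ratings):
            return None  # stuck; a caller would restart from another set
        current = best
    # Phase 2: descend on error among neighbors that keep coverage satisfied.
    while True:
        feasible = [s for s in neighbor_sets(current, ratings)
                    if coverage(s, ratings) >= alpha]
        best = min(feasible, key=lambda s: error(s, ratings), default=current)
        if error(best, ratings) >= error(current, ratings):
            return current
        current = best


START = ((("gender", "Male"), ("occupation", "Student")),
         (("location", "CA"), ("occupation", "Student")))
result = rhe_dem(RATINGS, START, alpha=0.8)
```

On this invented data the first phase follows the same path as the slides, from the starting set through { {Male}, {California, Student} } to { {Male}, {Student} }; the second phase then finds a neighboring set with lower error that still meets the coverage threshold.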
DIM: Meaningful Difference Mining
- For an input item covering R_I^+ (positive) and R_I^- (negative) ratings, return a set C of cuboids such that:
  - difference balance is minimized, subject to:
    - |C| ≤ k
    - coverage of R_I^+ ≥ α and coverage of R_I^- ≥ α
- Difference Balance: measures whether the positive and negative ratings are “mingled together” (high balance) or “separated apart” (low balance)
- Coverage: measures the percentage of + and - ratings covered by the returned cuboids
- DIM is NP-Hard: proof details in paper
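One way to make the balance measure concrete is sketched below. The slide describes it only qualitatively, so the formula is an assumption: the fraction of (positive, negative) rating pairs that fall together in at least one returned cuboid. The ratings and names are invented for illustration.

```python
# Hypothetical ratings, each labelled positive or negative.
RATINGS = [
    {"gender": "Male", "occupation": "Student", "positive": True},
    {"gender": "Male", "occupation": "Engineer", "positive": True},
    {"gender": "Female", "occupation": "Student", "positive": False},
    {"gender": "Female", "occupation": "Engineer", "positive": False},
    {"gender": "Male", "occupation": "Student", "positive": False},
]


def covers(cuboid, rating):
    return all(rating[attr] == value for attr, value in cuboid.items())


def balance(cuboids, ratings):
    """Fraction of (positive, negative) pairs sharing at least one cuboid.

    Assumed formula; it scans every pair, so it is quadratic in the ratings.
    """
    positives = [r for r in ratings if r["positive"]]
    negatives = [r for r in ratings if not r["positive"]]
    mingled = sum(
        1
        for p in positives
        for n in negatives
        if any(covers(c, p) and covers(c, n) for c in cuboids)
    )
    return mingled / (len(positives) * len(negatives))


mixed = [{"occupation": "Student"}]
separated = [{"gender": "Male", "occupation": "Engineer"}, {"gender": "Female"}]
mixed_balance = balance(mixed, RATINGS)
separated_balance = balance(separated, RATINGS)
```

The second set of cuboids keeps positive and negative ratings in different groups, so its balance is zero, which is the kind of description DIM looks for.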
DIM Algorithms
- Exact Algorithm (E-DIM)
- Randomized Hill Exploration Algorithm (RHE-DIM)
  - Unlike DEM “error”, DIM “balance” computation is expensive
    - Quadratic computation scanning all possible positive and negative ratings for each set of cuboids
  - Introduce the concept of Fundamental Regions to aid faster balance computation
    - Partition the space of all ratings and aggregate rating tuples in each region
DIM Algorithms: Fundamental Region
- C1 = {Male, Student}
- C2 = {California, Student}
Figure: Computing Balance using Fundamental Regions. Set of k = 2 cuboids having 75 ratings (44+, 31-) and 10 ratings (6+, 4-)
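The idea of aggregating ratings per region can be sketched as follows. It assumes that a fundamental region is the set of ratings covered by exactly the same subset of cuboids, and that balance is the fraction of (positive, negative) pairs sharing a cuboid; both are interpretations of the slides, and the data and names are invented.

```python
from collections import defaultdict

RATINGS = [
    {"gender": "Male", "occupation": "Student", "positive": True},
    {"gender": "Male", "occupation": "Engineer", "positive": True},
    {"gender": "Female", "occupation": "Student", "positive": False},
    {"gender": "Female", "occupation": "Engineer", "positive": False},
    {"gender": "Male", "occupation": "Student", "positive": False},
]


def signature(cuboids, rating):
    """Bitmask with bit i set when cuboid i covers the rating."""
    bits = 0
    for i, cuboid in enumerate(cuboids):
        if all(rating[attr] == value for attr, value in cuboid.items()):
            bits |= 1 << i
    return bits


def fundamental_regions(cuboids, ratings):
    """Map each signature to its [positive count, negative count]."""
    regions = defaultdict(lambda: [0, 0])
    for r in ratings:
        regions[signature(cuboids, r)][0 if r["positive"] else 1] += 1
    return dict(regions)


def balance_from_regions(regions):
    """Balance from region counts: two regions mingle when they share a cuboid."""
    n_pos = sum(pos for pos, _ in regions.values())
    n_neg = sum(neg for _, neg in regions.values())
    if n_pos == 0 or n_neg == 0:
        return 0.0
    mingled = sum(
        pos * neg
        for sig_p, (pos, _) in regions.items()
        for sig_n, (_, neg) in regions.items()
        if sig_p & sig_n  # the two regions lie in at least one common cuboid
    )
    return mingled / (n_pos * n_neg)


C = [{"gender": "Male"}, {"occupation": "Student"}]
regions = fundamental_regions(C, RATINGS)
region_balance = balance_from_regions(regions)
```

With k cuboids there are at most 2^k regions, so after one pass over the ratings the balance depends only on the region counts rather than on every pair of ratings.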
Experiments
- Dataset
  - MovieLens: 100,000 ratings for 1682 movies by 943 users
  - Each user has 4 attributes: Gender, Age, Occupation, Location
  - Binning the movies: order movies according to number of ratings and then partition into 6 bins
    - Bin 1: movies with the fewest ratings; Bin 6: movies with the most ratings
- Evaluation
  - Quantitative indicators: Efficiency, Quality and Scalability
  - Qualitative indicator: Mechanical Turk User Study
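The binning step can be illustrated with a short sketch. The slide does not say how the ordered movies are split, so equal-sized bins are an assumption here, and `bin_movies` and the toy counts are hypothetical.

```python
def bin_movies(rating_counts, n_bins=6):
    """Order movies by number of ratings and cut the order into n_bins parts.

    Assumes near-equal bin sizes; earlier bins take any leftover movies.
    """
    ordered = sorted(rating_counts, key=rating_counts.get)
    size, extra = divmod(len(ordered), n_bins)
    bins, start = [], 0
    for i in range(n_bins):
        end = start + size + (1 if i < extra else 0)
        bins.append(ordered[start:end])
        start = end
    return bins


# Toy input: movie id -> number of ratings it received.
counts = {f"m{i}": i for i in range(1, 13)}
bins = bin_movies(counts)
```

Bin 1 (`bins[0]`) then holds the movies with the fewest ratings and Bin 6 (`bins[5]`) those with the most.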
Quantitative Experiments: DEM
Qualitative Experiments: User Study
- Amazon Mechanical Turk study
  - Two sets: one for description mining, one for difference mining
  - Each set: 4 randomly chosen movies, 30 independent single-user tasks
  - Study 1: Users prefer simple aggregate ratings over rating interpretations
  - Study 2: Users prefer rating interpretations by exact algorithm or heuristic randomized hill exploration algorithm
Conclusion and Future Work
- Novel problem of meaningful rating interpretation (MRI) in collaborative rating sites
  - Meaningful Description Mining
  - Meaningful Difference Mining
- Heuristic algorithms that generate rating interpretations as good as those of the exact brute-force algorithms in much less execution time
- Meaningful interpretations of ratings by reviewers of interest
- Additional constraints such as diversity of rating explanations
Related Work
- Data Cubes
  - Gray et al., A relational aggregation operator generalizing group-by, cross-tab, and sub-totals, ICDE 1996
  - Sathe et al., Intelligent rollups in multidimensional OLAP data, VLDB 2001
  - Lakshmanan et al., Quotient cube: how to summarize the semantics of a data cube, VLDB 2002
  - Ramakrishnan et al., Exploratory mining in cube space, ICDM 2006
  - Wu et al., Promotion analysis in multi-dimensional space, VLDB 2009
- Clustering & Dimensionality Reduction
  - Agrawal et al., Automatic subspace clustering of high dimensional data for data mining applications, SIGMOD 1998
- Recommendation Explanation
  - Herlocker et al., Explaining collaborative filtering recommendations, CSCW 2000
  - Bilgic et al., Explaining recommendations: Satisfaction vs. promotion, IUI 2005
Thank You
Questions?
Quantitative Experiments: DIM
Quantitative Experiments: DEM, DIM