
Improving Response Prediction for Dyadic Data
Nik Tuzov, April 2008
http://www.stat.purdue.edu/~ntuzov/

Dyadic Data
• A “response value” is associated with each pair of objects (for example, a user and a movie)
• Applications: social networks, Internet advertising, recommendation systems

Unsupervised learning
• Example: collaborative filtering (MovieLens project)
• Rows are users, columns are movies; letters are ratings, "-" marks a missing rating

User   Movie 1   Movie 2   Movie 3   Movie 4   Movie 5
1      A         B         C         D         A
2      A         B         C         C         A
3      A         B         C         X?        A
4      Y?        -         -         -         B
5      -         -         -         -         -

• Movie 1 is “similar” to movie 5, hence Y is likely “B”
• Users 1, 2, 3 are “similar” to each other, hence X is likely “C” or “D”
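The intuition in this toy example can be reproduced with a very simple neighbour rule. This is only an illustration, not the co-clustering method discussed next: the sketch encodes the table (None = not rated), assumes similarity is the fraction of co-rated cells with equal ratings, and predicts a missing cell from the most similar users or movies. All helper names are hypothetical.

```python
from typing import List, Optional, Set, Tuple

Matrix = List[List[Optional[str]]]

# Toy table from the slide: rows are users 1..5, columns are movies 1..5.
# None marks an unrated cell; X is (user 3, movie 4), Y is (user 4, movie 1).
R: Matrix = [
    ["A", "B", "C", "D", "A"],
    ["A", "B", "C", "C", "A"],
    ["A", "B", "C", None, "A"],
    [None, None, None, None, "B"],
    [None, None, None, None, None],
]


def agreement(u: List[Optional[str]], v: List[Optional[str]]) -> float:
    """Assumed similarity: fraction of co-rated positions with identical ratings."""
    pairs = [(a, b) for a, b in zip(u, v) if a is not None and b is not None]
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)


def best_votes(scored: List[Tuple[float, str]]) -> Set[str]:
    """Ratings attached to the highest similarity score."""
    if not scored:
        return set()
    top = max(s for s, _ in scored)
    return {r for s, r in scored if s == top}


def predict_user_based(R: Matrix, user: int, movie: int) -> Set[str]:
    """Ratings of `movie` by the users most similar to `user` (0-based indices)."""
    scored = [(agreement(R[user], R[v]), R[v][movie])
              for v in range(len(R)) if v != user and R[v][movie] is not None]
    return best_votes(scored)


def predict_item_based(R: Matrix, user: int, movie: int) -> Set[str]:
    """Ratings by `user` of the movies most similar to `movie` (0-based indices)."""
    def col(j: int) -> List[Optional[str]]:
        return [row[j] for row in R]

    scored = [(agreement(col(movie), col(j)), R[user][j])
              for j in range(len(R[0])) if j != movie and R[user][j] is not None]
    return best_votes(scored)


print(sorted(predict_user_based(R, 2, 3)))  # X: ['C', 'D']
print(sorted(predict_item_based(R, 3, 0)))  # Y: ['B']
```

User-based voting reproduces the "X is C or D" reasoning, and item-based voting reproduces "Y is B".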

Co-clustering with Bregman divergences
• K × L rectangular clusters (co-clusters): each is the direct product of one of K row clusters and one of L column clusters
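To make the K × L block structure concrete, here is a minimal co-clustering sketch under squared Euclidean distance, which is one member of the Bregman divergence family. Each co-cluster is summarized by the mean of its observed entries, and row and column labels are updated alternately so that the weighted squared error never increases. The function names, the random initialization and the use of plain block means are assumptions for illustration, not the exact algorithm behind these slides.

```python
import numpy as np


def block_means(Z, W, rows, cols, K, L):
    """Mean of the observed entries (W == 1) in each (row cluster, column cluster) block.

    Z must be finite everywhere; unobserved cells are ignored through W == 0.
    Empty blocks fall back to the global mean of the observed entries.
    """
    S = np.zeros((K, L))
    N = np.zeros((K, L))
    np.add.at(S, (rows[:, None], cols[None, :]), Z * W)
    np.add.at(N, (rows[:, None], cols[None, :]), W)
    overall = (Z * W).sum() / max(W.sum(), 1.0)
    return np.where(N > 0, S / np.where(N > 0, N, 1.0), overall)


def sse(Z, W, rows, cols, M):
    """Weighted squared error of approximating Z by its co-cluster means."""
    return float((W * (Z - M[rows][:, cols]) ** 2).sum())


def cocluster(Z, W, K, L, n_iter=20, seed=0, rows0=None, cols0=None):
    """Alternating hard co-clustering under squared Euclidean distance."""
    rng = np.random.default_rng(seed)
    m, n = Z.shape
    rows = rng.integers(K, size=m) if rows0 is None else np.asarray(rows0).copy()
    cols = rng.integers(L, size=n) if cols0 is None else np.asarray(cols0).copy()
    for _ in range(n_iter):
        # Row step: cost of each row under each row cluster, columns fixed -> (m, K).
        M = block_means(Z, W, rows, cols, K, L)
        cost = (W[:, None, :] * (Z[:, None, :] - M[None, :, cols]) ** 2).sum(axis=2)
        rows = cost.argmin(axis=1)
        # Column step: cost of each column under each column cluster, rows fixed -> (n, L).
        M = block_means(Z, W, rows, cols, K, L)
        cost = (W[:, :, None] * (Z[:, :, None] - M[rows][:, None, :]) ** 2).sum(axis=0)
        cols = cost.argmin(axis=1)
    return rows, cols, block_means(Z, W, rows, cols, K, L)


# A fully observed 4 x 4 matrix with an obvious 2 x 2 block structure.
Z = np.array([[1, 1, 5, 5], [1, 1, 5, 5], [5, 5, 1, 1], [5, 5, 1, 1]], dtype=float)
rows, cols, M = cocluster(Z, np.ones_like(Z), 2, 2, rows0=[0, 0, 1, 1], cols0=[0, 0, 1, 1])
print(rows, cols)
print(M)
```

Other Bregman divergences (e.g. I-divergence for counts) would change both the cost and the optimal block summary; squared error is used here only because its minimizer is the ordinary mean.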

Co-clustering with Bregman divergences (example from http://videolectures.net/kdd07_agarwal_pdlfm/)

PDLF-GLM Model (Agarwal & Merugu, 2007)
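As we read the PDLF model, each dyad's response follows a GLM whose linear predictor combines a covariate effect β^T x_ij with an interaction effect δ_IJ shared by all dyads in co-cluster (I, J); the "Max delta" and "Min delta" rows of the results table summarize such effects. The sketch below shows only the prediction step of a hard-assignment, logistic-link version; the function name and array layout are assumptions for illustration.

```python
import numpy as np


def pdlf_logistic_proba(X, beta, delta, row_cluster, col_cluster, users, movies):
    """P(y = 1) per dyad under a hard-assignment PDLF-logistic sketch.

    X:             (n, p) covariates of the n observed (user, movie) dyads
    beta:          (p,) covariate coefficients
    delta:         (K, L) co-cluster interaction effects
    row_cluster:   (n_users,) user -> row-cluster label
    col_cluster:   (n_movies,) movie -> column-cluster label
    users, movies: (n,) user and movie index of each dyad
    """
    eta = X @ beta + delta[row_cluster[users], col_cluster[movies]]  # linear predictor
    return 1.0 / (1.0 + np.exp(-eta))                               # inverse logit link
```

With all covariate effects at zero, a dyad in a co-cluster with δ = log 3 gets probability 3/4, while δ = 0 leaves it at 1/2.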

Neural Network as an Alternative to GLM
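One way to read this idea is to replace the linear term β^T x_ij with the output of a small network, while keeping the co-cluster effect δ_IJ as an additive offset on the logit scale. The sketch below assumes a single tanh hidden layer; the topology, function name and weight shapes are assumptions, and the 40 hidden nodes and 50 covariates simply mirror the sizes mentioned in these slides. The weights are random and untrained.

```python
import numpy as np


def pdlf_neural_proba(X, W1, b1, w2, b2, delta, row_cluster, col_cluster, users, movies):
    """P(y = 1) per dyad: a tanh hidden layer replaces beta^T x; delta is a logit offset.

    X: (n, p); W1: (p, h); b1: (h,); w2: (h,); b2: scalar; delta: (K, L).
    """
    H = np.tanh(X @ W1 + b1)                                    # (n, h) hidden activations
    eta = H @ w2 + b2 + delta[row_cluster[users], col_cluster[movies]]
    return 1.0 / (1.0 + np.exp(-eta))


# Illustrative shapes only: 50 covariates, 40 hidden nodes, untrained weights.
rng = np.random.default_rng(0)
p, h = 50, 40
W1, b1 = rng.normal(scale=0.1, size=(p, h)), np.zeros(h)
w2, b2 = rng.normal(scale=0.1, size=h), 0.0
X = rng.normal(size=(5, p))
delta = np.zeros((4, 4))
probs = pdlf_neural_proba(X, W1, b1, w2, b2, delta,
                          np.zeros(10, dtype=int), np.zeros(10, dtype=int),
                          np.arange(5), np.arange(5))
print(probs.shape)  # (5,)
```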

Algorithm
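The alternating structure of such a fit can be sketched as follows. This is an assumed hard-assignment scheme, not necessarily the procedure used in this project or by Agarwal & Merugu: with cluster labels fixed, it fits β and δ by plain gradient ascent on the Bernoulli log-likelihood; with β and δ fixed, it moves each user (then each movie) to the cluster that maximizes that user's (movie's) log-likelihood. The learning rate, iteration counts, random initialization and all names are illustrative choices.

```python
import numpy as np


def log_lik(y, eta):
    """Bernoulli log-likelihood with a logit link, computed stably via logaddexp."""
    return float(np.sum(y * eta - np.logaddexp(0.0, eta)))


def best_label(y, base, offsets):
    """Index of the candidate offset vector that maximizes log_lik(y, base + offset)."""
    return int(np.argmax([log_lik(y, base + o) for o in offsets]))


def fit_pdlf_logistic(X, y, users, movies, n_users, n_movies, K, L,
                      n_outer=10, n_grad=200, lr=0.5, seed=0):
    rng = np.random.default_rng(seed)
    rc = rng.integers(K, size=n_users)      # row (user) cluster labels
    cc = rng.integers(L, size=n_movies)     # column (movie) cluster labels
    beta, delta, n = np.zeros(X.shape[1]), np.zeros((K, L)), len(y)
    for _ in range(n_outer):
        # (1) Labels fixed: gradient ascent on the mean log-likelihood in beta and delta.
        I, J = rc[users], cc[movies]
        for _ in range(n_grad):
            resid = y - 1.0 / (1.0 + np.exp(-(X @ beta + delta[I, J])))
            beta += lr * (X.T @ resid) / n
            grad_delta = np.zeros((K, L))
            np.add.at(grad_delta, (I, J), resid)
            delta += lr * grad_delta / n
        # (2) beta, delta fixed: move each user to its best row cluster.
        base = X @ beta
        for u in range(n_users):
            idx = users == u
            if idx.any():
                rc[u] = best_label(y[idx], base[idx],
                                   [delta[k, cc[movies[idx]]] for k in range(K)])
        # (3) Same for each movie, using the updated user labels.
        for j in range(n_movies):
            idx = movies == j
            if idx.any():
                cc[j] = best_label(y[idx], base[idx],
                                   [delta[rc[users[idx]], col] for col in range(L)])
    return beta, delta, rc, cc
```

Replacing the linear term `X @ beta` with a network output (and its gradient) would give the neural variant; the cluster-reassignment steps would stay the same.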

Data: MovieLens
• 20,603 ratings, 346 users, 966 movies
• From 1 to 198 ratings per movie and from 32 to 105 ratings per user
• 50 covariates for each (user, movie) pair
• 5,700 observations held out for validation
• Performance is measured by the area under the Receiver Operating Characteristic (ROC) curve
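For reference, the area under the ROC curve equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half (the Mann-Whitney form). A small sketch, with a hypothetical function name:

```python
import numpy as np


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one positive and one negative example")
    diff = pos[:, None] - neg[None, :]          # all positive/negative pairs
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise matrix needs memory proportional to (#positives × #negatives), which is manageable for a validation set of a few thousand dyads; a rank-based formula scales better for larger sets.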

Neural Network Topology

Number of nodes?
• 40 hidden nodes appear to be enough (larger networks produce a similar amount of overfitting)

Results
Models: Logistic regression, Neural network, PDLF-Logistic, PDLF-Neural
• Clusters: N/A, 4 × 4, 6 × 6, 3 × 4
• Hidden nodes: 1, 40, 40, 40
• Validation ROC: 0.62, 0.6742, 0.6913, 0.7128, 0.6919, 0.708
• Max. cluster size: N/A, 2022, 1913, 5184, 1847
• Min. cluster size: N/A, 274, 412, 5, 709
• Max. delta: N/A, 0.25, 0.13, 0.23, 0.02
• Min. delta: N/A, -0.4, -0.57, -0.36, -0.62

New Covariates?
Sample movies from the cluster with delta = -0.57:

Title | Release date
Nosferatu (Nosferatu, eine Symphonie des Grauens) (1922) | 1-Jan-22
Blue Angel, The (Blaue Engel, Der) (1930) | 1-Jan-30
Pinocchio (1940) | 1-Jan-40
Dial M for Murder (1954) | 1-Jan-54
8 1/2 (1963) | 1-Jan-63
Carrie (1976) | 1-Jan-76
Top Gun (1986) | 1-Jan-86
Bram Stoker's Dracula (1992) | 1-Jan-92
Mortal Kombat: Annihilation (1997) | 1-Jan-97
Sphere (1998) | 13-Feb-98

• 756 ratings; 23 females and 55 males; no documentaries

Contribution to ROC

Is the Neural Network Useful?
• The gain in ROC area depends on the order in which components are added: if the extra linear features (from the neural network) are added first, the gain from co-clustering is reduced
• The opposite is also true
• Hence, the information in these features largely overlaps with the information in the clusters, so:
• For this dataset, the neural network is not very helpful, but…
• For other dyadic datasets, the neural network can be much more useful

Related Work
• What if we want to predict the response for a triple (Web page, search query, Web user)?
• B. Long, X. Wu, Z. Zhang, and P. S. Yu. Unsupervised learning on k-partite graphs. In KDD, 2006.

Additional Info
• A detailed report and the Matlab code are available on my website: http://www.stat.purdue.edu/~ntuzov/
• The project is posted in the “Software skills / Matlab” section
• Questions? Contact me at ntuzov@purdue.edu