IBM Research
On the Quality of Inferring Interests From Social Neighbors
Zhen Wen, Ching-Yung Lin
IBM T. J. Watson Research Center
© Copyright IBM Corporation 2010

Motivation
§ Modeling user interests enables personalized services
  – More relevant search/recommendation results
  – More targeted advertising
§ Data about users are sparse
  – Many user profiles are static, incomplete and/or outdated
  – <10% of employees actively participate in social software [Brzozowski 2009]
§ Inferring user interests from neighbors can be a solution
  – It also raises the concern of exposing users' private information
How true are "You are who you know" and "Birds of a feather flock together"?
9/29/2020

Challenges in Observing Users
§ Diverse types of media
  – Public social media (friending, blogs, etc.)
    § Data are public but limited (esp. in enterprises)
  – Private communication media (email, instant messaging, face-to-face meetings, etc.)
    § Much more data
    § Privacy is a major issue

Example of Diverse Types of Media
[Figure: number of people who participated in the top 3 media in an enterprise with 400 K employees]
Number of entries:
• Social bookmarking: 400 K
• Electronic communication: 20 M
• File sharing: 140 K

Our Goals
§ How well can a user's interests be inferred from his/her social neighbors?
§ Can the diverse types of media be combined to improve the inference of user interests from social neighbors?
§ Can the quality of the inference be predicted based on features of the social neighbors?
  – Only sufficiently accurate inference can help personalized services

Our Approach
§ Infer user interests from social neighbors
  – Model user interests based on the multiple types of information they accessed
  – Construct the employee social network from communication data
  – Infer using a social influence model
§ Study the relationship between inference quality and network characteristics
  – Identify effective factors to ensure high-quality results for applications

SmallBlue: Unlock the Power of Business Networks & Protect Privacy
Social Network Analysis & Visualization, Expertise Mining, and Multi-Channel Human Network/Behavior Analysis
§ Net: See how experts or communities connect
§ Expertise: Search for people who know "xyz" in my networks
§ Ego: Show my personal network evolution and social capital
§ Reach: Helps me understand this person and my formal and information paths to reach him
§ Whisper: Social-network-enabled personalized live recommender
§ Synergy: Personalized search
§ Productivity: Social Network Analysis Service helps a company understand how to enhance productivity
§ Crawling distributed DBs & stream feeds (live data)
  – 20,000 emails & Sametime messages
  – 1,000 Learning click data
  – 14,000 KnowledgeView, SalesOne, …, access data
  – 1,000 Lotus Connections (blogs, file sharing, bookmark) data
  – 200,000 people's consulting financial databases
  – 400,000 IBMers' organization/demographic data
  – 400,000 webpages and knowledge assets

Privacy as Fundamental Human Rights and Global Privacy Laws
§ (United Nations) Universal Declaration of Human Rights [1948], Article 12:
  – No one shall be subjected to arbitrary interference with his privacy, family, home or correspondence, nor to attacks upon his honor and reputation. Everyone has the right to the protection of the law against such interference or attacks.
§ EU Directive 95/46/EC, Article 2(a):
  – Personal data shall mean any information relating to an identified or identifiable natural person
§ EU Directive 95/46/EC, Article 7:
  – Personal data may be processed only if:
    § The data subject has unambiguously given his consent; or
    § for the performance of a contract to which the data subject is party or in order to take steps at the request of the data subject prior to entering into a contract; or
    § for compliance with a legal obligation to which the controller is subject; or…
§ Private sector privacy laws [map; legend: existing vs. emerging private sector privacy laws]
  – European Union • European Data Protection Directive (1995)
  – Russia • Federal Law on Personal Data (January 2007)
  – Canada • PIPEDA (2001-2004)
  – APEC • Guidelines (2004)
  – Australia • Privacy Amendment Act (2001)
  – U.S. (sectoral) • Children's Privacy; COPPA (1999) • Financial Sector; GLB (2001) • Health Sector; HIPAA (2002) • California Privacy (2005)
  – Dubai • Data Protection Law (January 2007)
  – Chile • Protection of Private Life Law (1999)
  – Argentina • Protection of PD Law (2000)
  – New Zealand • Privacy Act (1993)
  – Taiwan • Computer-Processed PD Protection Law (1995)
  – South Korea • Info & Comm Network Util. & Info Protection Law (2000)
  – Japan • Personal Data Protection Act (2005)

Dataset
§ 25,315 users' contributed content
  – 20 M emails/chats
  – 400 K social bookmarks
  – 20 K shared public files
  – Profile information
    § Job role, division, news categories of interest, etc.
§ Infer the social network based on emails/chats
  [Formula not recovered from the slide; X’: number of emails]

User Interests Model – Implicit Interests
§ Model users' interests implicitly indicated by their contributed content
  – Extract latent topics from the multiple types of content using LDA
  – Select the top-N distinct topics as the implicit interests model of a user
  [Formula not recovered from the slide; annotated terms: the degree to which the user is interested, the similarity of topics]

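The "top-N distinct topics" step can be illustrated with a small sketch. The slide's own formula is not recoverable, so everything here is an assumption made for illustration: the greedy selection strategy, the cosine similarity over topic-word weights, the `max_sim` cutoff, and all function and variable names. It only shows one plausible way to combine a per-topic interest degree with a topic-similarity measure.

```python
import math


def cosine(p, q):
    """Cosine similarity between two sparse topic-word weight dicts."""
    dot = sum(p[w] * q.get(w, 0.0) for w in p)
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def top_n_distinct(topic_degree, topic_words, n, max_sim=0.8):
    """Greedily pick up to n topics by descending interest degree,
    skipping any topic too similar to one already selected.
    (Hypothetical procedure; max_sim is an assumed cutoff.)"""
    if n <= 0:
        return []
    selected = []
    for t in sorted(topic_degree, key=topic_degree.get, reverse=True):
        if all(cosine(topic_words[t], topic_words[s]) < max_sim for s in selected):
            selected.append(t)
        if len(selected) == n:
            break
    return selected


# Toy data, not taken from the study.
topic_words = {
    "t1": {"cloud": 0.6, "server": 0.4},
    "t2": {"cloud": 0.5, "server": 0.5},   # near-duplicate of t1
    "t3": {"sales": 0.7, "client": 0.3},
}
topic_degree = {"t1": 0.5, "t2": 0.3, "t3": 0.2}
chosen = top_n_distinct(topic_degree, topic_words, n=2)
print(chosen)
```

In this toy input, `t2` ranks second by degree but is skipped because it nearly duplicates `t1`, so the selection falls through to `t3`.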
User Interests Model – Explicit Interests
§ 29% of users manually specify interests in their profile
  – A list of selected terms
    § From a static 1120-term taxonomy related to work
§ Compare implicit and explicit interests
  – Explicit interests models are more limited
    § Implicit interests cover 60.4% of explicit interests
    § Explicit interests cover 2.2% of implicit interests

Infer Interests Based on Social Influence
§ Social influence model
  – Network autocorrelation model [Leenders 02]
    § Social influence is represented as a weighted combination of neighbors' attributes
    § The weight is an exponential function of the social distance

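A weighted combination of neighbors' attributes with exponentially decaying weights can be sketched as follows. This is a minimal illustration, not the authors' implementation: the exact weight form `exp(-alpha * d)`, the decay rate `alpha`, the `max_dist` cutoff, the normalization, and all names are assumptions. Only the two ideas (weighted combination, weight exponential in social distance) come from the slide.

```python
import math
from collections import deque


def social_distances(graph, source, max_dist=2):
    """Breadth-first search: hop distance from source to users within max_dist."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == max_dist:
            continue  # do not expand beyond the cutoff
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    del dist[source]
    return dist


def infer_interests(graph, interests, user, alpha=1.0, max_dist=2):
    """Infer a topic-weight dict for user from neighbors' topic weights.

    A neighbor at distance d contributes with weight exp(-alpha * d)
    (assumed form); weights are normalized to sum to 1.
    """
    dist = social_distances(graph, user, max_dist)
    weights = {v: math.exp(-alpha * d) for v, d in dist.items() if v in interests}
    total = sum(weights.values())
    inferred = {}
    if total == 0:
        return inferred
    for v, w in weights.items():
        for topic, score in interests[v].items():
            inferred[topic] = inferred.get(topic, 0.0) + (w / total) * score
    return inferred


# Toy network, not taken from the study: ann - bob - cat.
graph = {"ann": ["bob"], "bob": ["ann", "cat"], "cat": ["bob"]}
interests = {"bob": {"cloud": 1.0}, "cat": {"security": 1.0}}
inferred = infer_interests(graph, interests, "ann")
print(inferred)
```

In the toy network, the direct neighbor `bob` dominates the inferred profile for `ann`, while the 2-degree neighbor `cat` contributes with a smaller weight.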
Inference Quality
§ Implicit interests: how close the inferred top-20 topics are to the ground truth

  Condition                          Max      Mean     Std. deviation
  Using social bookmark data only    59.4%    19.2%    10.7%
  Using file sharing data only       44.9%    12.7%     7.2%
  Using email/IM data only           62.1%    29.6%    14.1%
  Using all three data sources       100%     45.1%    21.7%

§ Explicit interests: precision and recall of inferred terms

  Measure      Mean     Std. deviation
  Precision    30.1%    26.9%
  Recall       61.5%    27.6%

– Significant advantage in combining multiple sources
– Large variance can affect practical applications, so we need to predict when to infer interests
– Much better recall than precision

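For the explicit-interest measures, precision and recall over sets of terms can be computed as in the sketch below. The function name and the toy term lists are illustrative assumptions. The slide does not say how terms were matched, so exact set membership is assumed here.

```python
def precision_recall(inferred_terms, true_terms):
    """Precision and recall of inferred terms against a user's stated terms."""
    inferred, truth = set(inferred_terms), set(true_terms)
    hits = len(inferred & truth)
    precision = hits / len(inferred) if inferred else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall


# Toy example, not taken from the study.
p, r = precision_recall(
    ["cloud", "security", "sales", "java"],   # inferred from neighbors
    ["cloud", "security", "analytics"],       # stated in the profile
)
print(p, r)
```

Inferring a broad set of terms from many neighbors tends to recover most of a short profile list (higher recall) while also adding terms the user never listed (lower precision), which is the pattern the table reports.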
Can Inference Quality be Predicted?
§ Hypothesis: inference quality can be predicted from social network properties
  – User activeness: the amount of contribution
  – In-degree
  – Out-degree
  – Betweenness
  – User management role
§ Use Support Vector Regression to perform the prediction
§ Evaluate the prediction
  – Precision/recall of the prediction (10-fold cross validation)
  – Use the prediction to improve inference
    § Only infer when we predict it's high quality

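The evaluation steps above can be sketched once predicted quality scores are available. This sketch deliberately leaves out the Support Vector Regression model itself and takes its outputs as given. The threshold of 0.5, the definition of "high quality" as a score at or above that threshold, and all names and toy numbers are assumptions for illustration, not values from the study.

```python
def gate_by_predicted_quality(predicted, threshold=0.5):
    """Users for whom inference is attempted: predicted quality >= threshold."""
    return sorted(u for u, q in predicted.items() if q >= threshold)


def prediction_precision_recall(predicted, actual, threshold=0.5):
    """Treat 'high quality' as a binary label and score the prediction."""
    pred_high = {u for u, q in predicted.items() if q >= threshold}
    true_high = {u for u, q in actual.items() if q >= threshold}
    hits = len(pred_high & true_high)
    precision = hits / len(pred_high) if pred_high else 0.0
    recall = hits / len(true_high) if true_high else 0.0
    return precision, recall


def mean(values):
    values = list(values)
    return sum(values) / len(values) if values else 0.0


# Toy scores, not taken from the study.
predicted = {"u1": 0.8, "u2": 0.3, "u3": 0.6, "u4": 0.2}   # regressor output
actual = {"u1": 0.7, "u2": 0.2, "u3": 0.6, "u4": 0.55}     # measured quality

selected = gate_by_predicted_quality(predicted)
pred_p, pred_r = prediction_precision_recall(predicted, actual)
baseline_quality = mean(actual.values())
gated_quality = mean(actual[u] for u in selected)
print(selected, pred_p, pred_r, baseline_quality, gated_quality)
```

Gating trades coverage for quality: fewer users receive inferred interests, but the average quality among those who do is higher than the ungated average in this toy example.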
Quality Prediction Results
§ Precision/recall of prediction
  [Figure panels: Implicit Interests, Explicit Interests]
§ Improve inference
  – Explicit interests:

    Measure      Improved to    Improvement (%)
    Precision    60.5%          101%
    Recall       85.7%          39.3%

  – Implicit interests: [Figure]

Feature Comparison
§ "Leave-one-feature-out" comparisons of prediction results
  – Most social influence comes from 1- and 2-degree neighbors
  – Your neighbors decide how well you can be inferred
  – Your neighbors' network positions may be even more important than how active they are
  – Formal organizational properties
    § Manager neighbors are more important in inference, i.e., they exert more social influence (about 5% more)

Related Work
§ User modeling
  – Use behavioral data of the ego
    § [Shepitsen 08, Song 05, Stoyanovic 08, Teevan 05]
  – Use data of 1-degree neighbors
    § Issued the same query ([Piwowarski 07, White 09])
    § Collaborative filtering ([Goldberg 92])
§ Social influence and correlation
  – Correlation and related factors in social networks
    § [Singla 08, Blei 03, Crandall 08, Anagnostopoulos 08, Tang 09]
  – Infer user profiles in online communities
    § [Mislove 2010]

Conclusion
§ There is large variance in the quality of inferring user interests from social neighbors
§ The recall of the inference is much better than its precision
§ The inference quality can be predicted from social network properties

Questions?