Social Search
Laks V. S. Lakshmanan

How are web search queries distributed? Taken from Damon Horowitz’s talk slides.

How are web search queries distributed? Web search works well! But web search is only a good start; more effort is needed, possibly on top, based on the opinions of friends. Adapted from Damon Horowitz’s talk slides.

Social Search – General Remarks
• Search in text corpora (IR).
• Search in a linked environment (authority, hub, PageRank).
• What if my social context should impact search results?
◦ E.g.: users in a social network post reviews/ratings on items.
◦ “Items” = anything they want to talk about, share their opinions on with their friends, and implicitly recommend (or recommend against).

Social Search – The Problem & Issues
• Search results for a user should be influenced by how he and his friends rated the items, in addition to the quality of match as determined by IR methods and/or PageRank-like methods.
• Transitive friends’ ratings may matter too, up to some distance.
• Users may just comment on an item without explicitly rating it.

More Issues
• Factoring in transitive friends is somewhat similar to Katz centrality: the longer the geodesic from u to v, the less important v’s rating is to u.
• Trust may be a factor; there is a vast literature on trust computation.
• May need to analyze opinions (text) and translate them into strength (score) and polarity (good or bad?).

Other Approaches
• Google Social Search
◦ A search for “barcelona” returns results including the searcher’s friends’ blogs.
◦ Relevant users need to connect their Facebook, Twitter, ... accounts to their Google profile.
◦ Of particular value when searching for local resources such as shows and restaurants. But this does not use user-generated content for ranking.
• Aardvark – part of Google Labs, which was shut down – is an interesting approach to social search.
• There are other companies such as sproose.com; see Wikipedia for a list and check them out. (Sproose seems to take reviews into account in ranking.) [Defunct now?]
• Some papers develop notions of SocialRank, UserRank, FolkRank, similar to PageRank (see references in Schenkel et al. 2008 [details later]).
• Part I based on: Damon Horowitz and Sepandar D. Kamvar. The Anatomy of a Large-Scale Social Search Engine. WWW 2010.

The Aardvark Approach
• Classic web search has roots in IR; it is authority centric: return the most relevant docs as answers to a search query.
• Alternative paradigm: consult the village wise people. (Can you think of similar systems that already exist? Hint: what do you do when you encounter difficulties with a new computer, system, software, or tool?)
• Web search – keyword based.
• Social search – natural language; social intimacy/trust instead of authority.
• E.g.: what’s a good bakery in the Mag Mile area in Chicago? What’s a good handyman, who is not too expensive, is punctual, and honest? Note: long, subjective, and contextualized queries.
• These queries are normally handled offline, by asking real people. Social search seeks to handle them online.

Aardvark Modules
• Crawler and Indexer.
• Query Analyzer.
• Ranking Function.
• UI.

Index What?
• User’s existing social habitat – LinkedIn, Facebook contacts; common groups such as school attended, employer, ...; can invite additional contacts.
• Topics/areas of expertise, learned from:
◦ Self declaration.
◦ Peer endorsement (a la LinkedIn).
◦ Activities on LinkedIn, Facebook, Twitter, etc.
◦ Activities (asking/answering [or not] questions) on Aardvark.
• Forward index: user (id), topics of expertise sorted by strength, answer quality, response time, ...
• Inverted index: for each topic, a list of users sorted on expertise, plus answer quality, response time, etc.
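The forward and inverted indexes above can be sketched as plain dictionaries. This is a minimal illustration, not Aardvark's actual storage format; the users, topics, and strength scores are invented.

```python
from collections import defaultdict

# Hypothetical expertise records: (user_id, topic, strength).
records = [
    ("alice", "baking", 0.9),
    ("alice", "travel", 0.4),
    ("bob", "baking", 0.7),
    ("carol", "travel", 0.8),
]

# Forward index: user -> topics sorted by descending strength.
forward = defaultdict(list)
for user, topic, strength in records:
    forward[user].append((topic, strength))
for user in forward:
    forward[user].sort(key=lambda ts: -ts[1])

# Inverted index: topic -> users sorted by descending expertise.
inverted = defaultdict(list)
for user, topic, strength in records:
    inverted[topic].append((user, strength))
for topic in inverted:
    inverted[topic].sort(key=lambda us: -us[1])
```

In a real system each posting would also carry answer quality and response time, as the slide notes.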

Query Life Cycle
[Architecture diagram: Transport Layer → Conversation Manager → Routing Engine]

Query Answering Model
• Prob. that u_i is an expert in topic t.
• Prob. that question q is in topic t.
• Prob. that u_i can successfully answer a question from u_j, usually based on strength of social connections, trust, etc.
• Prob. that u_i can successfully answer question q from u_j.
• All this is fine, but it’s important to engage a large number of high-quality question askers and answerers to make and keep the system useful.
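A sketch of how these probabilities can be combined, following the multiplicative factorization in Horowitz & Kamvar (WWW 2010): topic match is summed over the question's topic distribution and scaled by social connectedness. The dictionaries and numbers below are illustrative, not the paper's data.

```python
def answer_score(topic_probs, expertise, connectedness):
    """Score candidate answerer u_i for question q from asker u_j.

    topic_probs: dict topic -> p(t | q), the question's topic distribution.
    expertise: dict topic -> p(u_i | t), u_i's expertise per topic.
    connectedness: p(u_i | u_j), social proximity of u_i to the asker.
    """
    topic_match = sum(p_tq * expertise.get(t, 0.0)
                      for t, p_tq in topic_probs.items())
    return connectedness * topic_match

# Example: a question mostly about "baking", answered by a baking expert
# who is moderately connected to the asker.
score = answer_score({"baking": 0.8, "travel": 0.2},
                     {"baking": 0.9, "travel": 0.1},
                     connectedness=0.5)
```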

Indexing Users

Question Analysis
• Semi-automated:
◦ Soft classification into topics.
◦ Filter out non-questions, inappropriate and trivial questions.
◦ KeywordMatchTopicMapper: maps keywords/terms in the question to topics in user profiles.
◦ TaxonomyTopicMapper: places the question on a taxonomy covering popular topics.
◦ LocationMatching.
• Human judges assign scores to topics (evaluation).
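The keyword-to-topic mapping step can be sketched as a weighted lookup that yields a soft (normalized) topic distribution. The keyword table, topics, and weights are invented for illustration; Aardvark's actual mappers are more sophisticated.

```python
def keyword_topic_mapper(question, keyword_topics):
    """Soft-classify a question into topics by keyword lookup.

    keyword_topics: dict keyword -> list of (topic, weight).
    Returns a normalized topic distribution (empty if nothing matches).
    """
    scores = {}
    for word in question.lower().split():
        for topic, w in keyword_topics.get(word, []):
            scores[topic] = scores.get(topic, 0.0) + w
    total = sum(scores.values())
    return {t: s / total for t, s in scores.items()} if total else {}

# Toy keyword table (hypothetical).
table = {"bakery": [("food", 1.0)],
         "chicago": [("local", 0.5), ("travel", 0.5)]}
dist = keyword_topic_mapper("good bakery in chicago", table)
```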

Overall Ranking
• Aggregation of three kinds of scores:
◦ Topic expertise.
◦ Social proximity/match between asker and answerer.
◦ Availability of the answerer (can be learned from online activity patterns, load, etc.).
• Answerers contacted in priority order.
• Variety of devices supported.
• See the paper for more details and for experimental results.
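One way to realize "aggregate three scores, then contact answerers in priority order" is a multiplicative combination followed by a sort. The multiplicative choice and the candidate data are illustrative assumptions, not the paper's exact aggregation.

```python
def rank_answerers(candidates):
    """Order candidate answerers by an aggregate of the three scores.

    candidates: list of (user, expertise, proximity, availability),
    each score in [0, 1]. Highest aggregate first.
    """
    return sorted(candidates,
                  key=lambda c: c[1] * c[2] * c[3],
                  reverse=True)

ranked = rank_answerers([
    ("alice", 0.9, 0.3, 1.0),   # expert but socially distant
    ("bob",   0.6, 0.8, 1.0),   # decent expert, close friend
    ("carol", 0.9, 0.8, 0.1),   # great match but mostly offline
])
```

Note how availability acts as a gate: carol matches best on topic and proximity but ranks last because she is rarely online.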

Social Wisdom for Search and Recommendation
Ralf Schenkel et al. IEEE Data Engineering Bulletin, June 2008.
• Expand the scope of recommender systems by storing (in a relational DB) other info:
◦ Users(username, location, gender, ...)
◦ Friendships(user1, user2, ftype, fstrength)
◦ Documents(docid, description, ...)
◦ Linkage(doc1, doc2, ltype, lweight)
◦ Tagging(user, doc, tag, tweight)
◦ Ontology(tag1, tag2, otype, oweight)
◦ Rating(user, doc, assessment)
• Just modeling/scoring aspects; scalability ignored for now.

Friendship Types and Search Modes
• Social – computed from the explicit social graph, say using inverse distance. Could be based on other measures such as Katz.
• Spiritual – derived from overlap in activities (rating, reviews, tagging, ...).
• Global – all users given equal weight = 1/|U|.
• All measures normalized so that the weights on all outgoing edges from a user sum to 1.
• Combos possible: F(u, u’) = a·Fso(u, u’) + b·Fsp(u, u’) + c·Fgl(u, u’), with a + b + c = 1.
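The linear combination F = a·Fso + b·Fsp + c·Fgl can be sketched directly. The friend weights below are invented; the only structural assumptions are those on the slide (each mode's outgoing weights sum to 1, and a + b + c = 1).

```python
def combined_friendship(social, spiritual, num_users, a, b, c):
    """Combine friendship measures: F(u, v) = a*Fso + b*Fsp + c*Fgl.

    social, spiritual: dict friend -> normalized weight for searcher u.
    Global mode assigns every user the uniform weight 1/|U|.
    Requires a + b + c = 1 so the combination stays normalized.
    """
    assert abs(a + b + c - 1.0) < 1e-9
    friends = set(social) | set(spiritual)
    f_gl = 1.0 / num_users
    return {v: a * social.get(v, 0.0) + b * spiritual.get(v, 0.0) + c * f_gl
            for v in friends}

F = combined_friendship({"bob": 0.6, "carol": 0.4},   # social weights
                        {"bob": 1.0},                  # spiritual weights
                        num_users=4, a=0.5, b=0.3, c=0.2)
```

Setting (a, b, c) = (0, 0, 1) recovers plain global search; (1, 0, 0) is purely social; intermediate settings blend the modes, which is exactly the knob the "Lessons" slides later tune per query type.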

Scoring Documents for Tags – Digress into BM25
• BM25 – state-of-the-art IR model:

  score(D, ti) = idf(ti) · (k1 + 1) · tf(D, ti) / ( tf(D, ti) + k1 · (1 − b + b · len(D)/avgdl) )

• k1, b: tunable parameters.

  idf(ti) = log ( (#docs − n(ti) + 0.5) / (n(ti) + 0.5) )

• tf = term frequency; idf = inverse document frequency; avgdl = average document length; n(ti) = #docs containing ti.
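The BM25 formula above translates directly into code. The defaults k1 = 1.2 and b = 0.75 are common choices, not values from the slide.

```python
import math

def bm25_score(tf, doc_len, avgdl, n_docs, n_with_term, k1=1.2, b=0.75):
    """BM25 score of one term for one document.

    tf: term frequency in the document; doc_len: document length;
    avgdl: average document length over the corpus; n_docs: corpus size;
    n_with_term: number of documents containing the term (n(ti)).
    """
    idf = math.log((n_docs - n_with_term + 0.5) / (n_with_term + 0.5))
    return idf * (k1 + 1) * tf / (tf + k1 * (1 - b + b * doc_len / avgdl))
```

The saturating tf term means repeated occurrences help, but with diminishing returns, while b controls how strongly long documents are penalized.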

Adapt to Social Search

  su(d, t) = ( (k1 + 1) · |U| · sfu(d, t) ) / ( k1 + |U| · sfu(d, t) ) · idf(t),   |U| = #users

  idf(t) = log ( (|D| − df(t) + 0.5) / (df(t) + 0.5) ),   |D| = #docs, df(t) = #docs tagged t

  sfu(d, t) = ∑v∈U Fu(v) · tfv(d, t)

• BTW, when we say docs, think items!
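The social adaptation above can be sketched as follows. The dictionary layouts (per-user tag counts keyed by (user, doc, tag)) are illustrative conveniences, not the paper's storage model.

```python
import math

def social_score(d, t, F_u, tf, num_users, num_docs, df, k1=1.2):
    """Socially weighted tag score su(d, t), per Schenkel et al. 2008.

    F_u: dict user -> friendship weight Fu(v) for the searcher u.
    tf: dict (user, doc, tag) -> how often user v tagged doc d with tag t.
    df: dict tag -> number of docs tagged with that tag.
    """
    # Social tag frequency: sfu(d, t) = sum_v Fu(v) * tfv(d, t).
    sfu = sum(w * tf.get((v, d, t), 0) for v, w in F_u.items())
    idf = math.log((num_docs - df.get(t, 0) + 0.5) / (df.get(t, 0) + 0.5))
    return (k1 + 1) * num_users * sfu / (k1 + num_users * sfu) * idf

s = social_score("d1", "rock",
                 F_u={"bob": 1.0},
                 tf={("bob", "d1", "rock"): 1},
                 num_users=10, num_docs=100, df={"rock": 5})
```

If no friend (under the chosen friendship measure) tagged the item with t, sfu is 0 and the score vanishes, which is what makes the measure social.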

Tag Expansion
• Sometimes (often?) users use related tags: e.g., tag an automobile as “Ferrari” and as “car”.
• tsim(t, t’) = P[t | t’] = df(t & t’) / df(t’). // error in the paper //
• Then sfu*(d, t) = maxt’∈T tsim(t, t’) · sfu(d, t’). Plug in sfu*(d, t) in place of sfu(d, t) and we are all set.
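The expansion formula is a max over candidate tags; a minimal sketch, with tsim and sfu passed in as callables (the example similarity and frequencies are invented):

```python
def expanded_tag_freq(d, t, tags, tsim, sfu):
    """Tag-expanded frequency: sfu*(d, t) = max over t' of tsim(t, t') * sfu(d, t').

    tags: candidate tag set T; tsim and sfu are callables.
    Since tsim(t, t) = P[t|t] = 1, the unexpanded sfu(d, t) is always a candidate.
    """
    return max(tsim(t, tp) * sfu(d, tp) for tp in tags)

# Toy example: "car" is rarely used directly on d, but "ferrari" is frequent
# and fairly similar, so the expanded frequency comes from "ferrari".
tsim = lambda t, tp: 1.0 if t == tp else 0.5
sfu = lambda d, tp: {"car": 2.0, "ferrari": 6.0}[tp]
val = expanded_tag_freq("d1", "car", {"car", "ferrari"}, tsim, sfu)
```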

Socially Aware Tag Expansion
• Who tagged the documents, and what is the strength of their connection to u?

  tsimu(t, t’) = ∑v∈U Fu(v) · dfv(t & t’) / dfv(t’)

• Score for a query: s*u(d, t1, ..., tn) = ∑ti s*u(d, ti).
• Experiments – see paper: librarything.com, mixed results.
• Measured improvement in precision@top-10 and NDCG@top-10.
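The socially aware similarity replaces the global co-occurrence ratio with a friendship-weighted sum of per-user ratios. A sketch under an invented per-user count layout:

```python
def social_tag_sim(t, tp, F_u, df_v):
    """tsimu(t, t') = sum over users v of Fu(v) * dfv(t & t') / dfv(t').

    F_u: dict user -> friendship weight Fu(v) for the searcher u.
    df_v: dict user -> per-user counts, with key tp for docs v tagged t'
          and key (t, tp) for docs v tagged with both t and t'.
    """
    total = 0.0
    for v, w in F_u.items():
        counts = df_v.get(v, {})
        both, single = counts.get((t, tp), 0), counts.get(tp, 0)
        if single:
            total += w * both / single
    return total

sim = social_tag_sim("car", "ferrari",
                     F_u={"bob": 0.5, "carol": 0.5},
                     df_v={"bob": {("car", "ferrari"): 2, "ferrari": 4},
                           "carol": {("car", "ferrari"): 1, "ferrari": 2}})
```

Users with zero friendship weight contribute nothing, so two searchers with different social circles get different tag similarities, and hence different rankings.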

Lessons and Open Challenges
• Socializing search across the board is a bad idea.
• Need to understand which kinds of queries can benefit from which settings (a, b, c values). Examples below.
1. Queries with a global information need: perform best when a = b = 0; e.g., “Houdini”, “search engines”, “English grammar”; fairly precise queries; reasonably clear what quality results are.

Lessons & Challenges (contd.)
2. Queries with a subjective taste (a social aspect): perform best when a ≈ 1; e.g., “wizard”; produces a large number of results, but the user may like only particular types of novels such as “Lord of the Rings”; the tag “wizard” may be globally infrequent but frequent among the user’s friends.
3. Queries with a spiritual information need: perform best when b ≈ 1; e.g., “Asia travel guide”; very general, need to make full use of users similar (in taste) to the searcher. (Think recommendations.)

Lessons & Challenges (contd.)
4. Queries with a mixed information need: perform best when a ≈ b ≈ 0.5; e.g., “mystery magic”.
• Challenges: the above is an ad hoc classification; need more thorough studies and deeper insights.
• Can the system “learn” the correct setting (a, b, c values) for a user or for a group?
• The usual scalability challenges: see the following references.
• Project opportunity here.

Follow-up Reading (Efficiency)
• S. Amer-Yahia, M. Benedikt, P. Bohannon. Challenges in Searching Online Communities. IEEE Data Eng. Bull. 30(2), 2007.
• R. Schenkel, T. Crecelius, M. Kacimi, S. Michel, T. Neumann, J. X. Parreira, G. Weikum. Efficient Top-k Querying over Social-Tagging Networks. SIGIR 2008.
• M. V. Vieira, B. M. Fonseca, R. Damazio, P. B. Golgher, D. de Castro Reis, B. Ribeiro-Neto. Efficient Search Ranking in Social Networks. CIKM 2007.

Follow-up Reading (Temporal Evolution, Events, Networks, ...)
• N. Bansal, N. Koudas. Searching the Blogosphere. WebDB 2007.
• M. Dubinko, R. Kumar, J. Magnani, J. Novak, P. Raghavan, A. Tomkins. Visualizing Tags over Time. ACM Transactions on the Web, 1(2), 2007.
• S. Bao, G. Xue, X. Wu, Y. Yu, B. Fei. Optimizing Web Search Using Social Annotations. WWW 2007.
• Anish Das Sarma, Alpa Jain, Cong Yu. Dynamic Relationship and Event Discovery. WSDM, Hong Kong, China, 2011.
• Sihem Amer-Yahia, Michael Benedikt, Laks V. S. Lakshmanan, Julia Stoyanovich. Efficient Network-aware Search in Collaborative Tagging Sites. VLDB 2008.

We will revisit social search later in your talks.