Chapter 10 Social Search

Social Search
- “Social search describes search acts that make use of social interactions with others. These interactions may be explicit or implicit, co-located or remote, synchronous or asynchronous”
- Social search
  - Search within a social environment
  - Communities of users actively participating in the search process
  - Goes beyond classical search tasks
  - Facilitates the “information seeking” process

[Evans 08] Evans et al. Towards a Model of Understanding Social Search. In Proc. of Conf. on Computer Supported Cooperative Work. 2008.

Social vs. Standard Search
- Key differences
  - Standard search: users interact with the system
  - Social search: users interact with one another in an open/social environment, implicitly or explicitly, such as
    - Visiting social media sites, e.g., YouTube
    - Browsing through social networking sites, e.g., Facebook

Social Search Activities
- Social search activities, as defined in [Evans 08]
  - Stage where user motives and information needs are defined
  - Stage where user search requirements are refined

Social Search Activities
- Users perform a series of actions to identify content from a particular location
- Exploratory process: users search for information within a specific patch, then extract information from source files
- Users locate a source where they can perform a transaction or web-mediated activity
- Users identify preliminary “evidence files” from which they further modify their search schema and query

Social Search Activities
- Schematize: a process in which raw evidence is organized/represented in some schematic way
- Distribute the end product to others, either face-to-face, by printing out documents, or by bookmarking websites for re-accessing in the future

Web 2.0
- Social search includes, but is not limited to, the so-called social media sites
  - Collectively referred to as “Web 2.0”, as opposed to the classical notion of the Web (“Web 1.0”)
- Social media sites
  - User-generated content
  - Users can tag their own and others’ content
  - Users can share favorites, tags, etc., with others
  - Provide unique data resources for search engines
- Example
  - YouTube, MySpace, Facebook, LinkedIn, Digg, Twitter, Flickr, Del.icio.us, and CiteULike

Social Search Topics
- Online user-interactive data, which provide a new and interesting search experience
  - User tags: users assign tags to data items, a manual indexing approach
  - Searching within communities: virtual groups of online users who share common interests and interact socially, such as blogs and QA systems
  - Recommender systems: individual users are represented by their profiles (fixed queries, i.e., long-term information needs), such as CNN Alert Service, Amazon.com, etc.
  - Peer-to-peer (P2P) network: querying a community of “nodes” (individuals/organizations/search engines) for an information need
  - Metasearch: a special case of P2P in which all the nodes are search engines

User Tags and Manual Indexing
- Then: library card catalogs
  - Indexing terms chosen with search in mind
  - Experts generate indexing terms manually
  - Terms are of very high quality, based on the Library of Congress Subject Headings standardized by the US Library of Congress
  - Terms chosen from a controlled/fixed vocabulary and subject guides (a drawback)
- Now: social media tagging
  - Social media sites allow users to generate their own tags manually (+)
  - Tags are not always chosen with search in mind (-)
  - Tags can be noisy or even incorrect, with no quality control (-)
  - Tags are chosen from folksonomies, i.e., user-generated taxonomies

Social Tagging
- According to [Guan 10]
  - Social tagging services allow users to annotate online resources with freely chosen keywords
  - Tags are collectively contributed by users and represent their comprehension of resources. Tags provide meaningful descriptors of resources and implicitly reflect users’ interests
  - Tagging services provide keyword-based search, which returns resources annotated by the given tags

Social Tagging
- Rating vs. tagging data [Guan 10]
  - Tagging data does not contain users’ explicit preference information on resources
  - Tagging data involves three types of objects, i.e., user, tag, and resource, whereas rating data only contains users and resources

[Guan 10] Guan et al. Document Recommendation in Social Tagging Services. In Proc. of Intl. Conf. on World Wide Web. 2010.

Types of User Tags
- Content-based
  - Tags describe the content of an item, e.g., car, woman, sky
- Context-based
  - Tags describe the context of an item, e.g., NYC, empire bldg
- Attribute-based
  - Tags describe the attributes of an item, e.g., Nikon (type of camera), black and white (type of movie), etc.
- Subjective-based
  - Tags subjectively describe an item, e.g., pretty, amazing, etc.
- Organizational-based
  - Tags that organize items, e.g., to do, readme, my pictures

Searching Tags
- Searching collaboratively tagged items, i.e., user tags, is challenging
  - Most items have only a few tags, i.e., complex items are sparsely represented
  - Tags are very short
  - The vocabulary mismatch problem: e.g., an item tagged “aquariums” does not match the query “goldfish”
- Boolean (AND/OR), probabilistic, vector space, and language modeling approaches will fail if used naïvely

Tag Expansion
- Can overcome the vocabulary mismatch problem by expanding the tag representation with external knowledge
- Possible external sources
  - Thesaurus
  - Web search results
  - Query logs
- After tags have been expanded, standard retrieval models can be used

Tag Expansion Using Search Results
- Example: web search results enhance a tag representation; the tag “tropical fish” is issued as a query
- Retrieved snippets
  - The Krib (Aquaria and Tropical Fish): This site contains information about tropical fish aquariums, including archived usenet postings and e-mail discussions, along with new...
  - Keeping Tropical Fish and Goldfish in Aquariums, Fish Bowls, and...: Keeping Tropical Fish and Goldfish in Aquariums, Fish Bowls, and Ponds at AquariumFish.net.
  - Age of Aquariums - Tropical Fish: Huge educational aquarium site for tropical fish hobbyists, promoting responsible fish keeping internationally since 1997.
- Pseudo-relevance feedback over the snippets yields P(w | “tropical fish”), with related terms such as aquariums, goldfish, tropical, and bowls
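The estimate P(w | “tropical fish”) can be sketched as a relative word frequency over the retrieved snippets. This is a minimal sketch, not the slide's exact estimator: the function name `expand_tag`, the tokenizer, and the two shortened snippets are assumptions made for illustration, and a realistic version would at least remove stopwords.

```python
import re
from collections import Counter

def expand_tag(snippets):
    """Estimate P(w | tag) as the relative frequency of w in the tag's snippets."""
    counts = Counter()
    for text in snippets:
        # Crude tokenizer (an assumption): lowercase alphabetic runs only.
        counts.update(re.findall(r"[a-z]+", text.lower()))
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

# Hypothetical, shortened snippets for the tag "tropical fish" used as a query.
snippets = [
    "information about tropical fish aquariums",
    "keeping tropical fish and goldfish in aquariums and fish bowls",
]
p = expand_tag(snippets)
```

Words such as "aquariums" and "goldfish" now carry probability mass in the expanded representation, so a query for either can match an item tagged only "tropical fish".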

Searching Tags
- Even with tag expansion, searching tags is challenging
- Tags are inherently noisy (off topic, inappropriate) and incorrect (misspelled, spam)
- Many items may not be tagged at all, and so become virtually invisible to any search engine
- Typically easier to find popular items with many tags than less popular items with few/no tags

Inferring Missing Tags
- How can we automatically tag items with few or no tags?
- Uses of inferred tags
  - Improved tag search
  - Automatic tag suggestion

Methods for Inferring Tags
- TF-IDF (based on textual representation of the item)
  - Suggest tags that have a high TF-IDF weight in the item
  - Only works for textual items
- Classification (determines the appropriateness of a tag)
  - Train a binary classifier for each tag, e.g., using SVM
  - Performs well for popular tags, but not as well for rare tags
- Maximal marginal relevance
  - Finds tags that are relevant to the item and novel with respect to the other tags
  - Sim_item(t, i) is the similarity between tag t and item i (large if t is very relevant to i)
  - Sim_tag(t_i, t) is the similarity between tags t_i and t (= 1 or 0)
  - λ is a tunable parameter
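The maximal marginal relevance criterion can be sketched as a greedy selection. The sketch assumes the usual MMR form, score(t) = λ · Sim_item(t, i) − (1 − λ) · max over already chosen tags t_i of Sim_tag(t_i, t); the function name `mmr_select`, the similarity values, and the near-duplicate test are hypothetical.

```python
def mmr_select(candidates, sim_item, sim_tag, k, lam=0.7):
    """Greedily pick k tags that are relevant to the item but not redundant."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def score(tag):
            # Redundancy: highest similarity to any tag already selected.
            redundancy = max((sim_tag(s, tag) for s in selected), default=0.0)
            return lam * sim_item[tag] - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical tag-to-item similarities for one item.
sim_item = {"goldfish": 0.9, "gold fish": 0.85, "aquarium": 0.6}

def sim_tag(a, b):
    # Assumed 0/1 tag similarity: 1 when tags differ only by spaces.
    return 1.0 if a.replace(" ", "") == b.replace(" ", "") else 0.0

tags = mmr_select(list(sim_item), sim_item, sim_tag, k=2)
```

With these numbers "gold fish" is skipped in favor of "aquarium", because it is redundant with the already selected "goldfish" even though it is more similar to the item.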

Browsing and Tag Clouds
- Search is useful for finding items of interest
- Browsing is more useful for exploring collections of tagged items
- Various ways to visualize collections of tags
  - Tag lists (for a website or a particular group/category of items)
  - Tag clouds (show the popularity of tags based on sizes)
  - Tags ordered alphabetically and/or weighted
  - Grouped by category
  - Formatted/sorted according to popularity

Tag Clouds
- As defined in [Schrammel 09], tag clouds are
  - Visual displays of a set of words (tags) in which attributes of the text such as size, color, font weight, or intensity are used to represent relevant properties, e.g., frequency of documents linked to the tag
  - A good visualization technique to communicate an “overall picture”

[Schrammel 09] Schrammel et al. Semantically Structured Tag Clouds: an Empirical Evaluation of Clustered Presentation Approaches. In Proc. of Intl. Conf. on Human Factors in Computing Systems. 2009.

Sample Tag Cloud
animals architecture art australia autumn baby band barcelona beach berlin birthday blackandwhite blue california cameraphone canada canon car cat chicago china christmas church city clouds color concert day dog england europe family festival film florida flowers food france friends fun garden germany girl graffiti green halloween hawaii holiday home house india ireland italy japan july kids lake landscape light live london macro me mexico music nature newyork night nikon nyc ocean paris park party people portrait red river rock sanfrancisco scotland seattle show sky snow spain spring street summer sunset taiwan texas thailand tokyo toronto travel trees trip uk usa vacation washington water wedding

Searching with Communities
- What is an online community?
  - Groups of entities (i.e., users, organizations, websites) that interact in an online environment to share common goals, interests, or traits
  - Besides tagging, community users also post to newsgroups, blogs, and other forums
  - To improve the overall user experience, web search engines should automatically find the communities of a user
- Example
  - Baseball fan community, digital photography community, etc.
- Not all communities are made up of humans!
  - Web communities are collections of web pages that are all about a common topic

Online Communities
- According to [Seo 09]
  - Online communities are valuable information sources where knowledge is accumulated by interactions between people
  - Online community pages have many unique textual or structural features, e.g.,
    - A forum has several sub-forums covering high-level topic categories
    - Each sub-forum has many threads
    - A thread is a more focused topic-centric discussion unit and is composed of posts created by community members

[Seo 09] Seo et al. Online Community Search Using Thread Structure. In Proc. of ACM Conf. on Information and Knowledge Management. 2009.

Online Communities
- A gathering of people, in an online space, where they can come, communicate, and get to know each other over time [Chen 08]
- A social aggregation that emerges when enough people carry on public discussions over time, with sufficient human feeling, to form webs of personal relationships in cyberspace [Rheingold 00]
- Online communities [Chen 08]
  - Have open membership
  - A user can reach the remaining ones in the community easily
  - Shared interests and activities are the major reasons that attract users to join the community
  - Tagging information is used to define the interests of the users

[Chen 08] Chen et al. Finding Core Members in Virtual Communities. WWW 2008.
[Rheingold 00] H. Rheingold. The Virtual Community: Homesteading on the Electronic Frontier. MIT Press. 2000.

Finding Communities
- How to design general-purpose algorithms for finding every possible type of online community?
- What are the characteristics of a community?
  - Entities (users) within a community are similar to each other
  - Members of a community are likely to interact more with one another than with those outside of the community
- Can represent interactions between a set of entities as a graph
  - Vertices (V) are entities
  - Edges (E), directed or undirected, denote interactions of entities
    - Undirected edges represent symmetric relationships
    - Directed edges represent non-symmetric or causal relationships

HITS
- The Hyperlink-Induced Topic Search (HITS) algorithm can be used to find communities
  - A link analysis algorithm, like PageRank
  - Each entity has a hub score and an authority score
- Based on a circular set of assumptions
  - Good hubs point to good authorities
  - Good authorities are pointed to by good hubs
- Iterative algorithm
  - Authority score of p: sum of the hub scores of the entities pointing at p
  - Hub score of p: sum of the authority scores of the entities pointed at by p
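The two update rules can be sketched as a short iteration over an edge list. This is a minimal sketch under stated assumptions: the hub update uses the authority scores computed in the same pass, scores are normalized to unit length after each pass (a common choice that the slide does not specify), and the three-edge graph is hypothetical.

```python
import math

def hits(edges, iterations=50):
    """Return (hub, authority) score dicts for directed (source, target) edges."""
    nodes = {n for edge in edges for n in edge}
    hub = dict.fromkeys(nodes, 1.0)
    auth = dict.fromkeys(nodes, 1.0)
    for _ in range(iterations):
        # Authority of p: sum of the hub scores of the entities pointing at p.
        auth = dict.fromkeys(nodes, 0.0)
        for q, p in edges:
            auth[p] += hub[q]
        # Hub of p: sum of the (just updated) authority scores p points at.
        new_hub = dict.fromkeys(nodes, 0.0)
        for p, q in edges:
            new_hub[p] += auth[q]
        # Normalize so the scores do not grow without bound.
        a_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        auth = {n: v / a_norm for n, v in auth.items()}
        hub = {n: v / h_norm for n, v in new_hub.items()}
    return hub, auth

# Hypothetical interaction graph: a and b point at c, a also points at d.
edges = [("a", "c"), ("b", "c"), ("a", "d")]
hub, auth = hits(edges)
```

Here c ends up with the largest authority score (two incoming edges) and a with the largest hub score (it points at both authorities); entities with no incoming edges keep an authority score of 0.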

HITS
- Form community (C)
  - Apply the entity interaction graph to find communities
  - Identify a subset of the entities (V), called candidate entities, to be members of C (based on common interest)
  - Entities with large authority scores are the “core” or “authoritative” members of C
    - To be a strong authority, an entity must have many incoming edges, all with relatively moderate hub scores, or
    - have very few incoming edges that have very large hub scores
  - Vertices not connected with others have hub and authority scores of 0

Finding Communities
- Clustering
  - Community finding is an inherently unsupervised learning problem
  - Agglomerative or K-means clustering approaches can be applied to the entity interaction graph to find communities
- Use the vector representation to capture the connectivity of various entities
- Compute the authority values based on the Euclidean distance
- Evaluating community finding algorithms is hard
- Can use communities in various ways to improve web search, browsing, expert finding, recommendation, etc.

Graph Representation
[Figure: an example graph with nodes 1-7, shown alongside the vector representation of each node]
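One way to read the vector representation is as one adjacency row per node, compared with Euclidean distance before clustering. The sketch below assumes that reading; the six-node graph is hypothetical and is not the graph from the figure, and the helper names are made up for illustration.

```python
import math

def adjacency_vectors(n, edges):
    """Row i has a 1 in column j when node i has an edge to node j (1-indexed)."""
    vectors = {i: [0] * n for i in range(1, n + 1)}
    for src, dst in edges:
        vectors[src][dst - 1] = 1
    return vectors

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical directed graph: nodes 1 and 2 point at the same targets.
edges = [(1, 3), (2, 3), (1, 4), (2, 4), (5, 6)]
vectors = adjacency_vectors(6, edges)
d12 = euclidean(vectors[1], vectors[2])  # identical connectivity
d15 = euclidean(vectors[1], vectors[5])  # no shared targets
```

Nodes 1 and 2 are at distance 0 and would land in the same cluster under either agglomerative or K-means clustering, while node 5 is farther from both.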

Community Based Question Answering
- Some complex information needs can’t be answered by traditional search engines
  - No single webpage may exist that satisfies the information need
  - Information may come from multiple sources
  - Human (non-)experts in a wide range of topics form a community-based question answering (CQA) group, e.g., Yahoo! Answers
- Community based question answering tries to overcome these limitations
  - Searcher enters questions
  - Community members answer questions

Example Questions

Community Based Question Answering
- Pros
  - Users can find answers to complex or obscure questions, with diverse opinions about a topic
  - Answers come from humans, not algorithms; users can interact with answerers who share common interests/problems
  - Can search the archive of previous questions/answers, e.g., Yahoo! Answers
- Cons
  - Some questions never get answered
  - Often takes time (possibly days) to get a response
  - Answers may be wrong, spam, or misleading

Community Based Question Answering
- Yahoo! Answers, a community-driven question-and-answer site launched by Yahoo! on July 5, 2005

Question Answering Models
- How can we effectively search an archive of question/answer pairs?
- Can be treated as a translation problem
  - Translate a question into a related/similar question, which likely has relevant answers
  - Translate a question into an answer: less desirable
- The vocabulary mismatch problem
  - Traditional IR models likely miss many relevant questions
  - Many different ways to ask the same question
  - Stopword removal and stemming do not help
  - Solution: consider related concepts (i.e., words), using the probability of replacing one word by another

Question Answering Models
- Translation-based language model (for finding related questions and answers): translate w (in Q) from t (in A), where
  - Q is a question
  - A is a related question in the archive
  - V is the vocabulary
  - P(w | t) is the translation probability
  - P(t | A) is the (smoothed) probability of generating t given A
- Anticipated problem: a good (independent) term-to-term translation might not yield a good overall translation
- Potential solution: matches of the original question terms are given more weight than matches of translated terms
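The translation-based model can be sketched from the symbols listed on the slide, assuming the usual form P(Q | A) = Π over w in Q of Σ over t in V of P(w | t) · P(t | A). The function name and all probabilities below are hypothetical; in practice P(t | A) would be smoothed and the product computed in log space.

```python
def translation_lm_score(question, p_trans, p_t_given_a):
    """P(Q | A): for each question word, sum P(w | t) * P(t | A) over terms t."""
    score = 1.0
    for w in question:
        score *= sum(
            p_trans.get((w, t), 0.0) * p_t
            for t, p_t in p_t_given_a.items()
        )
    return score

# Hypothetical P(t | A) for an archived question A = "car repair".
p_t_given_a = {"car": 0.5, "repair": 0.5}
# Hypothetical translation probabilities P(w | t), keyed as (w, t).
p_trans = {
    ("auto", "car"): 0.4, ("car", "car"): 0.6,
    ("fix", "repair"): 0.3, ("repair", "repair"): 0.7,
}
score = translation_lm_score(["auto", "fix"], p_trans, p_t_given_a)
```

The query "auto fix" shares no words with "car repair", yet it receives a nonzero score because both of its words can be translated from words in A.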

Question Answering Models
- Enhanced translation model, which extends the translation-based language model for ranking Q, where
  - β (0..1) controls the influence of the translation probability
  - μ is a smoothing parameter
  - |A| is the number of words in question A
  - C_w is the count of w in the entire collection C, and |C| is the total number of word occurrences in C
- When β approaches 1, the model becomes more similar to the translation-based language model
- When β is 0, the translated question is equivalent to the original question
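A sketch of the enhanced model, assuming the common textbook form in which each question word w contributes ((1 − β) · f(w, A) + β · Σ_t P(w | t) · f(t, A) + μ · C_w / |C|) / (|A| + μ), where f(w, A) is the count of w in A. The formula itself is not preserved in the slide text, so this form, the function name, and the toy data are all assumptions.

```python
from collections import Counter

def enhanced_translation_score(question, archived, p_trans,
                               coll_counts, coll_size, beta=0.5, mu=1.0):
    """Mix exact matches with translated matches, smoothed by collection counts."""
    f = Counter(archived)  # f[w] is the count of w in the archived question A
    score = 1.0
    for w in question:
        translated = sum(p_trans.get((w, t), 0.0) * f_t for t, f_t in f.items())
        numerator = ((1 - beta) * f[w] + beta * translated
                     + mu * coll_counts.get(w, 0) / coll_size)
        score *= numerator / (len(archived) + mu)
    return score

archived = ["car", "repair"]           # hypothetical archived question A
p_trans = {("auto", "car"): 0.4}       # hypothetical P(w | t), keyed as (w, t)
exact_only = enhanced_translation_score(["car"], archived, p_trans, {}, 1,
                                        beta=0.0, mu=0.0)
translated_only = enhanced_translation_score(["auto"], archived, p_trans, {}, 1,
                                             beta=1.0, mu=0.0)
```

Setting beta to 0 reduces the score to exact term matching, and setting it to 1 uses only translated matches, which mirrors the two limiting cases described above.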

Computing Translation Probabilities
- Translation probabilities are learned from a parallel corpus
- Most often used for learning inter-language probabilities
- Can be used for intra-language probabilities
  - Treat question-answer pairs as a parallel corpus
  - Translation probabilities are estimated from archived pairs (Q_1, A_1), (Q_2, A_2), …, (Q_N, A_N)
- Drawbacks
  - Computationally expensive: sum over the entire vocabulary, which can be very large
  - Solution: consider only a small number (e.g., 5) of (most likely) translations per question term

Sample Question/Answer Translations

Collaborative Searching
- Traditional search assumes a single searcher
- Collaborative search involves a group of users with a common goal, searching together in a collaborative setting
- Example scenarios
  - Students doing research for a history report
  - Family members searching for information on how to care for an aging relative
  - Team members working to gather information and requirements for an industrial project

Collaborative Search
- Two types of collaborative search settings, depending on where participants are physically located
- Co-located
  - Participants in the same location
  - CoSearch system (Amershi & Morris, 2008)
- Remote collaborative
  - Participants in different locations
  - SearchTogether system (Morris & Horvitz, 2007)

Collaborative Search Scenarios
- Co-located collaborative searching. Example: CoSearch
- Remote collaborative searching. Example: SearchTogether

Collaborative Search
- Involves a group of users who share a common goal, searching together in a collaborative setting
  - Members contribute, gather, and gain a better understanding of the collected information
- Challenges
  - How do users interact with the system?
  - How do users interact with each other?
  - How is data shared?
  - What data persists across sessions?
- Very few commercial collaborative search systems
- Likely to see more of this type of system in the future

Document Filtering
- Ad hoc retrieval
  - Document collections and information needs change with time
  - Results (static) returned when query is entered
- Document filtering
  - Document collections change with time, but information needs are static (long-term)
  - Long-term information needs are represented as a profile
  - Documents entering the system that match the profile are delivered to the user via a push mechanism
  - Must be efficient and effective (minimizing false positives, FPs, and false negatives, FNs)

Profiles
- Represent long-term information needs and personalize the search experience
- Can be represented in different ways
  - Soft filters
    - Boolean or keyword query
    - Sets of relevant and non-relevant documents
    - Social tags and named entities
  - Hard filters
    - Relational constraints, e.g., “Published before 1990” or “Price in the $10 - $25 range”
- Actual representation usually depends on the underlying filtering model
- Static (filtering) or updated over time (adaptive filtering)

Document Filtering Scenarios
[Figure: profiles (Profile 1, 1.1, 2, 2.1, 3) applied to a document stream at t = 2, 3, 5, 8, under static and adaptive filtering]
- Static filtering: easier to process, less robust
- Adaptive filtering: more robust, requires frequent updates

Static Filtering
- Given a fixed profile, how can we determine if an incoming document should be delivered?
- Treat as an IR problem
  - Boolean
  - Vector space
  - Language modeling
  - Requires a predefined threshold value
- Treat as a supervised learning problem
  - Naïve Bayes
  - Support vector machines

Static Filtering with Language Models
- Assume the profile consists of K relevant documents (T_i), each with weight α_i
- The probability of a word w given the profile P is defined in terms of
  - α_i, the weight (importance) associated with T_i
  - f_{w,T_i}, the frequency of occurrence of word w in T_i
  - λ, a smoothing parameter
  - C_w, the count of w in the entire collection C, and |C|, the total number of word occurrences in C
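A sketch of the profile language model built from these symbols, assuming the common textbook form P(w | P) = (1 − λ) / (Σ_i α_i) · Σ_i α_i · f_{w,T_i} / |T_i| + λ · C_w / |C|. The formula is not preserved in the slide text, so this form, the function name, and the toy data are assumptions.

```python
from collections import Counter

def profile_word_prob(w, profile_docs, weights, coll_counts, coll_size, lam=0.1):
    """Weighted mix of per-document estimates, smoothed with the collection."""
    total_weight = sum(weights)
    mix = sum(
        alpha * Counter(doc)[w] / len(doc)
        for alpha, doc in zip(weights, profile_docs)
    )
    background = coll_counts.get(w, 0) / coll_size
    return (1 - lam) * mix / total_weight + lam * background

# Hypothetical profile of K = 2 relevant documents with equal weights.
profile_docs = [["fish", "tank"], ["fish", "fish"]]
weights = [1.0, 1.0]
p_fish = profile_word_prob("fish", profile_docs, weights, {"fish": 10}, 100)
```

An incoming document can then be scored against this distribution (for example with query likelihood or KL divergence) and delivered when the score exceeds the predefined threshold.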

Adaptive Filtering
- In adaptive filtering, profiles are dynamic
- How can profiles change?
  - User can explicitly update the profile
  - User can provide (relevance) feedback about the documents delivered to the profile
  - Implicit user behavior can be captured and used to update the profile

Adaptive Filtering Models
- Rocchio
  - Profiles treated as vectors
- Relevance-based language models
  - Profiles treated as language models
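A Rocchio-style profile update can be sketched with sparse term vectors. The sketch assumes the standard form P' = α · P + β · mean(relevant) − γ · mean(non-relevant); the default parameter values, the function name, and the example vectors are illustrative assumptions rather than values from the slides.

```python
def rocchio_update(profile, relevant, nonrelevant,
                   alpha=1.0, beta=0.75, gamma=0.15):
    """Move the profile toward relevant and away from non-relevant documents."""
    def centroid(docs, term):
        # Mean weight of the term across the given documents.
        return sum(d.get(term, 0.0) for d in docs) / len(docs) if docs else 0.0

    terms = set(profile)
    for doc in relevant + nonrelevant:
        terms |= set(doc)
    return {
        t: alpha * profile.get(t, 0.0)
           + beta * centroid(relevant, t)
           - gamma * centroid(nonrelevant, t)
        for t in terms
    }

# Hypothetical profile and feedback documents as {term: weight} vectors.
profile = {"fish": 1.0}
updated = rocchio_update(
    profile,
    relevant=[{"fish": 1.0, "tank": 2.0}],
    nonrelevant=[{"chips": 1.0}],
    alpha=1.0, beta=0.5, gamma=0.5,
)
```

After the update the profile gains weight on "tank" and a negative weight on "chips", so future documents about fish tanks score higher than documents about fish and chips.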

Fast Filtering with Millions of Profiles
- Real filtering systems
  - May have thousands or even millions of profiles
  - Many new documents will enter the system daily
- How to efficiently filter in such a system?
  - Most profiles are represented as text or a set of features
  - Build an inverted index for the profiles
  - Distill incoming documents as “queries” and run them against the index
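The idea of indexing profiles instead of documents can be sketched as follows. The function names, the term-overlap matching rule, and the two profiles are assumptions for illustration; a real system would score matches with its filtering model rather than count shared terms.

```python
from collections import defaultdict

def build_profile_index(profiles):
    """Inverted index: term -> set of profile ids containing that term."""
    index = defaultdict(set)
    for profile_id, terms in profiles.items():
        for term in terms:
            index[term].add(profile_id)
    return index

def matching_profiles(index, document_terms, min_overlap=1):
    """Treat the document as a query and return profiles sharing enough terms."""
    overlap = defaultdict(int)
    for term in set(document_terms):
        for profile_id in index.get(term, ()):
            overlap[profile_id] += 1
    return {pid for pid, n in overlap.items() if n >= min_overlap}

# Hypothetical profiles, each a set of terms.
profiles = {"p1": {"tropical", "fish"}, "p2": {"stock", "market"}}
index = build_profile_index(profiles)
matched = matching_profiles(index, ["tropical", "fish", "aquarium"])
```

Only the posting lists for terms in the incoming document are touched, so the cost per document depends on the document length and the matching profiles, not on the total number of profiles.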

Evaluation of Filtering Systems
- The definition of “good” depends on the purpose of the underlying filtering system
  - Filtering systems do not produce a ranking of documents for each profile
  - Rank-based evaluation measures, such as Precision@n and MAP, are therefore irrelevant; precision, recall, and F-measure are computable
- Generic filtering evaluation measure: U = α·TP + β·TN + δ·FP + γ·FN
  - α = 2, β = 0, δ = -1, and γ = 0 are widely used
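The set-based measures can be computed directly from the four outcome counts. The sketch assumes the generic measure is the linear utility U = α · TP + β · TN + δ · FP + γ · FN with the widely used weights as defaults; the function names and the counts in the example are hypothetical.

```python
def filtering_utility(tp, tn, fp, fn, alpha=2, beta=0, delta=-1, gamma=0):
    """Linear utility over true/false positives and negatives."""
    return alpha * tp + beta * tn + delta * fp + gamma * fn

def precision_recall_f1(tp, fp, fn):
    """Set-based measures; each is 0.0 when its denominator is 0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical outcome counts for one profile.
utility = filtering_utility(tp=10, tn=100, fp=4, fn=3)
precision, recall, f1 = precision_recall_f1(tp=10, fp=4, fn=3)
```

With the default weights, each correctly delivered document earns 2 points and each wrongly delivered document costs 1, so a profile that delivers nothing scores 0.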

Summary of Filtering Models

Collaborative Filtering
- Static and adaptive filtering are not social tasks; profiles (and their users) are assumed to be independent of each other
- Similar users are likely to have similar preferences or profiles
- Collaborative filtering exploits relationships between profiles (users) to improve how items (documents) are matched to users (profiles)
  - If A is similar to B and A judged a document D to be relevant, then it is likely that D is also relevant to B
  - Often used as a component of a recommender system

Collaborative Filtering
- According to [Ma 09], there are two widely used types of methods for collaborative filtering
  - Neighborhood-based methods
    - Include user-based approaches, which predict the ratings of active users based on the ratings of similar users, and item-based approaches, which predict them based on the computed information of items similar to those chosen by the active user
    - Suffer from data sparsity and scalability problems
  - Model-based methods use the observed user-item ratings to train a compact model that explains the given data so that ratings can be predicted

[Ma 09] Ma et al. Semi-nonnegative Matrix Factorization with Global Statistical Consistency for Collaborative Filtering. In Proc. of ACM Conf. on Information and Knowledge Management. 2009.

Collaborative Filtering
- Example: predict the missing values in the user-item matrix [Ma 09]
[Figure: a user-item matrix and the corresponding predicted user-item matrix]

Recommender Systems
- Recommender systems use collaborative filtering algorithms to recommend items that a user may be interested in, e.g., Amazon.com, Netflix
- Unlike static/adaptive filtering algorithms, recommender systems provide ratings for items (e.g., 0..1)
- Recommender systems
  - Suggest items (e.g., movies, news) that are likely to interest users, whereas
  - Collaborative filtering refers to the technique for the task of predicting users’ preferences based on taste information from other users [Ma 09]

Recommender Systems
[Figure: users plotted by profile similarity, each labeled with a preference for an item I (1 to 5) or "?" where the preference on I is unknown]
- Users with similar profiles are close to each other

Recommender System Algorithms
- Input
  - <user, item, rating> tuples for items that the user has explicitly rated
  - Typically represented as a user-item matrix
- Output
  - <user, item, rating> tuples for items that the user has not rated
  - Can be thought of as filling in the missing entries of the user-item matrix
- Most algorithms infer missing ratings based on the ratings of similar users
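Inferring a missing rating from similar users can be sketched with a user-based neighborhood method. This is one common variant, not necessarily the one the slides have in mind: similarity is a Pearson-style correlation over co-rated items using each user's overall mean, and the prediction is the active user's mean plus a similarity-weighted average of the neighbors' deviations. All names and ratings are hypothetical.

```python
import math

def user_mean(user_ratings):
    """Average of all ratings given by one user."""
    return sum(user_ratings.values()) / len(user_ratings)

def similarity(ru, rv):
    """Pearson-style correlation over the items both users rated."""
    common = set(ru) & set(rv)
    if not common:
        return 0.0
    mu, mv = user_mean(ru), user_mean(rv)
    num = sum((ru[i] - mu) * (rv[i] - mv) for i in common)
    den = (math.sqrt(sum((ru[i] - mu) ** 2 for i in common))
           * math.sqrt(sum((rv[i] - mv) ** 2 for i in common)))
    return num / den if den else 0.0

def predict(ratings, user, item):
    """Fill one missing entry of the user-item matrix."""
    base = user_mean(ratings[user])
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        s = similarity(ratings[user], other_ratings)
        num += s * (other_ratings[item] - user_mean(other_ratings))
        den += abs(s)
    return base + num / den if den else base

# Hypothetical user-item matrix as nested dicts; B has not rated i3.
ratings = {
    "A": {"i1": 5, "i2": 1, "i3": 4},
    "B": {"i1": 5, "i2": 1},
    "C": {"i1": 1, "i2": 5, "i3": 2},
}
predicted = predict(ratings, "B", "i3")
```

B agrees with A and disagrees with C, so the prediction for i3 is pulled above B's own mean rating of 3, toward A's opinion and away from C's.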