- Slides: 17

Query Suggestion

Query Suggestion
- A variety of automatic or semi-automatic query suggestion techniques have been developed
  - Goal is to improve effectiveness by matching related/similar terms
  - Semi-automatic techniques require user interaction to select the best suggested terms
- Query expansion is a related technique
  - Alternative queries, which usually offer more terms

Query Suggestion
- Approaches are usually based on an analysis of term co-occurrence
  - Either in the entire document collection, a large collection of queries, or the top-ranked documents in a result list
  - Query-based stemming is also a suggestion technique
- Automatic suggestion based on a general thesaurus is not effective
  - Does not take context into account, e.g., "aquarium" is a good suggestion for "tank" in the query "tropical fish tank", but not for "armor for tanks"

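Term Association Measures
- Dice's Coefficient
  - $\frac{2 \cdot n_{ab}}{n_a + n_b} \stackrel{rank}{=} \frac{n_{ab}}{n_a + n_b}$
  - where $\stackrel{rank}{=}$ stands for rank equivalent
- Mutual Information Measure (MIM)
  - Measures the extent to which the words occur independently
  - $\log \frac{P(a, b)}{P(a) P(b)} \stackrel{rank}{=} \log \left( N \cdot \frac{n_{ab}}{n_a \cdot n_b} \right) \stackrel{rank}{=} \frac{n_{ab}}{n_a \cdot n_b}$
  - where N is the number of documents in a collection, $n_a$ and $n_b$ are the numbers of documents containing a and b, $n_{ab}$ is the number containing both, and P(a) = $n_a$/N, P(b) = $n_b$/N, P(a, b) = $n_{ab}$/N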
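The two measures above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the toy collection `docs` and the helper names `counts`, `dice`, and `mim` are assumptions made up for the example, and co-occurrence is counted at the document level.

```python
import math

# Hypothetical toy collection: each document is a set of terms.
docs = [
    {"tropical", "fish", "aquarium"},
    {"tropical", "fish", "tank"},
    {"fish", "market"},
    {"tropical", "storm"},
    {"stock", "market"},
]

def counts(a, b, docs):
    """Return (n_a, n_b, n_ab, N) using document-level co-occurrence."""
    n_a = sum(1 for d in docs if a in d)
    n_b = sum(1 for d in docs if b in d)
    n_ab = sum(1 for d in docs if a in d and b in d)
    return n_a, n_b, n_ab, len(docs)

def dice(a, b, docs):
    """Dice's coefficient: 2 * n_ab / (n_a + n_b)."""
    n_a, n_b, n_ab, _ = counts(a, b, docs)
    return 2 * n_ab / (n_a + n_b) if n_a + n_b else 0.0

def mim(a, b, docs):
    """Mutual information measure: log(P(a, b) / (P(a) * P(b)))."""
    n_a, n_b, n_ab, N = counts(a, b, docs)
    if n_ab == 0:
        return float("-inf")  # the words never co-occur
    return math.log((n_ab / N) / ((n_a / N) * (n_b / N)))

print(dice("tropical", "fish", docs))
print(mim("tropical", "fish", docs))
```

In this toy collection "tropical" and "fish" each occur in 3 of 5 documents and co-occur in 2, so Dice is 2·2/(3+3) and MIM is log(10/9).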

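Term Association Measures
- Mutual Information Measure (MIM) favors low-frequency terms
- Expected Mutual Information Measure (EMIM) addresses the problem of MIM by weighting MIM using P(a, b)
  - $P(a, b) \cdot \log \frac{P(a, b)}{P(a) P(b)} \stackrel{rank}{=} n_{ab} \cdot \log \left( N \cdot \frac{n_{ab}}{n_a \cdot n_b} \right)$
  - This is only one part of the full EMIM, the part focused on word co-occurrence
  - EMIM, however, favors high-frequency terms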
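A small numeric sketch can make the frequency bias concrete. The counts below are invented for illustration (a rare pair that always co-occurs versus a frequent pair that co-occurs half the time), and the function names are assumptions; EMIM here is the single co-occurrence term weighted by n_ab, as described above.

```python
import math

def mim(n_a, n_b, n_ab, N):
    """MIM in the form log(N * n_ab / (n_a * n_b)); assumes n_ab > 0."""
    return math.log(N * n_ab / (n_a * n_b))

def emim(n_a, n_b, n_ab, N):
    """Co-occurrence part of EMIM: n_ab * log(N * n_ab / (n_a * n_b))."""
    if n_ab == 0:
        return 0.0
    return n_ab * math.log(N * n_ab / (n_a * n_b))

N = 1000
rare = (2, 2, 2)            # hypothetical counts (n_a, n_b, n_ab)
frequent = (200, 200, 100)  # hypothetical counts (n_a, n_b, n_ab)

print("MIM  rare vs frequent:", mim(*rare, N), mim(*frequent, N))
print("EMIM rare vs frequent:", emim(*rare, N), emim(*frequent, N))
```

With these counts MIM scores the rare pair higher, while EMIM scores the frequent pair higher, which matches the two biases noted on the slide.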

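Term Association Measures
- Pearson's Chi-squared (χ²) measure
  - Compares the number of co-occurrences of two words with the expected number of co-occurrences if the two words were independent
  - Normalizes this comparison by the expected number
  - Also a limited form focused on word co-occurrence
  - $\frac{\left( n_{ab} - N \cdot \frac{n_a}{N} \cdot \frac{n_b}{N} \right)^2}{N \cdot \frac{n_a}{N} \cdot \frac{n_b}{N}} \stackrel{rank}{=} \frac{\left( n_{ab} - \frac{1}{N} \cdot n_a \cdot n_b \right)^2}{n_a \cdot n_b}$
  - $N \cdot \frac{n_a}{N} \cdot \frac{n_b}{N}$ is the expected number of co-occurrences if the words occur independently
  - Favors low-frequency terms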
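The comparison of observed and expected co-occurrence can be sketched as follows. The function name and the example counts are assumptions for illustration; the expected count is n_a · n_b / N, which is what N · P(a) · P(b) simplifies to.

```python
def chi_squared(n_a, n_b, n_ab, N):
    """Limited chi-squared form: (observed - expected)^2 / expected."""
    expected = n_a * n_b / N  # expected co-occurrences under independence
    if expected == 0:
        return 0.0
    return (n_ab - expected) ** 2 / expected

N = 1000
# Hypothetical counts: a rare pair that always co-occurs, and a frequent pair.
print(chi_squared(2, 2, 2, N))
print(chi_squared(200, 200, 100, N))
```

The rare pair has a tiny expected count, so dividing by it produces a much larger score than for the frequent pair, which illustrates why the measure favors low-frequency terms.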

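Association Measure Summary
- Rank-equivalent forms, where $n_a$ and $n_b$ are the numbers of documents containing a and b, $n_{ab}$ is the number containing both, and N is the number of documents in the collection
  - MIM: $\frac{n_{ab}}{n_a \cdot n_b}$
  - EMIM: $n_{ab} \cdot \log \left( N \cdot \frac{n_{ab}}{n_a \cdot n_b} \right)$
  - χ²: $\frac{\left( n_{ab} - \frac{1}{N} \cdot n_a \cdot n_b \right)^2}{n_a \cdot n_b}$
  - Dice: $\frac{n_{ab}}{n_a + n_b}$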

Association Measure Example
- Most strongly associated words for "tropical" in a collection of TREC news stories. Co-occurrence counts are measured at the document level.
- Table annotations:
  - Identical ranking & favor low-frequency words
  - More general than MIM & χ²

Association Measure Example
- Most strongly associated words for "fish", a high-frequency term, in a collection of TREC news stories.
- Table annotation: the top-ranked words in MIM & χ² are similar

Association Measure Example
- Most strongly associated words for "fish" in a collection of TREC news stories. Co-occurrence counts are measured in windows of 5 words.
- Table annotations:
  - Still favor low-frequency terms
  - Most stable & reliable regardless of the window sizes
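Counting co-occurrences in windows rather than whole documents only changes how n_a, n_b, n_ab, and N are gathered. The sketch below assumes non-overlapping windows of 5 tokens, which is one possible reading of "windows of 5 words"; the function name and the sample text are made up for the example.

```python
from collections import Counter
from itertools import combinations

def window_counts(tokens, size=5):
    """Count terms and term pairs per non-overlapping window of `size` tokens."""
    term_counts = Counter()   # n_a: number of windows containing each term
    pair_counts = Counter()   # n_ab: number of windows containing both terms
    n_windows = 0             # plays the role of N
    for start in range(0, len(tokens), size):
        window = set(tokens[start:start + size])
        n_windows += 1
        term_counts.update(window)
        pair_counts.update(frozenset(p) for p in combinations(sorted(window), 2))
    return term_counts, pair_counts, n_windows

tokens = "tropical fish swim in warm water near coral reefs where fish feed".split()
term_counts, pair_counts, n_windows = window_counts(tokens)
print(n_windows, term_counts["fish"], pair_counts[frozenset({"tropical", "fish"})])
```

The resulting counts can be passed to any of the association measures in place of the document-level counts.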

Association Measures
- Associated words are of little use for expanding the query "tropical fish"
- Expansion based on the whole query takes context into account
  - e.g., using Dice with the term "tropical fish" gives the following highly associated words: goldfish, reptile, aquarium, coral, frog, exotic, stripe, regent, pet, wet
- Impractical for all possible queries, so other approaches are used to achieve this effect
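Treating the whole query as a single unit can be sketched by counting only the documents that contain every query term. The toy collection and the names `dice_with_query` and `suggest` are assumptions for illustration, not the method used to produce the word list above.

```python
def dice_with_query(query_terms, candidate, docs):
    """Dice's coefficient between a whole query (all terms present) and a word."""
    n_q = sum(1 for d in docs if query_terms <= d)
    n_c = sum(1 for d in docs if candidate in d)
    n_qc = sum(1 for d in docs if query_terms <= d and candidate in d)
    return 2 * n_qc / (n_q + n_c) if n_q + n_c else 0.0

def suggest(query_terms, docs, k=3):
    """Rank non-query words by their association with the whole query."""
    vocab = set().union(*docs) - query_terms
    scored = [(dice_with_query(query_terms, w, docs), w) for w in vocab]
    ranked = sorted(scored, key=lambda item: (-item[0], item[1]))
    return [w for score, w in ranked if score > 0][:k]

# Hypothetical toy collection: each document is a set of terms.
docs = [
    {"tropical", "fish", "aquarium", "pet"},
    {"tropical", "fish", "aquarium"},
    {"tropical", "storm", "forecast"},
    {"fish", "market", "price"},
]
print(suggest({"tropical", "fish"}, docs))
```

Words such as "storm" or "market", which are associated with only one of the two query terms, drop out because they never co-occur with the full query.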

Other Approaches
- Pseudo-relevance feedback
  - Expansion terms based on the top retrieved docs for the initial query
- Context vectors
  - Represent words by the words that co-occur with them
    - e.g., the top 35 most strongly associated words for "aquarium" (using Dice's coefficient)
  - Rank words for a query by ranking context vectors
- Challenges (computational & accuracy): due to the huge size & variability in quality of the collections
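Pseudo-relevance feedback can be sketched as: run the initial query, assume the top-ranked documents are relevant, and pick expansion terms from them. The sketch below makes two simplifying assumptions that are not from the slides: documents are ranked by how many query-term occurrences they contain, and expansion terms are chosen by raw frequency in the top documents.

```python
from collections import Counter

def prf_expand(query, docs, top_docs=2, n_terms=2):
    """Return expansion terms taken from the top-ranked documents for `query`."""
    q = set(query)
    # Assumed ranking function: number of query-term occurrences in the document.
    ranked = sorted(docs, key=lambda d: -sum(1 for t in d if t in q))
    # Count non-query terms in the documents assumed to be relevant.
    counts = Counter(t for d in ranked[:top_docs] for t in d if t not in q)
    best = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in best[:n_terms]]

# Hypothetical toy collection: each document is a list of tokens.
docs = [
    ["tropical", "fish", "aquarium", "tank", "aquarium"],
    ["tropical", "fish", "aquarium", "coral"],
    ["fish", "market", "price"],
    ["stock", "market", "price"],
]
print(prf_expand(["tropical", "fish"], docs))
```

A real system would use its own retrieval model for the ranking step and a weighting scheme for term selection; the structure of the loop stays the same.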

Other Approaches
- Query logs
  - Best source of information about queries & related terms
    - short pieces of text & click data
  - e.g., most frequent words in queries containing "tropical fish" from MSN log: stores, pictures, live, sale, types, clipart, blue, freshwater, aquarium, supplies
  - Query suggestion based on finding similar queries
    - group based on click data
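Finding the most frequent words in logged queries that contain a phrase is a simple counting task. The log entries below are invented for illustration (they are not taken from the MSN log), and `related_words` is a hypothetical helper name.

```python
from collections import Counter

def related_words(phrase, query_log, k=2):
    """Most frequent other words in logged queries that contain `phrase`."""
    phrase_terms = set(phrase.split())
    counts = Counter()
    for query in query_log:
        q = query.lower()
        # Pad with spaces so the phrase only matches on word boundaries.
        if f" {phrase} " in f" {q} ":
            counts.update(t for t in q.split() if t not in phrase_terms)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]

# Hypothetical query log.
query_log = [
    "tropical fish stores",
    "tropical fish pictures",
    "tropical fish stores online",
    "freshwater tropical fish",
    "fish tank armor",
]
print(related_words("tropical fish", query_log))
```

Grouping queries by click data, as mentioned above, would need the clicked URLs as well and is not covered by this sketch.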

Query Expansion
- Search engines suggest expanded/alternative queries in response to a query Q
  - Using some form of thesaurus to perform global analysis
    - For each term t in Q, Q is expanded with synonyms and related words of t from the thesaurus
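The per-term expansion step can be sketched with a dictionary standing in for the thesaurus. The thesaurus entries and the function name `expand` are assumptions for illustration only.

```python
# Hypothetical thesaurus: term -> synonyms and related words.
THESAURUS = {
    "tank": ["aquarium", "container"],
    "fish": ["goldfish"],
}

def expand(query, thesaurus):
    """Append the thesaurus entries of every query term, without duplicates."""
    expanded = list(query)
    for term in query:
        for related in thesaurus.get(term, []):
            if related not in expanded:
                expanded.append(related)
    return expanded

print(expand(["tropical", "fish", "tank"], THESAURUS))
```

Because each term is looked up on its own, the same entries for "tank" would be added to any query containing it, which is the context problem of a general thesaurus.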

Query Expansion
- Methods for building a thesaurus for query expansion
  1. Use of a controlled vocabulary maintained by human editors, such as the Library of Congress Subject Headings (LCSH), e.g.,
     - The LCSH of "American Revolutionary War" is United States -- History -- Revolution, 1775-1783
  2. An automatically derived thesaurus, constructed using word co-occurrence statistics over a collection of docs
  3. Query reformulations based on query log mining, exploring the manual query reformulations of other users to make suggestions to a user
- Thesaurus-based query expansion does not require any user input to increase recall

Query Expansion
- Automatic thesaurus generation using word co-occurrence
  - A simple approach is based on term-term similarities
    - Start with a term-document matrix A, where each cell $A_{t,d}$ is a weighted count $w_{t,d}$ for term t & document d
    - Calculate $C = AA^T$, in which $C_{u,v}$ is a similarity score between terms u and v; the larger the number, the better
    - An example of a derived thesaurus with good/bad suggestions
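The product C = AA^T can be sketched in plain Python. The terms and the weights in `A` are invented for illustration, and the helper names are assumptions. Rows of A are terms and columns are documents, so C[u][v] is the dot product of the rows for terms u and v.

```python
terms = ["aquarium", "fish", "stock"]

# Hypothetical term-document matrix A: A[t][d] is the weight of term t in doc d.
A = [
    [1, 1, 0],  # aquarium
    [2, 1, 0],  # fish
    [0, 0, 3],  # stock
]

def term_similarity(A):
    """Compute C = A * A^T, where C[u][v] is the similarity of terms u and v."""
    n = len(A)
    return [[sum(x * y for x, y in zip(A[u], A[v])) for v in range(n)]
            for u in range(n)]

def most_similar(term, terms, C):
    """Return the other term with the highest similarity score to `term`."""
    u = terms.index(term)
    others = [v for v in range(len(terms)) if v != u]
    return terms[max(others, key=lambda v: C[u][v])]

C = term_similarity(A)
print(C)
print(most_similar("aquarium", terms, C))
```

With raw weights the scores grow with term frequency, so in practice the rows of A are often length-normalized first; that step is left out here.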

Query Expansion
- The quality of term association is typically a problem in an automatically generated thesaurus
  - Term ambiguity easily introduces irrelevant, statistically correlated terms, e.g., "Apple" can be expanded to "Apple red fruit computer"
    - Suffers from false positives (FP) and false negatives (FN)
  - High cost to manually produce and update a thesaurus
  - Query expansion often increases recall, but may also significantly decrease precision, especially when the query contains ambiguous terms, e.g., expanding "interest rate" to "interest rate fascinate evaluate" is unlikely to be useful