The Wild Thing Goes Mobile And Local
Kenneth Church and Bo Thiesson
Text Mining, Search and Navigation (TMSN)
Microsoft Corporation

Wild Thing Goes Mobile
  • Better with Wild Cards!
  • Standard Word Wheeling (T9)
  • Find k-best regex matches, subject to language model

Search
  • Given
      • An input pattern (regex), and
      • A language model (LM): a list of queries and their popularities in the MSN logs
  • Find the k-best (most popular) matches
  • Conceptually: grep pattern LM | sort -nr | head
  • Heuristic speed-ups
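The conceptual pipeline above (grep, then sort by popularity, then head) can be sketched in a few lines of Python. This is only an illustration: the function name `k_best`, the dict-based language model, and the sample counts are assumptions, not part of the original system.

```python
import re
from heapq import nlargest


def k_best(pattern, lm, k=10):
    """Return the k most popular queries in `lm` that match `pattern`.

    `lm` is a hypothetical stand-in for the language model: a dict mapping
    each query string to its popularity count.
    """
    regex = re.compile(pattern)
    # "grep pattern LM": keep only the queries the pattern matches.
    matches = ((count, query) for query, count in lm.items() if regex.search(query))
    # "sort -nr | head": take the k largest counts.
    return [query for count, query in nlargest(k, matches)]


# Made-up popularity counts, for illustration only.
lm = {"COHEN": 50, "COLUMBUS OHIO": 80, "CHICAGO": 90, "OHIO": 70}
print(k_best(r"^C.*OH.*", lm, k=2))
```

Like grep, `regex.search` matches anywhere in the query, so the example anchors the pattern with `^`.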

Wild Thing > Word Wheeling
  • Regex
      • More general than prefix matching: /C.*OH.*/ >> /C.*/
      • For surnames, filenames, URLs
  • Challenge: Will users enter wild cards?
  • Implicit wild cards
      • Added after each “word”
      • Initials: K F C, N Y C (two implicit wild cards)
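One way to read "implicit wild cards added after each word" is as a rewrite of the typed text into a regex. The sketch below assumes that words are separated by spaces and that the typed characters are literals; the helper name `implicit_wildcards` is hypothetical.

```python
import re


def implicit_wildcards(typed):
    """Append the wild card '.*' after each space-separated word.

    Assumption: the typed words are literal text, so they are escaped
    before the wild cards are added.
    """
    return " ".join(re.escape(word) + ".*" for word in typed.split())


pattern = implicit_wildcards("K F C")
print(pattern)
print(bool(re.fullmatch(pattern, "KENTUCKY FRIED CHICKEN")))
```

Under this reading, the initials K F C behave like the explicit pattern /K.* F.* C.*/.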

Phone Mode And Local
  • Phone Mode
      • Regex notation: 7#6 → /[PQRS].*[MNO].*/
  • Local
      • Different language model: Pr(query) → Pr(query | location)
  • Local queries are different
      • Local: Restaurants (Pizza)
      • Non-local: Web Services (e-mail, shopping), Entertainment (adult)
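The phone-mode example (7#6 becomes /[PQRS].*[MNO].*/) suggests a simple key-to-regex translation. The sketch below assumes the standard 12-key letter layout, that `#` is the wild-card key, and that a trailing wild card is implicit; the names `KEYPAD` and `phone_to_regex` are illustrative.

```python
import re

# Standard phone keypad letters (assumed layout).
KEYPAD = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
}


def phone_to_regex(keys):
    """Translate key presses into a regex; '#' is assumed to be the wild card."""
    parts = []
    for key in keys:
        if key == "#":
            parts.append(".*")
        elif key in KEYPAD:
            parts.append("[" + KEYPAD[key] + "]")
        else:
            raise ValueError("unsupported key: " + repr(key))
    # Implicit wild card at the end of the input, as in the slide's example.
    if not parts or parts[-1] != ".*":
        parts.append(".*")
    return "".join(parts)


print(phone_to_regex("7#6"))
print(bool(re.fullmatch(phone_to_regex("7#6"), "SMITH")))
```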

Goal: All Forms Go Wild
  • But with different language models for different contexts

Demos here
  • Condoleezza Rice
  • Arnold Schwarzenegger
  • Hot-mail programs

Wild Thing + Virtual Earth
  • Better together here
  • Going Local

Different Expansions In Different Locations
  • BC: British Columbia, Boeing Company, Baptist Church, Bible College
  • * beach: Waikiki, Narragansett, Pebble Beach, Old Orchard
  • One Letter Queries
      • F: Detroit, New London
  • Other patterns: * high, * school, * univ, * hospital, * airport, * river

Conclusions: Why Go Local?
  • That’s where the money is
      • All politics is local
      • Ditto for classified ads
  • It is nice to be able to search the world
      • But I often want stuff near me
  • It is nice to be able to drive my car anywhere
      • But most accidents are not far from home
  • Geo-tagging URLs and Queries
      • Method 1: Parse docs (hard)
      • Method 2: Logs (easy)

Wild Thing Goes Local
  • Wild Thing: Find the k-best matches
      • Non-local: k-best ≡ Pr(query)
      • Local: k-best ≡ Pr(query | location)
      • Probabilities based on query logs
  • Heuristic speed-ups
      • Non-local case: conceptually, search list of queries in freq order; stop after finding k matches
      • Local case: ditto, but store a different list for each location
  • Local queries are different from non-local queries
      • Lots of requests for pizza near x
      • Lots of requests for Britney Spears
      • But these are not local searches
      • Apparently, not so many people want her nearby???
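The non-local and local cases above differ only in which frequency-ordered list is scanned. A minimal sketch, assuming each list is already sorted from most to least popular; the function names, location names, and orderings are made up for illustration.

```python
import re
from itertools import islice


def k_best_sorted(pattern, queries_by_freq, k=10):
    """Scan a list sorted by descending frequency; stop after k matches."""
    regex = re.compile(pattern)
    hits = (query for query in queries_by_freq if regex.search(query))
    # islice stops pulling from the lazy generator once k matches are found.
    return list(islice(hits, k))


def k_best_local(pattern, location, lists_by_location, k=10):
    """Local case: the same scan, over the list stored for `location`."""
    return k_best_sorted(pattern, lists_by_location.get(location, []), k)


# Hypothetical per-location lists, each in descending popularity order.
lists_by_location = {
    "Seattle": ["BOEING COMPANY", "BRITISH COLUMBIA", "BIBLE COLLEGE"],
    "Vancouver": ["BRITISH COLUMBIA", "BIBLE COLLEGE", "BOEING COMPANY"],
}
print(k_best_local(r"^B.* C.*", "Seattle", lists_by_location, k=1))
print(k_best_local(r"^B.* C.*", "Vancouver", lists_by_location, k=1))
```

Because the list is in popularity order, the first k matches are the k-best, and the scan can stop early for patterns with many popular matches.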

Smoothing
  • Computational and statistical motivations
      • Can’t store/estimate Pr(query | location) for all queries everywhere
  • Locations defined by a kd-tree
  • Smoothing rule: counts → parent
      • Unless significantly larger than sibling’s counts
      • One parameter: p (significance level)
  • After smoothing
      • Most counts → 0
      • Leaf inherits counts from ancestors
  [Figure: kd-tree of counts, split by latitude, then by longitude]
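The smoothing rule can be illustrated for a single pair of sibling cells. The slides do not say which significance test is used, so the sketch below assumes a one-sided binomial test against an even split; the helper names and the choice to keep both children's counts when the test fires are also assumptions.

```python
from math import comb


def binom_tail(k, n):
    """P(X >= k) for X ~ Binomial(n, 0.5), computed exactly."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


def smooth_siblings(left, right, p=0.05):
    """Apply the smoothing rule once to two sibling counts.

    Returns (parent, left, right) counts. Counts move up to the parent
    unless one sibling is significantly larger than the other at level p
    (assumed test: one-sided binomial against a 50/50 split).
    """
    total = left + right
    if total > 0 and binom_tail(max(left, right), total) < p:
        return 0, left, right   # significant difference: keep the local counts
    return total, 0, 0          # otherwise the parent absorbs the counts


print(smooth_siblings(29, 1))   # strongly skewed pair
print(smooth_siblings(3, 1))    # too few counts to tell the siblings apart
```

After this step most child counts are 0, so a lookup at a leaf would fall back to the nearest ancestor that still holds counts.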

Search Speed-Ups
  • Conceptually: grep pattern LM | sort -nr | head
  • Heuristic speed-up
      • Generate candidates that might match
      • Filter candidates with standard regex tool
  • Generating candidates (suffix array)
      • regex → substring: /C.*OH.*/ → OH
  • Popularity modification
      • Suffix arrays designed for all matches (not k-best)
      • Single sort order → two: alphabetic order + popularity
      • Alternate on odd and even levels (like a kd-tree)
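The generate-then-filter idea can be sketched with a naive suffix index: look up a literal substring of the regex, then run the full regex only on the candidates. The quadratic-space index, the function names, and the hand-picked substring are simplifications for illustration, not the original implementation.

```python
import bisect
import re


def build_suffix_index(queries):
    """Sorted list of (suffix, query_id) pairs: a naive suffix array."""
    return sorted(
        (query[i:], qid)
        for qid, query in enumerate(queries)
        for i in range(len(query))
    )


def candidate_ids(entries, substring):
    """Ids of queries containing `substring`, found by binary search."""
    start = bisect.bisect_left(entries, (substring, -1))
    ids = set()
    for suffix, qid in entries[start:]:
        if not suffix.startswith(substring):
            break   # suffixes sharing a prefix are contiguous in sorted order
        ids.add(qid)
    return ids


def grep_with_candidates(pattern, substring, queries, entries):
    """Filter the candidates with a standard regex tool."""
    regex = re.compile(pattern)
    return sorted(
        queries[qid]
        for qid in candidate_ids(entries, substring)
        if regex.search(queries[qid])
    )


queries = ["COHEN", "CHURCH", "OHIO", "COLUMBUS OHIO"]
entries = build_suffix_index(queries)
print(grep_with_candidates(r"^C.*OH.*", "OH", queries, entries))
```

Here the substring OH is chosen by hand from /C.*OH.*/; extracting it automatically from an arbitrary regex is outside the scope of this sketch.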

Standard Suffix Arrays

Suffix arrays: designed to find
  • Frequency and location of a pattern (substring)
  [Figure: sorted suffixes, marking the first and last “To Be”]

Single Sort Order → Two
  • Two sort orders: alphabetic and popularity
  • Standard application: find all matches
  • Modify: data structure and search
  • Search, to find k-best
      • On alphabetic splits: do the standard thing
      • On popularity splits: go left (popular) first
      • Stop if you have found k matches
      • Otherwise, go right, if you have to
  [Figure: tree levels alternate: sort by 1st order, sort by 2nd order, sort by 1st order]
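The alternating-split search can be sketched as a small tree over (query, popularity) pairs. To stay short, this sketch matches prefixes of whole queries rather than arbitrary suffixes, and it returns matches in traversal order without guaranteeing a global popularity ranking across alphabetic branches. The functions `build` and `search` and the sample data are hypothetical.

```python
def build(items, depth=0):
    """Even depths split alphabetically; odd depths split by popularity,
    with the more popular half on the left."""
    if len(items) <= 1:
        return {"leaf": list(items)}
    alpha = depth % 2 == 0
    key = (lambda it: it[0]) if alpha else (lambda it: -it[1])
    items = sorted(items, key=key)
    mid = len(items) // 2
    return {
        "alpha": alpha,
        "left_max": items[mid - 1][0],   # boundary keys, used on alphabetic splits
        "right_min": items[mid][0],
        "left": build(items[:mid], depth + 1),
        "right": build(items[mid:], depth + 1),
    }


def search(node, prefix, k, out=None):
    """Collect up to k items whose query starts with `prefix`."""
    out = [] if out is None else out
    if len(out) >= k:
        return out                        # stop: k matches already found
    if "leaf" in node:
        out.extend(it for it in node["leaf"] if it[0].startswith(prefix))
        return out
    n = len(prefix)
    if node["alpha"]:
        # Standard thing: only visit halves that can contain the prefix.
        if node["left_max"][:n] >= prefix:
            search(node["left"], prefix, k, out)
        if node["right_min"][:n] <= prefix:
            search(node["right"], prefix, k, out)
    else:
        # Popular half first; go right only if still short of k matches.
        search(node["left"], prefix, k, out)
        if len(out) < k:
            search(node["right"], prefix, k, out)
    return out


items = [("cohen", 50), ("coffee", 90), ("cola", 10),
         ("church", 70), ("chicago", 80), ("pizza", 100)]
tree = build(items)
print(search(tree, "co", 2))
```

For a prefix with no matches, the popularity splits cannot prune anything, which is the worst case for this structure.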

Modified Suffix Array
  • Time complexity: O(log N) → O(sqrt(N))
  • Worst case: pattern with 0 matches
      • Alphabetic splits are same as before
      • Unfortunately, popularity splits don’t help
      • Have to go both left and right everywhere (for 0 matches)
  • Let
      • P(N) be work to process N items on popularity splits
      • A(N) be work to process N items on alphabetic splits
  • In worst case
      • A(N) = P(N/2) + C_2
      • P(N) = 2 A(N/2) + C_1
  • Therefore, P(N) = C_3 sqrt(N) + C_4
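The step from the two recurrences to the square-root bound can be filled in by substitution. The derivation below assumes N is a power of 4 and treats P(1) as a constant; under those assumptions it gives one consistent choice of C_3 and C_4.

```latex
\begin{align*}
P(N) &= 2\,A(N/2) + C_1
      = 2\bigl(P(N/4) + C_2\bigr) + C_1
      = 2\,P(N/4) + (C_1 + 2C_2) \\
     &= 2^{j}\,P\!\left(N/4^{j}\right) + \left(2^{j} - 1\right)(C_1 + 2C_2)
      \qquad \text{after unrolling $j$ times} \\
     &= \sqrt{N}\,P(1) + \left(\sqrt{N} - 1\right)(C_1 + 2C_2)
      \qquad \text{with } j = \log_4 N,\ 2^{j} = \sqrt{N} \\
     &= C_3\sqrt{N} + C_4,
      \qquad C_3 = P(1) + C_1 + 2C_2,\quad C_4 = -(C_1 + 2C_2).
\end{align*}
```

The factor of 2 per popularity level against a factor of 4 reduction in N over two levels is what turns the usual logarithmic bound into a square-root bound.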

Conclusions
  • Personalization and collaborative filtering
      • You shouldn’t have to type a lot to find stuff you search for a lot: Favorites (personalization)
      • Or stuff other people search for a lot: Hot Stuff
  • Wild Thing
      • Users enter wild cards anywhere, implicitly or explicitly
      • System finds k-best expansions matching their Favorites and Hot Stuff

Simple Uniform Look-And-Feel
  • Simple, easy to use
      • Even if you can’t spell, type…
      • Even Bo’s 3-year-old can do it
  • Goal: All Forms Go Wild
  • Uniform look-and-feel
      • Currently, different systems are different
      • Internet browser: address bar remembers where you’ve been; forms auto-populate name, credit card numbers, etc.
      • Outlook: remembers your favorite e-mail addresses

Wild Thing
  • Means different things to different people
  • Encourage use of wild cards
      • Implicit as well as explicit
  • A children’s story (with apologies to Hippos Go Berserk!)
      • Wild Thing Goes Mobile!
      • Wild Thing Goes Local!
      • All Forms Go Wild!!!
  • For the young adult
      • Wild Thing: You Make My Phone Sing!

© 2006 Microsoft Corporation. All rights reserved. Microsoft, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.