Combining KR and search Crossword puzzles Next Logic
Combining KR and search: Crossword puzzles Next: Logic representations Reading: C. 7. 4 -7. 8
Changes in Homework § Mar 4 th: Hand in written design, planned code for all modules § Mar 9 th: midterm § Mar 25 th: Fully running system due § Mar 30 th: Tournament begins 2
Changes in Homework § Dictionary § Use dictionary provided; do not use your own § Start with 300 words only § Switch to larger set by time of tournament § Representation of dictionary is important to reducing search time § Using knowledge to generate word candidates could also help 3
Midterm Survey § Start after 9 AM Friday and finish by Thursday, Mar. 4 th § Your answers are important: they will affect remaining class structure 4
Crossword Puzzle Solver § Proverb: Michael Litman, Duke Univ § Developed by his AI class § Combines knowledge from multiple sources to solve clues (clue/target) § Uses constraint propogation in combination with probabilities to select best target 5
Algorithm Overview § Independent programs specialize in different types of clues – knowledge experts § Information retrieval, database search, machine learning § Each expert module generates a candidate list (with probabilities) § Centralized solver § Merges the candidates lists for each clue § Places candidates on the puzzle grid 6
Performance § Averages 95. 3% words correct and 98. 1% letters correct § Under 15 minutes/puzzle § Tested on a sample of 370 NYT puzzles § Misses roughly 3 words or 4 letters on a daily 15 X 15 puzzle 7
Questions § Is this approach any more intelligent than the chess playing programs? § Does the use of knowledge correspond to intelligence? § Do any of the techniques for generating words apply to Scrabble? 8
9
To begin: research style § Study of existing puzzles § How hard? § What are the clues like? § What sources of knowledge might be helpful? § Crossword Puzzle database (CWDB) § 350, 000 clue-target pairs § >250, 000 unique pairs § = # of puzzles seen over 14 years at rate of one puzzle/day 10
How novel are crossword puzzles? § Given complete database and a new puzzle, expect to have seen § 91% of targets § 50% of clues § 34% of clue target pairs § 96% of individual words in clues 11
12
Categories of clues § Fill in the blank: § 28 D: Nothing ____: less § Trailing question mark § 4 D: The end of Plato? : § Abbreviations § 55 D: Key abbr: maj 13
Expert Categories § Synonyms § 40 D Meadowsweet: spiraea § Kind-of § 27 D Kind of coal or coat: pea § “pea coal” and “pea coat” standard phrases § Movies § 50 D Princess in Woolf’s “Orlando”: sasha § Geography § 59 A North Sea port: aberdeen § Music § 2 D “Hold Me” country Grammay winner, 1988: oslin § Literature § 53 A Playwright/novelist Capek: karel § Information retrieval § 6 D Mountain known locally as Chomolungma: everest 14
15
16
17
Candidate generator § Farrow of “Peyton Place”: mia § Movie module returns: § 0. 909091 mia § 0. 010101 tom § 0. 010101 kip § 0. 010101 ben § 0. 010101 peg § 0. 010101 ray 18
19
20
Ablation tests § Removed each module one at a time, rerunning all training puzzles § No single module changed overall percent correct by more than 1% § Removing all modules that relied on CWDB § 94. 8% to 27. 1% correct § Using only the modules that relied exclusively on CWDB § 87. 6% correct 21
Word list modules § Word. List, Word. List. Big § Ignore their clues and return all words of correct length § Word. List u 655, 000 terms § Word. List. Big u u u Word. List plus constructed terms First and last names, adjacent words from clues 2. 1 million terms, all weighted equally § 5 D 10, 000 words, perhaps: novelette § Wordlist-CWDB § 58, 000 unique targets § Returns all targets of appropriate length § Weights with estimates of their “prior” probabilities as targets of arbitrary clues u Examine frequency in crossword puzzles and normalize to account for bias caused by letters intersecting across and down terms 22
CWDB-specific modules § Exact Match § Returns all targets of the correct length associated with the clue § Example error: it returns eeyore for 19 A Pal of Pooh: tigger § Transformations § Learns transformations to clue-target pairs § Single-word substitution, remove one phrase from beginning or end and § § § add another, depluralizing a word in clue, pluralize word in target Nice X <-> X in France X for short <-> X abbr. X start <-> Prefix with X X city <-> X capital 51 D: Bugs chaser: elmer, solved by Bugs pursuer: elmer and the transformation rule X pursuer <-> X chaser § http: //www. oneacross. com 23
Information retrieval modules § Encyclopedia § For each query term, compute distribution of terms “close” to query u u Counted 10 -k times every times it apears at a distance of k<10 from query term Extremely common terms (as, and) are ignored § Partial match u u For a clue c, find all clues in CWDB that share words For each such clue, give its target a weight § LSI-Ency, LSI-CWDB u u Latent semantic indexing (LSI) identifies correlations between words: synonyms Return closest word for each word in the clue 24
Database Modules § Movie § www. imdb. com § Looks for patterns in the clue and formulates query to database u u Quoted titles: 56 D “The Thief of Baghdad” role: abu Boolean operations: Cary or Lee: grant § Music, literary, geography § Simple pattern matching of clue (keywords “city”, “author”, “band”, etc) to formulate query § 15 A “Foundation of Trilogy” author: asimov § Geography database: Getty Information Institute 25
Synonyms § Word. Net § Look for root forms of words in the clue § Then find variety of related words u 49 D Chop-chop: apace § Synonyms of synonyms § Forms of related words converted to forms of clue word (number, tense) 18 A Stymied: thwarted u Is this relevant to Scrabble? u 26
Syntactic Modules § Fill-in-the-blanks § >5% clues § Search databases (music, geography, literary and quotes) to find clue patterns § 36 A Yerby’s “A Rose for _ _ _ Maria”: ana u u § Kindof Pattern: for _ _ _ Maria Allow any 3 characters to fill the blanks § Pattern matching over short phrases § 50 clues of this type u u u “type of” (A type of jacket: nehru) “starter for” (Starter for saxon: anglo) “suffix with” (Suffix with switch or sock: eroo 27
Implicit Distribution Modules § Some targets not included in any database, but more probable than random § Schaeffer vs. srhffeeca § Bigram module u Generates all possible letter sequences of the given length by returning a letter bigram distribution over all possible strings, learned from CWDB § Lowest probability clue-target, but higher probability than random sequence of letters u Honolulu wear: hawaiianmuumuu § How could this be used for Scrabble? 28
Questions § Is this approach any more intelligent than the chess playing programs? § Does the use of knowledge correspond to intelligence? § Do any of the techniques for generating words apply to Scrabble? 29
- Slides: 29