USING WORDNET TO RETRIEVE WORDS FROM THEIR MEANINGS

İlknur Durgar El-Kahlout and Kemal Oflazer
Sabancı University, İstanbul, Turkey

Problem
  • For a given definition, find the appropriate word (or words)
  • A traditional dictionary is of no use
  • From a dictionary, find an appropriate word that has a “similar” definition

Examples
  • User definition: Akımı ölçmek için kullanılan alet (A device that is used to measure the current)
  • In the dictionary: akımölçer: elektrik akımının şiddetini ölçmeye yarayan araç, ampermetre (ammeter: a device that measures the intensity of electrical current, amperemeter)

Applications
  • Computer-assisted language learning
  • Solving crossword puzzles
  • Reverse dictionary

Outline
  • Problem statement
  • Meaning-to-Word System (MTW)
  • Our Approach
  • Methods
  • Result Summary
  • Conclusion

Problem Statement
  • Find the “similarity” between two definitions
      • Akımı ölçmek için kullanılan alet (A device that is used to measure the current)
      • Elektrik akımının şiddetini ölçmeye yarayan araç, ampermetre (a device that measures the intensity of electrical current, amperemeter)

Meaning-to-Word (MTW)
  • Addresses the problem of finding the appropriate word (or words) whose meaning “matches” the given definition
  • Two subproblems
      • Finding words whose definitions are “similar” to the query in some sense
      • Ranking the candidate words in a variety of ways

Information Flow in MTW
  User definition → (query) → Search in dictionary → (candidates) → Rank candidates → List of words

Available Resources
  • Turkish Monolingual Dictionary
      • About 50,000 entries
  • Turkish WordNet
      • About 11,000 synsets

Normalization
  User definition → Normalization → (query) → Search in dictionary → (candidates) → Rank candidates → List of words

Normalization
  • Tokenization
  • Stemming
  • Stop Word Elimination
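The three normalization steps can be sketched roughly as follows. This is a minimal illustration, not the MTW implementation: the stop-word list and the stem table are hypothetical toy data (the slides do not say which stemmer or stop-word list is used), chosen so that the query from the subset-generation slide reduces to {önce, evlen, kişi}.

```python
import re

# Hypothetical toy resources, for illustration only.
STOP_WORDS = {"daha", "hiç"}
TOY_STEMS = {"evlenmemiş": "evlen"}


def turkish_lower(text):
    # Turkish has dotted and dotless I, so map the capitals explicitly
    # before applying the generic lowercase conversion.
    return text.replace("I", "ı").replace("İ", "i").lower()


def tokenize(text):
    # Split on anything that is not a letter, digit or underscore.
    return re.findall(r"\w+", turkish_lower(text))


def stem(token):
    # Stand-in for a real morphological analyzer: look the token up
    # in the toy table and fall back to the token itself.
    return TOY_STEMS.get(token, token)


def normalize(definition):
    # Tokenization, then stemming, then stop-word elimination.
    stems = [stem(token) for token in tokenize(definition)]
    return [s for s in stems if s not in STOP_WORDS]


print(normalize("daha önce hiç evlenmemiş kişi"))
# ['önce', 'evlen', 'kişi']
```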

Query Processing
  User definition → (query) → Query Processing → Search in dictionary → (candidates) → Rank candidates → List of words

Query Processing
  • Subset Generation
      • Search with different sets of words
      • Select informative words from the user’s query
  • Query: daha önce hiç evlenmemiş kişi (a person who has never been married)
      • {önce, evlen, kişi} (before, marry, person)
      • {evlen, kişi} (marry, person), {önce, evlen} (before, marry)
      • {evlen} (marry), {önce} (before), {kişi} (person)
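One simple way to generate such subsets is to enumerate every non-empty combination of the normalized query words, largest first. The sketch below assumes exhaustive enumeration; the slides do not state whether MTW prunes any subsets, and the function name is illustrative.

```python
from itertools import combinations


def generate_subsets(words):
    # Enumerate all non-empty subsets, from the full set down to single words.
    subsets = []
    for size in range(len(words), 0, -1):
        subsets.extend(combinations(words, size))
    return subsets


for subset in generate_subsets(["önce", "evlen", "kişi"]):
    print(subset)
```

For a query of n words this produces 2^n - 1 subsets, so a real system would probably cap the query length or the number of subsets it searches with.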

Query Processing
  • Subset Sorting
      • An unordered list of subsets is insufficient
      • Rank the generated subsets
  1) By the number of words
      • {önce, evlen, kişi} (before, marry, person)
      • {evlen, kişi} (marry, person)
  2) By the sum of the logarithms of the word frequencies
      • {evlen, kişi} (marry, person)
      • {önce, kişi} (before, person)
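The two sorting criteria can be combined into a single sort key, as in the sketch below. Two things here are assumptions rather than facts from the slides: the frequency counts are invented toy values, and subsets with a lower sum of log frequencies (rarer, hence more informative words) are placed first, which matches the slide's ordering of {evlen, kişi} before {önce, kişi}.

```python
import math

# Hypothetical corpus frequencies, for illustration only.
TOY_FREQ = {"önce": 900, "evlen": 40, "kişi": 300}


def log_frequency_sum(subset, freq):
    # Sum of the logarithms of the word frequencies in the subset.
    return sum(math.log(freq[word]) for word in subset)


def sort_subsets(subsets, freq):
    # Larger subsets first; among equal sizes, lower log-frequency sum first.
    return sorted(subsets, key=lambda s: (-len(s), log_frequency_sum(s, freq)))


subsets = [("önce", "kişi"), ("evlen",), ("önce", "evlen", "kişi"), ("evlen", "kişi")]
for subset in sort_subsets(subsets, TOY_FREQ):
    print(subset)
```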

Searching for Meanings
  User definition → (query) → Search in dictionary → (candidates) → Rank candidates → List of words

Searching for Meanings
  • Two methods
      • Stem Matching
      • Query Expansion (using WordNet)

Stem Matching
  • Morphological normalization of words
  • Find meanings that contain morphological variants of the original definition

Stem Matching (Ex.)
  • Query: akımı ölçmek için kullanılan alet (A device that is used to measure the current)
  • Candidate stems for each query word
      • akımı: akım (current), akı (flux), ak (white)
      • ölçmek: ölç (measure)
      • için: için (to), iç (drink)
      • kullanılan: kullan (use), kul (slave)
      • alet: alet (device)
  • On the slide, the matching stems are shown in color

Stem Matching
  • Query: akımı ölçmek için kullanılan alet (A device that is used to measure the current)
  • Dictionary definition: elektrik akımının şiddetini ölçmeye yarayan araç, ampermetre (a device that measures the intensity of electrical current, amperemeter)
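The matching step in this example can be sketched as a set intersection between the candidate stems of each query word and the stems found in a dictionary definition. The query stems below are the ones listed on the example slide; the stems assigned to the two definitions, and the second "noisy" definition itself, are hypothetical. The `longest` flag is one possible reading of the "longest stem" heuristic that appears in the result tables.

```python
# Candidate stems per query word, as listed on the example slide.
QUERY_STEMS = {
    "akımı": {"akım", "akı", "ak"},
    "ölçmek": {"ölç"},
    "için": {"için", "iç"},
    "kullanılan": {"kullan", "kul"},
    "alet": {"alet"},
}

# Hypothetical stem sets for two dictionary definitions.
AMMETER_DEF = {"elektrik", "akım", "şiddet", "ölç", "yara", "araç", "ampermetre"}
NOISY_DEF = {"ak", "renk"}  # an unrelated definition that mentions "ak" (white)


def matched_words(query_stems, definition_stems, longest=False):
    # A query word matches if any of its stems occurs in the definition.
    # With longest=True only the longest stem of each word is considered.
    matched = []
    for word, stems in query_stems.items():
        if longest:
            stems = {max(stems, key=len)}
        if stems & definition_stems:
            matched.append(word)
    return matched


print(matched_words(QUERY_STEMS, AMMETER_DEF))              # ['akımı', 'ölçmek']
print(matched_words(QUERY_STEMS, NOISY_DEF))                # ['akımı']
print(matched_words(QUERY_STEMS, NOISY_DEF, longest=True))  # []
```

With all stems included, the short stem "ak" pulls in the unrelated definition; restricting each word to its longest stem removes that spurious match in this toy case.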

Stem Matching
  • Drawbacks
      • Generates noisy stems
          ilim (science, my city) → ilim (science), il (city)
      • Conflates two words with very different meanings to the same stem
          ilim (science, my city), ilde (in the city) → il (city)
      • Cannot find relations between similar words
          kimse (someone) / kişi (person)
          bölüm (part) / kısım (portion)

Using Query Expansion
  • Two different approaches:
      • Expand the query with relations (synonyms, specializations, generalizations)
      • Expand the query with the relevant answers of the unexpanded query
  • WordNet synonyms are used in MTW
      • {besin, gıda} (food, nourishment)
      • {iyileş, düzel} (to get better) / {iyileş, geliş} (to improve)

Query Expansion (Ex.)
  • Query: akımı ölçmek için kullanılan alet (A device that is used to measure the current)
  • Candidate stems: akım (current), akı (flux), ak (white), ölç (measure), için (to), iç (drink), kullan (use), kul (slave), alet (device)
  • Synonyms added from WordNet: beyaz, debi, akış, faydalan, yararlan, köle, araç, gereç

Query Expansion (Ex.)
  • Query: akımı ölçmek için kullanılan alet (A device that is used to measure the current)
  • Dictionary definition: elektrik akımının şiddetini ölçmeye yarayan araç, ampermetre (a device that measures the intensity of electrical current, amperemeter)
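The effect of synonym expansion on this example can be sketched as follows. The synonym table is a hypothetical stand-in for WordNet synsets and contains only two entries, and the stem sets for the query and the definition are assumptions made for illustration.

```python
# Hypothetical synonym table standing in for WordNet synsets.
TOY_SYNONYMS = {
    "alet": {"araç", "gereç"},
    "kullan": {"faydalan", "yararlan"},
}

# Assumed stems for the query and for the dictionary definition.
QUERY = {"akım", "ölç", "kullan", "alet"}
DEFINITION = {"elektrik", "akım", "şiddet", "ölç", "yara", "araç", "ampermetre"}


def expand(stems, synonyms):
    # Add every known synonym of every query stem to the query.
    expanded = set(stems)
    for stem in stems:
        expanded |= synonyms.get(stem, set())
    return expanded


print(sorted(QUERY & DEFINITION))                        # ['akım', 'ölç']
print(sorted(expand(QUERY, TOY_SYNONYMS) & DEFINITION))  # ['akım', 'araç', 'ölç']
```

Plain stem matching finds two shared stems; after expansion "alet" also reaches "araç" in the definition, which is the kind of relation between similar words that stem matching alone cannot find.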

Ranking
  User definition → (query) → Search in dictionary → (candidates) → Rank candidates → List of words

Ranking
  • A very important part of MTW
      • Having the right answer in the retrieved set is not enough
      • The aim is to have the right answer near the top of the retrieved set (e.g., within the first 50 answers)

Ranking
  • Simple but effective methods
      • Number of matched words
      • Subset informativeness: frequency of the words in the subset
      • Ratio of the number of matched words to the number of words in the candidate dictionary definition
      • Longest Common Subsequence: order of the matched words
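Three of these signals (matched-word count, match ratio, and longest common subsequence) can be sketched as a sort key. How MTW actually weights or combines the signals is not stated on the slides; comparing them lexicographically as a tuple is an assumption, and the candidate entries below (including the entry "ölçü" and all stem lists) are hypothetical.

```python
def lcs_length(a, b):
    # Classic dynamic-programming longest common subsequence length,
    # keeping only the previous row of the table.
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rank_key(query_stems, definition_stems):
    # (number of matched words, ratio of matches to definition length, LCS length)
    matched = sum(1 for s in query_stems if s in definition_stems)
    ratio = matched / len(definition_stems) if definition_stems else 0.0
    return (matched, ratio, lcs_length(query_stems, definition_stems))


def rank(query_stems, candidates):
    # Best candidate first; candidates maps each word to its definition stems.
    return sorted(candidates, key=lambda w: rank_key(query_stems, candidates[w]), reverse=True)


QUERY = ["akım", "ölç", "kullan", "alet"]
CANDIDATES = {  # hypothetical dictionary entries
    "ölçü": ["ölç", "birim"],
    "akımölçer": ["elektrik", "akım", "şiddet", "ölç", "yara", "araç", "ampermetre"],
}
print(rank(QUERY, CANDIDATES))  # ['akımölçer', 'ölçü']
```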

Some Statistics
  • Training sets:
      • 50 queries from users
      • 50 queries from a dictionary
  • Test sets:
      • 50 queries from users
      • 50 queries from a separate dictionary

                           Test set 1 (user)   Training set 1   Test set 2 (dict.)   Training set 2
  # of queries             50                  50               50                   50
  Avg. # of query words    5.66                4.64             9.24                 13.98
  Max. # of query words    17                  12               23                   45
  Min. # of query words    2                   1                1                    6

Stem Matching: all stems included

  Rank        Test set 1   Training set 1   Test set 2   Training set 2
  1-10        13 (26%)     18 (36%)         45 (90%)     41 (82%)
  11-50       7 (14%)      12 (24%)         2 (4%)       5 (10%)
  >50         19 (38%)     10 (20%)         3 (6%)       4 (8%)
  Not found   11 (22%)     10 (20%)         0 (0%)       0 (0%)

  • Low percentage in the top 10 for user queries, but very high results for dictionary queries

Stem Matching: longest stem included (heuristics)

  Rank        Test set 1   Training set 1   Test set 2   Training set 2
  1-10        14 (28%)     21 (42%)         46 (92%)     43 (86%)
  11-50       5 (10%)      9 (18%)          1 (2%)       5 (10%)
  >50         18 (36%)     9 (18%)          3 (6%)       2 (4%)
  Not found   13 (26%)     11 (22%)         0 (0%)       0 (0%)

  • Improvement in user queries, slightly better performance in dictionary queries
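If "success" is read as the correct word appearing within the first 50 results (an assumption, based on the aim stated on the ranking slide), the rate for each column of the table above is the sum of its 1-10 and 11-50 rows divided by the 50 queries in the set. A small worked check, using only the counts from this table:

```python
# (rank 1-10 count, rank 11-50 count) per query set, from the
# longest-stem table above.
LONGEST_STEM_COUNTS = {
    "Test set 1": (14, 5),
    "Training set 1": (21, 9),
    "Test set 2": (46, 1),
    "Training set 2": (43, 5),
}


def top50_rate(top10, next40, total=50):
    # Percentage of queries whose answer is ranked within the first 50.
    return 100 * (top10 + next40) / total


for name, (top10, next40) in LONGEST_STEM_COUNTS.items():
    print(f"{name}: {top50_rate(top10, next40):.0f}%")
# Test set 1: 38%
# Training set 1: 60%
# Test set 2: 94%
# Training set 2: 96%
```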

Query Expansion (WordNet): all stems included

  Rank        Test set 1   Training set 1   Test set 2   Training set 2
  1-10        14 (28%)     24 (48%)         45 (90%)     41 (82%)
  11-50       9 (18%)      9 (18%)          2 (4%)       5 (10%)
  >50         18 (36%)     12 (24%)         3 (6%)       4 (8%)
  Not found   9 (18%)      5 (10%)          0 (0%)       0 (0%)

  • Better results in user queries, no change in dictionary queries

Query Expansion (WordNet): longest stem included (heuristics)

  Rank        Test set 1   Training set 1   Test set 2   Training set 2
  1-10        14 (28%)     24 (48%)         41 (82%)     39 (78%)
  11-50       6 (12%)      8 (16%)          5 (10%)      6 (12%)
  >50         21 (42%)     13 (26%)         1 (2%)       5 (10%)
  Not found   9 (18%)      5 (10%)          0 (0%)       0 (0%)

  • Better performance than ‘longest stem matching’ in user queries, but worse performance in dictionary queries

Result Summary
  • Stem Matching (longest stem included)
      • 60% success in real user queries
      • 96% success in dictionary queries
  • Query Expansion (all stems included)
      • 68% success in real user queries
      • 92% success in dictionary queries

Conclusion
  • We have implemented a ‘Meaning to Word’ system for Turkish
  • Results on unseen data are rather satisfactory
  • Query expansion is better
      • However, it cannot find the words for all queries
      • 68% of real user queries and 90% of dictionary queries are found in the first 50 results

THANK YOU !
