USING WORDNET TO RETRIEVE WORDS FROM THEIR MEANINGS
- Slides: 36
İlknur Durgar El-Kahlout and Kemal Oflazer, Sabancı University, İstanbul, Turkey
Problem
- For a given definition, find the appropriate word (or words)
- A traditional dictionary is of no use, since it is looked up by word rather than by meaning
- From a dictionary, find an appropriate word that has a “similar” definition
Examples
- User definition: Akımı ölçmek için kullanılan alet (A device that is used to measure the current)
- In the dictionary: akımölçer: elektrik akımının şiddetini ölçmeye yarayan araç, ampermetre (ammeter: a device that measures the intensity of electrical current, amperemeter)
Applications
- Computer-assisted language learning
- Solving crossword puzzles
- Reverse dictionary
Outline
- Problem statement
- Meaning-to-Word System (MTW)
- Our Approach
- Methods
- Result Summary
- Conclusion
Problem Statement
- Find the “similarity” between two definitions
- User definition: Akımı ölçmek için kullanılan alet (A device that is used to measure the current)
- Dictionary definition: Elektrik akımının şiddetini ölçmeye yarayan araç, ampermetre (a device that measures the intensity of electrical current, amperemeter)
Meaning-to-Word (MTW)
- Addresses the problem of finding the appropriate word (or words) whose meaning “matches” the given definition
- Two subproblems:
  - finding words whose definitions are "similar" to the query in some sense
  - ranking the candidate words in a variety of ways
Information Flow in MTW
User Definition -> query -> Search in Dictionary -> candidates -> Rank Candidates -> List of words
Available Resources
- Turkish Monolingual Dictionary
  - About 50,000 entries
- Turkish WordNet
  - About 11,000 synsets
Normalization
User Definition -> Normalization -> query -> Search in Dictionary -> candidates -> Rank Candidates -> List of words
Normalization
- Tokenization
- Stemming
- Stop Word Elimination
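The three normalization steps can be sketched as follows. This is a minimal illustration, not the system's implementation: the stop-word set and the stem lookup table are assumptions chosen so that the query from the Query Processing slide reduces to the stems shown there, and a real system for Turkish would presumably rely on a morphological analyzer rather than a lookup table.

```python
import re

# Assumed for illustration only: the slides do not list the real stop words,
# and a lookup table stands in for a proper Turkish morphological analyzer.
STOP_WORDS = {"daha", "hiç"}
STEM_TABLE = {"evlenmemiş": "evlen"}


def normalize(definition):
    """Tokenize, stem, then drop stop words (the order given on the slide)."""
    # Tokenization: \w matches Unicode letters such as ö, ş and ı in Python 3.
    tokens = re.findall(r"\w+", definition)
    # Stemming: fall back to the token itself when the toy table has no entry.
    stems = [STEM_TABLE.get(token, token) for token in tokens]
    # Stop-word elimination.
    return [stem for stem in stems if stem not in STOP_WORDS]


print(normalize("daha önce hiç evlenmemiş kişi"))
```

Case folding is left out on purpose: Python's `str.lower()` is not locale-aware, so Turkish dotted and dotless i would need separate handling.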
Query Processing
User Definition -> query -> Query Processing -> Search in Dictionary -> candidates -> Rank Candidates -> List of words
Query Processing
- Subset Generation
  - Search with different sets of words
  - Select informative words from the user’s query
- Query: daha önce hiç evlenmemiş kişi (a person who has never been married)
  - {önce, evlen, kişi} (before, marry, person)
  - {evlen, kişi} (marry, person), {önce, evlen} (before, marry)
  - {evlen} (marry), {önce} (before), {kişi} (person)
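Subset generation over the normalized stems can be sketched with `itertools.combinations`. Exhaustive enumeration is an assumption here: the slide lists only some of the two-word subsets, and since the number of subsets grows as 2^n, a real system would presumably restrict itself to the informative words for long queries.

```python
from itertools import combinations


def generate_subsets(stems):
    """Return every non-empty subset of the query stems, largest subsets first."""
    subsets = []
    for size in range(len(stems), 0, -1):
        subsets.extend(combinations(stems, size))
    return subsets


subsets = generate_subsets(["önce", "evlen", "kişi"])
for subset in subsets:
    print(subset)
```

For the three stems of the example query this yields seven subsets: one of size three, three of size two and three of size one.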
Query Processing
- Subset Sorting
  - An unordered list of subsets is insufficient
  - Rank the generated subsets
1) By the number of words: {önce, evlen, kişi} (before, marry, person) ranks before {evlen, kişi} (marry, person)
2) By the sum of the logarithms of the word frequencies: {evlen, kişi} (marry, person) ranks before {önce, kişi} (before, person)
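The two sorting criteria can be combined into a single sort key, as in the sketch below. The frequencies are invented for illustration, and the direction of the second criterion (a lower sum of log frequencies, meaning rarer words, ranks first) is an assumption that happens to reproduce the order shown on the slide.

```python
import math

# Hypothetical corpus frequencies; the slides do not give real counts.
FREQ = {"önce": 900, "evlen": 120, "kişi": 1500}


def sort_subsets(subsets, freq):
    """Sort by number of words (descending), then by sum of log frequency (ascending)."""
    def key(subset):
        log_sum = sum(math.log(freq[word]) for word in subset)
        return (-len(subset), log_sum)
    return sorted(subsets, key=key)


unsorted_subsets = [
    ("önce", "kişi"),
    ("evlen",),
    ("evlen", "kişi"),
    ("önce", "evlen", "kişi"),
]
ranked = sort_subsets(unsorted_subsets, FREQ)
for subset in ranked:
    print(subset)
```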
Searching for Meanings
User Definition -> query -> Search in Dictionary -> candidates -> Rank Candidates -> List of words
Searching for Meanings
- Two methods:
  - Stem Matching
  - Query Expansion (using WordNet)
Stem Matching
- Morphological normalization of words
- Find meanings that contain morphological variants of the original definition
Stem Matching (Ex.)
Query: akımı ölçmek için kullanılan alet (A device that is used to measure the current)
Candidate stems for each word:
- akımı: akım (current), akı (flux), ak (white)
- ölçmek: ölç (measure)
- için: için (to), iç (drink)
- kullanılan: kullan (use), kul (slave)
- alet: alet (device)
The matching stems are colored on the slide.
Stem Matching
- Query: akımı ölçmek için kullanılan alet (A device that is used to measure the current)
- Dictionary definition: elektrik akımının şiddetini ölçmeye yarayan araç, ampermetre (a device that measures the intensity of electrical current, amperemeter)
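Stem matching on this example can be sketched as a set intersection between the candidate stems of each query word and the stems of a dictionary definition. The candidate stems are those given in the example; the pre-stemmed `DICTIONARY` index and both function names are hypothetical.

```python
# Candidate stems per query word, as listed in the example.
QUERY_STEMS = {
    "akımı": {"akım", "akı", "ak"},
    "ölçmek": {"ölç"},
    "için": {"için", "iç"},
    "kullanılan": {"kullan", "kul"},
    "alet": {"alet"},
}

# Hypothetical pre-stemmed index: headword -> stems of its definition.
DICTIONARY = {
    "akımölçer": {"elektrik", "akım", "şiddet", "ölç", "yara", "araç", "ampermetre"},
}


def matched_words(query_stems, definition_stems):
    """Query words that share at least one candidate stem with the definition."""
    return [word for word, stems in query_stems.items() if stems & definition_stems]


def search(query_stems, dictionary):
    """Headwords whose definition matches at least one query word."""
    hits = {}
    for headword, definition_stems in dictionary.items():
        matched = matched_words(query_stems, definition_stems)
        if matched:
            hits[headword] = matched
    return hits


print(search(QUERY_STEMS, DICTIONARY))
```

Only akımı and ölçmek match here. The query word alet (device) does not match araç (device) in the definition, because the two share no stem.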
Stem Matching
- Drawbacks
  - Generates noisy stems: ilim (science, my city) -> ilim (science), il (city)
  - Conflates two words with very different meanings to the same stem: ilim (science, my city), ilde (in the city) -> il (city)
  - Cannot find relations between similar words: kimse (someone) and kişi (person); bölüm (part) and kısım (portion)
Using Query Expansion
- Two different approaches:
  - Expand the query with relations (synonyms, specializations, generalizations)
  - Expand the query with the unexpanded query’s relevant answers
- WordNet synonyms are used in MTW:
  - {besin, gıda} (food, nourishment)
  - {iyileş, düzel} (to get better) / {iyileş, geliş} (to improve)
Query Expansion (Ex.)
Query: akımı ölçmek için kullanılan alet (A device that is used to measure the current)
Candidate stems for each word:
- akımı: akım (current), akı (flux), ak (white)
- ölçmek: ölç (measure)
- için: için (to), iç (drink)
- kullanılan: kullan (use), kul (slave)
- alet: alet (device)
Synonyms added from WordNet: beyaz, debi, akış, faydalan, yararlan, köle, araç, gereç
Query Expansion (Ex.)
- Query: akımı ölçmek için kullanılan alet (A device that is used to measure the current)
- Dictionary definition: elektrik akımının şiddetini ölçmeye yarayan araç, ampermetre (a device that measures the intensity of electrical current, amperemeter)
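Synonym-based expansion can be sketched as a union of each stem with its synonym set before matching. The `SYNONYMS` table below is a hand-written stand-in for WordNet synsets, not a call to any WordNet API. Its stem-to-synonym pairing is an assumption read off the example (debi and akış are left out because their source stem is not clear from the slide).

```python
# Hand-written stand-in for WordNet synsets (assumed pairing, see above).
SYNONYMS = {
    "alet": {"araç", "gereç"},
    "kullan": {"faydalan", "yararlan"},
    "kul": {"köle"},
    "ak": {"beyaz"},
}

# Assumed stems of the dictionary definition of akımölçer.
DEFINITION_STEMS = {"elektrik", "akım", "şiddet", "ölç", "yara", "araç", "ampermetre"}


def expand(stems, synonyms):
    """Return the stems together with all of their listed synonyms."""
    expanded = set(stems)
    for stem in stems:
        expanded |= synonyms.get(stem, set())
    return expanded


before = {"alet"} & DEFINITION_STEMS
after = expand({"alet"}, SYNONYMS) & DEFINITION_STEMS
print(before, after)
```

Without expansion the query word alet matches nothing in the definition; with expansion it matches through its synonym araç.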
Ranking
User Definition -> query -> Search in Dictionary -> candidates -> Rank Candidates -> List of words
Ranking
- A very important part of MTW
  - Having the right answer in the retrieved set is not enough
  - The aim is to have the right answer near the top of the retrieved set (e.g., within the first 50 answers)
Ranking
- Simple but effective methods
  - Number of matched words
  - Subset informativeness: frequency of the words in the subset
  - Ratio of the number of matched words to the number of words in the candidate dictionary definition
  - Longest Common Subsequence: order of the matched words
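Three of these ranking signals can be sketched as follows. How the system actually combines them is not stated on the slides, so sorting by the tuple (matched words, LCS length, ratio) is purely an assumption, and the second candidate is a made-up entry added for contrast. Subset informativeness is omitted because it needs corpus frequencies.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two stem lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def features(query, definition):
    """Return (matched words, LCS length, matched / definition length)."""
    matched = len(set(query) & set(definition))
    return (matched, lcs_length(query, definition), matched / len(definition))


def rank(query, candidates):
    """Order headwords by the assumed feature tuple, best first."""
    return sorted(candidates, key=lambda h: features(query, candidates[h]), reverse=True)


query = ["akım", "ölç", "kullan", "alet"]
candidates = {
    "made_up_entry": ["ölç", "birim"],  # hypothetical definition stems
    "akımölçer": ["elektrik", "akım", "şiddet", "ölç", "yara", "araç", "ampermetre"],
}
print(rank(query, candidates))
```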
Some Statistics
- Training sets: 50 queries from users, 50 queries from a dictionary
- Test sets: 50 queries from users, 50 queries from a separate dictionary

                        Test set 1 (user)  Training set 1  Test set 2 (dict.)  Training set 2
# of queries            50                 50              50                  50
Avg. # of query words   5.66               4.64            9.24                13.98
Max. # of query words   17                 12              23                  45
Min. # of query words   2                  1               1                   6
Stem Matching: all stems included

Rank       Test set 1  Training set 1  Test set 2  Training set 2
1-10       13 (26%)    18 (36%)        45 (90%)    41 (82%)
11-50      7 (14%)     12 (24%)        2 (4%)      5 (10%)
>50        19 (38%)    10 (20%)        3 (6%)      4 (8%)
Not found  11 (22%)    10 (20%)        0 (0%)      0 (0%)

Low percentage in the top 10 for user queries, but very high results for dictionary queries
Stem Matching: longest stem included (heuristics)

Rank       Test set 1  Training set 1  Test set 2  Training set 2
1-10       14 (28%)    21 (42%)        46 (92%)    43 (86%)
11-50      5 (10%)     9 (18%)         1 (2%)      5 (10%)
>50        18 (36%)    9 (18%)         3 (6%)      2 (4%)
Not found  13 (26%)    11 (22%)        0 (0%)      0 (0%)

Improvement in user queries, slightly better performance in dictionary queries
Query Expansion (WordNet): all stems included

Rank       Test set 1  Training set 1  Test set 2  Training set 2
1-10       14 (28%)    24 (48%)        45 (90%)    41 (82%)
11-50      9 (18%)     9 (18%)         2 (4%)      5 (10%)
>50        18 (36%)    12 (24%)        3 (6%)      4 (8%)
Not found  9 (18%)     5 (10%)         0 (0%)      0 (0%)

Better results in user queries, no change in dictionary queries
Query Expansion (WordNet): longest stem included (heuristics)

Rank       Test set 1  Training set 1  Test set 2  Training set 2
1-10       14 (28%)    24 (48%)        41 (82%)    39 (78%)
11-50      6 (12%)     8 (16%)         5 (10%)     6 (12%)
>50        21 (42%)    13 (26%)        1 (2%)      5 (10%)
Not found  9 (18%)     5 (10%)         0 (0%)      0 (0%)

Better performance than ‘longest stem matching’ in user queries, but worse performance in dictionary queries
Result Summary
- Stem Matching (longest stem included)
  - 60% success in real user queries
  - 96% success in dictionary queries
- Query Expansion (all stems included)
  - 68% success in real user queries
  - 92% success in dictionary queries
Conclusion
- We have implemented a ‘Meaning to Word’ system for Turkish
- Results on unseen data are rather satisfactory
- Query expansion is better
  - although it cannot find the words for all queries
- 68% of real user queries and 90% of dictionary queries are found in the first 50 results
THANK YOU!