Phonetic Algorithms Aaron Schneidereit Computer Science and Statistics

Phonetic Algorithms Expect Error-Free Context

Phonetic algorithms produce encodings of words that represent how those words are pronounced. These algorithms rely on spelling, yet many of their applications (such as speech-to-text and spell checking) introduce errors into the input. This project tests how accurately three of the most notable phonetic algorithms perform when they are given words that contain common errors or that sound like other words.

Simulating Common Errors in Audio and Spelling

• The three algorithms tested in this project are Soundex, NYSIIS, and Metaphone. They are among the most popular phonetic algorithms, and many other phonetic algorithms were derived from earlier versions of them.
• The algorithms are tested in three categories: vowel swapping, homophones, and character swapping.
• Vowel swapping simulates vowels that are easily mistaken for other vowels during pronunciation. For example, "ton" and "tun" are pronounced the same, but one of them is misspelled.
• Homophones test whether an algorithm assigns the same encoding to words that are spelled differently but pronounced the same, since a correct phonetic encoding should treat them as identical.
• Character swapping simulates simple misspellings. When typing, people commonly enter two neighboring characters in the wrong order; for example, "exactly" might be typed as "excatly".

Acknowledgements

Noah M. Daniels, URI Honors Program, attendees of Professor Daniels's research meetings

Data Sets Cited

Les Foster. "dictionary.txt." San Jose State University Machine Intelligence Laboratory.
"homophones 1.01." University of Cambridge.

Metaphone and Soundex Display Best Results for Different Types of Errors

• These results show that no single algorithm of the three tested is best overall. However, both NYSIIS and Metaphone performed better than Soundex in terms of achieving the best result on specific tests.
• Metaphone performed best on character swapping and homophones, indicating that it is the strongest of the three at matching words that sound exactly alike.
• Soundex and NYSIIS both scored 100% on vowel swaps, suggesting that they tolerate small vowel differences between words that sound similar.
• Metaphone takes the longest to run, because the algorithm requires steps that may lengthen or shorten the encoding depending on the order of characters.
• Soundex is the fastest, because it keeps the first letter of each word and always produces a fixed-length code: that letter followed by three digits.
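The Soundex behavior described above (first letter kept, exactly three digits, no digit for vowels) and two of the error categories can be illustrated with a short Python sketch. It assumes the standard American Soundex rules. The helpers `vowel_swaps`, `adjacent_swaps`, and `match_rate` are hypothetical names chosen for illustration, and because the poster does not say how its accuracy percentages were computed, `match_rate` (the share of error variants whose encoding equals the original word's encoding) is an assumed metric, not the project's method.

```python
# Minimal sketch: American Soundex plus two hypothetical error generators
# for the "vowel swapping" and "character swapping" test categories.

# Soundex digit for each consonant group; vowels, y, h and w get no digit.
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word: str) -> str:
    """Return the 4-character Soundex code: first letter plus three digits."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    digits = []
    prev = _CODES.get(letters[0], "")  # the first letter's code still blocks a repeat
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w are skipped without separating equal codes
        code = _CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        prev = code  # a vowel (or y) resets prev, so equal codes around it are kept
    # Pad with zeros or truncate so there are exactly three digits.
    return (letters[0].upper() + "".join(digits) + "000")[:4]


VOWELS = "aeiou"


def vowel_swaps(word: str) -> list[str]:
    """Hypothetical helper: every variant with one vowel replaced by another."""
    variants = []
    for i, ch in enumerate(word):
        if ch in VOWELS:
            for v in VOWELS:
                if v != ch:
                    variants.append(word[:i] + v + word[i + 1:])
    return variants


def adjacent_swaps(word: str) -> list[str]:
    """Hypothetical helper: every variant with two neighboring characters transposed."""
    return [word[:i] + word[i + 1] + word[i] + word[i + 2:]
            for i in range(len(word) - 1) if word[i] != word[i + 1]]


def match_rate(encode, word: str, variants: list[str]) -> float:
    """Assumed metric: share of variants whose encoding equals the original's."""
    if not variants:
        return 1.0
    target = encode(word)
    return sum(encode(v) == target for v in variants) / len(variants)


if __name__ == "__main__":
    print(soundex("ton"), soundex("tun"))          # T500 T500
    print(soundex("exactly"), soundex("excatly"))  # E223 E234
    print(match_rate(soundex, "ton", vowel_swaps("ton")))     # 1.0
    print(match_rate(soundex, "ton", adjacent_swaps("ton")))  # 0.5
```

Because Soundex gives no digit to any vowel after the first letter, and every vowel acts the same way as a separator between repeated codes, replacing one non-initial vowel with another never changes the code. A transposition such as "exactly" to "excatly" can change it (E223 versus E234). The same `match_rate` function could be given other encoders, for example the `nysiis` and `metaphone` functions in the third-party jellyfish package, to compare the three algorithms under one metric.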