Yup’ik language dictionary and processing software


Eric Somerville, Researcher
Dr. Frank Moore, Mentor
Office of Undergraduate Research and Scholarship, University of Alaska Anchorage

Introduction

This dictionary project is the first step in a larger project to develop tools that indigenous Alaskans can use to help revitalize their languages.

Purpose

To develop software that encourages Yup’ik writing using modern technologies.

Goals

• This project will develop data structures to define and store a digital dictionary for the Central Yup’ik language.
• This project will develop basic word-checking software to show the functionality of this dictionary.

Process

• Define Yup’ik word-forming grammar rules with clear algorithms
• Create data structures to store Yup’ik morphemes
• Develop word-checking software to test the data structures and algorithms

Modeling Yup’ik Grammar

Yup’ik is a polysynthetic language. This means that most Yup’ik words are formed from a base, zero or more postbases, and an appropriate ending. Each time a postbase is added to the base, the result is treated as an expanded base and can receive additional postbases that add meaning to the word.

Each base is either a noun or a verb and belongs to one of six morpho-phonological classes that help define how postbases and endings are added.

What is a trie?

The short explanation…

A trie is a type of tree data structure.

Yup’ik Spell Checking

The algorithm I currently plan to use for the spell-checking software searches a series of trie data structures for matching morphemes and combines them using proper Yup’ik grammar.

The process begins by checking the input against a list of bases, which returns a list of possible bases. Each possible base then goes through a process of adding appropriate postbases and endings until the input word is either completely formed or found to be outside the dictionary.
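The word-checking process described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the project's implementation: it uses plain Python sets in place of tries, joins morphemes by simple concatenation (ignoring the morpho-phonological class rules), and the names `check_word` and `_expand` are hypothetical. The bases and endings come from the poster's noun table (classes I to III, with the plural ending reduced to a bare `t`); the postbase `zz` is a made-up placeholder, not a real Yup’ik postbase.

```python
# Toy morpheme lists (assumptions for illustration only).
BASES = {"aana", "ui", "neqe"}   # class I-III noun bases from the noun table
POSTBASES = {"zz"}               # made-up placeholder, NOT a real Yup'ik postbase
ENDINGS = {"ka", "t"}            # "my one" ending and a simplified plural ending


def check_word(word: str) -> bool:
    """Return True if word = base + zero or more postbases + ending."""
    # Try every base that matches the start of the input (a "base guess").
    for base in BASES:
        if word.startswith(base) and _expand(word, base):
            return True
    return False  # no base found, or no guess could be completed


def _expand(word: str, stem: str) -> bool:
    """Try to complete the word from the current (expanded) base."""
    rest = word[len(stem):]
    if rest in ENDINGS:
        return True  # ending found: the word is completely formed
    # Otherwise add a postbase and treat the result as an expanded base.
    for postbase in POSTBASES:
        if rest.startswith(postbase) and _expand(word, stem + postbase):
            return True
    return False  # no postbase found


print(check_word("aanaka"))      # base + ending
print(check_word("neqezzzzka"))  # base + two placeholder postbases + ending
print(check_word("aana"))        # base with no ending
```

Each recursive call lengthens the stem by one postbase, so the search always terminates; a real implementation would also have to apply the class-dependent sound changes when joining morphemes.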
Below is a figure outlining the spell-checking process.

[Figure: flowchart of the spell-checking process. Labels: Input String, Base Trie, Base Guess, Postbase Trie, Ending Trie, Expanded Base = Base Guess + Postbase Guess. Outcomes: no base found, return false; no postbase found, return false; ending found, return true.]

This is a brief table of nouns that Professor Marie Meade uses in her Yup’ik language classes here at the university. Two endings are included to show the notation used for postbases and endings.

English     Citation   Base        Class   Plural %:(e)t   1 s-s “My one” -ka
Mother      Aana       Aana-       I       Aanat           Aanaka
Husband     Ui         Ui-         II      Uit             Uika
Fish/Food   Neqa       Neqe-       III     Neqet           Neqeka
Dog         Qimugta    Qimugte-    IV      Qimugtet        Qimugteka
Son         Qetunraq   Qetunrar-   V       Qetunrat        Qetunraqa
Daughter    Panik      Panig-      VI      Paniit          Panika

Revitalizing Central Yup’ik Language

The Central Yup’ik language, like Alaska Native languages all over the state, has been in decline for a generation or more. The younger generations of speakers are not learning and using the language of their ancestors. We may be able to reconnect with the next generation of speakers using modern technologies.

The lengthy explanation…

A data structure is made up of a series of linked nodes. A tree structure begins with a head node, which provides the starting point for the tree. The head node contains the addresses of each of the nodes that branch from it, each of which is referred to as a child node. Each of these child nodes, in turn, has links to child nodes of its own, until all the data that needs to be found in the tree has been stored.

The trie data structure is a tree that can easily be used to store and look up words in a dictionary. The head node points at each letter that begins a word. Each of these letters points to the letters that may follow it. This process is repeated until the longest word in the dictionary has been defined. A figure is the best way to describe how this data structure works. Here is a trie storing the English words DO, DOG, DOT, and COT.

Head Node
  D
    O (end)
      G (end)
      T (end)
  C
    O
      T (end)
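The trie described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's code: the poster does not name an implementation language, and the names `TrieNode`, `Trie`, `insert`, `contains`, and `prefixes_of` are assumptions made for this example. The `prefixes_of` method shows how a trie could return every stored entry that starts an input string, which is the kind of lookup needed to collect possible bases.

```python
class TrieNode:
    """One node of the trie: links to child nodes plus an end-of-word flag."""

    def __init__(self):
        self.children = {}   # letter -> child TrieNode
        self.is_end = False  # True if a stored word ends at this node


class Trie:
    def __init__(self):
        self.head = TrieNode()  # the head node is the starting point

    def insert(self, word):
        """Store a word, creating child nodes letter by letter as needed."""
        node = self.head
        for letter in word:
            node = node.children.setdefault(letter, TrieNode())
        node.is_end = True

    def contains(self, word):
        """Return True only if the exact word was stored."""
        node = self.head
        for letter in word:
            node = node.children.get(letter)
            if node is None:
                return False
        return node.is_end

    def prefixes_of(self, text):
        """Return every stored word that is a prefix of text."""
        found = []
        node = self.head
        for i, letter in enumerate(text):
            node = node.children.get(letter)
            if node is None:
                break
            if node.is_end:
                found.append(text[:i + 1])
        return found


# Build the trie from the figure: DO, DOG, DOT, and COT.
trie = Trie()
for word in ("DO", "DOG", "DOT", "COT"):
    trie.insert(word)

print(trie.contains("DOG"))      # a stored word
print(trie.contains("CO"))       # a path in the trie, but not a stored word
print(trie.prefixes_of("DOGS"))  # stored words that begin the input
```

Storing children in a dictionary keyed by letter keeps each lookup step proportional to the length of the word rather than the size of the dictionary.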
In the noun table, notice that the vowel i was doubled when the class VI noun base panig- was combined with the unpossessed plural ending %:(e)t. Notice also how the k of the first-person singular possessive (1 s-s) ending -ka became -qa after combining with the base-final r of qetunrar-. The same ending with panig- gives panika.

Defining the symbols

The process of adding %:(e)t to the class VI base panig- is as follows:

• % - the final consonant of the class VI base is retained. The base remains panig.
• (e) - the letter e is inserted for some bases but not others; in this case, for class VI only. This gives us panig:et.
• : - when a velar marked with this character is surrounded by single vowels, the vowel-velar-vowel series is replaced with a vowel-vowel pair, giving us paniit.

In the spell-checking figure, notice the process of adding postbases onto bases to form expanded bases (Expanded Base = Base Guess + Postbase Guess), then passing the expanded base back to the relevant postbase and ending tries.

Five tries to get it right…

The tries I’m currently planning to divide Yup’ik morphemes into are as follows. A base trie will store both noun and verb bases in a single structure. A verbal-adding postbase trie will contain all postbases that can be added to a verb base. Some of these postbases expand the verb and leave it verbal; others expand the verb and change it into a noun. Complementing this trie will be the nominal-adding postbase trie, which stores all postbases that can be added to noun bases. The final two tries will store verb endings and noun endings.

Acknowledgments

Thank you to Theo Sery and Jeane Breinig of the UAA English department: Theo for helping me write about these projects and Jeane for being supportive with credit. Thank you to Marie Meade and Nancy Furlow of Alaska Native Studies for providing excellent education and support for Alaska Native peoples. Thank you to Frank Moore and Kendrick Mock in the Computer Science department for providing excellent instruction and guidance. But thank you most of all to Herb Schroeder and everybody on the ANSEP team. It’s their community environment and financial support that enable me to continue school.

For Further Information

Please contact esomervi@uaa.alaska.edu. www.camai-ellamyui.com should become available over the summer.