
Investigating the Ancient Meroitic Language Using Statistical Natural Language Techniques: Zipf’s Law and Word Co-Occurrences
Reginald Smith
August 10, 2006
Sudan Studies Association Conference, Rhode Island College

Meroitic is the language of the ancient kingdom of Kush
• Used for almost six hundred years, from the 2nd century BCE to the 4th century CE
• Phonetic language written right to left (like Arabic)
• Transliteration made possible by the work of British archaeologist F. Ll. Griffith around 1910

Meroitic remains largely undeciphered: an enigma
• No complete vocabulary is available
• Some words such as place names, loan words, or simple concepts are known
  – For example, “qore” perhaps means king and “qes” is perhaps Kush
• Many attempts have been made to understand Meroitic using phonology or comparative linguistics
  – Scholars have tried in vain to find a known language that is a relative (see sources in paper)
  – We wish we had a bilingual text like the Rosetta Stone to guide us

A new method could use mathematics and linguistics
• Statistical natural language processing analyzes the properties of language using a mix of statistics and linguistics
• Several statistical properties are the same in all human languages
• Certain techniques may also help us infer the meanings of words (by relating them to other known words)

Zipf’s Law: Frequencies of Words
• If you rank the words in a text by how frequent they are (the number of times each word appears, with #1 being the most frequent) and then relate each rank to the word’s frequency, you get Zipf’s Law
• Zipf’s Law: $F = C / R^{\alpha}$, where F is the frequency of a word, C is a constant, R is the rank, and α is known as the power law exponent
• For all languages α ≈ 1
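The ranking step described above can be sketched in a few lines of Python. This is only an illustration on a toy English sentence, not the Meroitic corpus; the function name `rank_frequency` and the alphabetical tie-breaking rule are assumptions made for the example.

```python
from collections import Counter

def rank_frequency(tokens):
    """Return (rank, word, frequency) rows, with rank 1 = most frequent word."""
    counts = Counter(tokens)
    # Sort by descending frequency; ties are broken alphabetically (an
    # arbitrary choice for this sketch) so the output is deterministic.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, word, freq)
            for rank, (word, freq) in enumerate(ordered, start=1)]

# Toy English tokens, used only for illustration (not Meroitic data).
tokens = "the cat saw the dog and the dog saw the bird".split()
table = rank_frequency(tokens)
for rank, word, freq in table:
    print(rank, word, freq)
```

Each row pairs a rank R with a frequency F, which are the two quantities Zipf’s Law relates.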

Zipf’s Law Graphs
• When you graph the frequency vs. the rank on a log-log graph (graphing the logarithm of frequency vs. the logarithm of rank) you get a straight line whose slope is -α
[Figure: Zipf line fit on data. The red line is the fitted slope on the data points. Picture source: University of Helsinki CS department]
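Taking logarithms of F = C / R^α gives log F = log C - α log R, so the exponent can be estimated from the slope of a straight-line fit. The sketch below uses ordinary least squares on synthetic frequencies; the function name `fit_zipf_exponent` is hypothetical, and least squares is an assumption here, since the slides do not say which fitting method was used.

```python
import math

def fit_zipf_exponent(frequencies):
    """Least-squares fit of log(frequency) vs. log(rank).

    Returns (alpha, log_c), where the fitted slope equals -alpha.
    """
    freqs = sorted(frequencies, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, intercept

# Synthetic frequencies that follow F = 1000 / R exactly (not real data).
synthetic = [1000 / rank for rank in range(1, 51)]
alpha, log_c = fit_zipf_exponent(synthetic)
print(round(alpha, 3), round(math.exp(log_c), 1))
```

On real text the points scatter around the line, especially at the lowest ranks and in the long tail of words that appear only once, so the fitted value depends on which ranks are included.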

Does Meroitic follow Zipf’s Law?
• The two graphs below show log-log plots of frequency vs. rank for the Meroitic words in 69 texts. The slope is shown for each
  – The normal plot counts the words as they appear. The morpheme out plot splits out suffixes like -lowi as the separate words “lo” and “wi”
  – Since its slope is nearly -1, the morpheme out model of Meroitic seems to follow Zipf’s Law
[Figure: two log-log plots of frequency vs. rank. Normal plot: slope = -0.81. Morpheme out plot: slope = -1.03]
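The difference between the two counts can be illustrated with a small sketch. Only the split of -lowi into “lo” and “wi” comes from the slide; the suffix table, the function name `morpheme_out`, the treatment of the remaining stem as its own word, and the placeholder stems (`stemA`, `stemB`, `stemC`) are all assumptions for the example and are not Meroitic words.

```python
from collections import Counter

# Assumed suffix table: only "-lowi" -> "lo" + "wi" is taken from the slide.
SUFFIX_SPLITS = {"lowi": ["lo", "wi"]}

def morpheme_out(tokens, suffix_splits=SUFFIX_SPLITS):
    """Split listed suffixes off each token and emit them as separate words."""
    out = []
    for token in tokens:
        for suffix, parts in suffix_splits.items():
            # Require a non-empty stem in front of the suffix.
            if token.endswith(suffix) and len(token) > len(suffix):
                out.append(token[: -len(suffix)])  # the stem
                out.extend(parts)                  # the split-out morphemes
                break
        else:
            out.append(token)  # no listed suffix: keep the word as is
    return out

# Placeholder tokens (hypothetical stems, not real Meroitic vocabulary).
tokens = ["stemAlowi", "stemBlowi", "stemA", "stemC"]
normal_counts = Counter(tokens)               # "normal" counting
split_counts = Counter(morpheme_out(tokens))  # "morpheme out" counting
print(normal_counts)
print(split_counts)
```

Splitting moves counts from many distinct suffixed forms onto a few frequent morphemes, which changes the shape of the rank-frequency curve.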

So what does this show us (besides graphs)?
• Despite the apparently small number of texts available, our sample of Meroitic is structured just like all other human languages (English, Chinese, etc.)
• Therefore, even though we don’t know the meaning of the words, we know that the sample of the language we have is representative
  – This holds even though most of our samples are repetitive funerary stelae
• We can then proceed to use other statistical techniques on Meroitic and also compare its statistical features to those of other languages

Step Two: Word Co-occurrence
• When words occur together in a text, they are said to co-occur
  – “I am here” has co-occurrence between “I-am” and “am-here”
• Co-occurrences can tell us about the words if we have enough of them
  – Words that co-occur with the same words often have similar parts of speech or even meanings
  – Can we use word co-occurrence in Meroitic to analyze classes of words?
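A minimal sketch of counting co-occurrences, assuming (as in the “I am here” example) that co-occurrence means two words standing next to each other. The function name `adjacent_cooccurrences` is hypothetical.

```python
from collections import Counter

def adjacent_cooccurrences(sentences):
    """Count ordered pairs of words that appear next to each other."""
    pairs = Counter()
    for sentence in sentences:
        words = sentence.split()
        # zip(words, words[1:]) yields each word with the word after it.
        for left, right in zip(words, words[1:]):
            pairs[(left, right)] += 1
    return pairs

pairs = adjacent_cooccurrences(["I am here"])
print(pairs)
```

Wider windows (for example, any two words in the same sentence) are another common definition; adjacency is used here only because it matches the slide’s example.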

What I did with Meroitic
• I analyzed Meroitic by matching together words that co-occurred with the same types of words
• For example, if you have two sentences, “I eat horses” and “We eat lizards”:
  – I match “I” and “We” because they both co-occur with “eat”
  – I also match “horses” and “lizards” because they also co-occur with “eat” (in the opposite direction*)
• I then graph the connected words together and analyze them with software
  – What happens?
*Technical note: I actually used undirected edges for co-occurring words in the graph shown on the next page
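The matching idea can be sketched on the two example sentences. This is an illustration under assumptions, not the software used for the talk: the function names are hypothetical, the matching here keeps track of direction (before vs. after the shared word), and grouping by connected components is a simple stand-in for whatever graph analysis was actually applied.

```python
from collections import defaultdict
from itertools import combinations

def match_words(sentences):
    """Match words that share a neighbour on the same side."""
    followers = defaultdict(set)  # word -> words seen right after it
    preceders = defaultdict(set)  # word -> words seen right before it
    for sentence in sentences:
        words = sentence.split()
        for left, right in zip(words, words[1:]):
            followers[left].add(right)
            preceders[right].add(left)
    matches = set()
    # Words that precede the same word are matched ("I"/"We" before "eat"),
    # and so are words that follow the same word ("horses"/"lizards").
    for group in list(preceders.values()) + list(followers.values()):
        matches.update(frozenset(p) for p in combinations(sorted(group), 2))
    return matches

def connected_groups(matches):
    """Group matched words into connected components of the match graph."""
    neighbours = defaultdict(set)
    for pair in matches:
        a, b = tuple(pair)
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, groups = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                group.add(node)
                stack.extend(neighbours[node] - seen)
        groups.append(sorted(group))
    return groups

matches = match_words(["I eat horses", "We eat lizards"])
groups = connected_groups(matches)
print(groups)
```

With only two sentences the groups are trivially the two matched pairs; on a real corpus, chains of matches link many words into larger clusters.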

Meroitic Words Graph
[Figure: graph of matched Meroitic words, with clusters labeled Group 1, Group 2, Group 3, and Group 4]
• Four main groups of words form that correspond well to Meroitic categories including positions and titles, verbs, places, and miscellaneous nouns

Results
• Techniques like word co-occurrence matching can help us categorize Meroitic words whose category we previously could only guess, by mapping them against words whose part of speech we already know
• Similar statistical techniques may allow us to match words with a similar “meaning” to infer the meanings of some words
  – This is still speculative though

Conclusion
• Statistical natural language processing is a new approach to Meroitic that could supplement other current efforts in the language
• Much more work remains to be done, but this new avenue may help us move closer to the goal of understanding this beautiful and mysterious language
• Acknowledgements: I give my boundless appreciation to Dr. Richard Lobban and Dr. Laurance Doyle for the help and advice they gave me on this paper’s topics