Morphology For Marathi POS-Tagger
Veena Dixit
11/10/2005

Contents
• Word
• Morphology
• Marathi Morphology - definition of the task and difficulties thereto
• Marathi Morphology - solutions to the challenges
• Different word classes
• Postpositions
• Particles
• Interjections
• Conjunctions
• Pronouns
• Adjectives
• Adverbs
• Verbs
• Nouns

• Words are the orthographical strings separated by spaces and some punctuation marks.
• For syntax, words make up sentences; for morphology, a word has internal structure and different inflectional forms.
• The inflectional forms of a root word form a paradigm based on a principle.
• The root word is the form that is stored in lexicons and dictionaries.
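The first bullet above can be sketched as a small tokenizer. This is only an illustration: the slides do not say which punctuation marks count as separators, so the `SEPARATORS` set and the function name `split_words` are assumptions.

```python
import re

# Assumed set of separating punctuation marks; the slides do not list them.
SEPARATORS = ".,;:!?()\"'"

def split_words(text):
    """Split text into words on whitespace and the assumed punctuation marks."""
    # Build a character class such as [.,;:!?()"'\s]+ and split on it.
    pattern = "[" + re.escape(SEPARATORS) + r"\s]+"
    # re.split can yield empty strings at the edges; drop them.
    return [word for word in re.split(pattern, text) if word]

words = split_words("jhaaDaakhaali, khaali!")
```

Here `words` holds the two orthographic words of the input; analysing their internal structure is the job of the morphological rules described on the later slides.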

What is Morphology?
• Morphology is the study of the forms of words in a language, especially the different forms used in declensions, conjugations, and word building. It deals with morphemes.
• A morpheme is the smallest component of a word that (a) seems to contribute some sort of meaning, or a grammatical function, to the word to which it belongs, and (b) cannot be decomposed into smaller morphemes.

Marathi Morphology: definition of the task and difficulties thereto
• Morphological analysis of Marathi plays a significant role in natural language processing because Marathi, a pan-Indian language, is rich in morphology.
• Marathi, being the language of a centrally situated area, is influenced by almost all language groups of India.
• This makes Marathi morphology more complicated.

Marathi Morphology: solutions to the challenges
• Morphological analysis is done category-wise.
• Parameters for changes in the root word are identified.
• Rules are constructed in tabular form to facilitate computation.

Marathi Word Classes
• Nouns
• Pronouns
• Adjectives
• Verbs
• Adverbs
• Postpositions
• Conjunctions
• Interjections
• Particles
• Punctuation marks

Postpositions
• A postposition is a morpheme that follows a word and shows the relation between that word and the other words in the sentence.
• Case markers and shabdayogi avyaya are classified as postpositions in Marathi because they show the same behavior. (ref. ‘Classification of Words’, Veena Dixit, Proceedings of the 26th AICL, Shillong, 2004)

Postpositions (continued)
• In Marathi, postpositions are attached to all classes of words except interjections.
• When a postposition is attached to a stem, it mainly produces an adverb, but it can also produce an adjective or a conjunction.
• Postpositions are handled along with the other word classes.
• Five subgroups of postpositions are identified on the basis of the possible order of their attachment and the group of words to which they can be attached.

Particles
• Strings like ही – hi_also, च – cha_only, सुद्धा – suddhaa_also are sometimes attached to other words (e.g. खाली – khaali_under – खालीसुद्धा – khaalisuddhaa_under also / झाड – jhaaDa_tree – झाडसुद्धा – jhaaDasuddhaa_tree also), and sometimes they are written separately (e.g. झाडाखाली – jhaaDaakhaali_under the tree – झाडाखालीसुद्धा – jhaaDaakhaalisuddhaa_under the tree also).
• When such a particle is attached to another word, the word to which it is attached does not get inflected.
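Because the host word stays uninflected, an attached particle can in principle be peeled off as a plain suffix. The following is a minimal sketch under that assumption, working on the transliteration used in the slides; the function name `split_particle` is hypothetical, and a real tagger would also have to check the remaining host against a lexicon, since ordinary words can happen to end in the same letters.

```python
# Particles named on the slide, in the slide's transliteration, with their glosses.
PARTICLES = {"suddhaa": "also", "hi": "also", "cha": "only"}

def split_particle(word):
    """Return (host, particle) if the word ends in a known particle, else (word, None)."""
    # Try longer particles first so the longest matching suffix wins.
    for particle in sorted(PARTICLES, key=len, reverse=True):
        # Require a non-empty host so a bare particle is not split from nothing.
        if word.endswith(particle) and len(word) > len(particle):
            return word[: -len(particle)], particle
    return word, None

host, particle = split_particle("khaalisuddhaa")
```

With the slide's example, `host` is `khaali` and `particle` is `suddhaa`; a word without a particle, such as `jhaaDa`, comes back unchanged with `None`.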

Interjections
• Interjections are identified from the lexicon and stored to produce the tag.

Conjunctions
• Conjunctions are identified from the lexicon and stored to produce the tag.
• Morphology also plays a role in the case of conjunctions.

Conjunctions (continued)
• When some Marathi postpositions are attached to a pair of demonstrative pronouns, they produce a pair of conjunctions in some instances.
जो – ज्यापासून (jo – jyaapaasuna --- which – from which)
तो – त्यापासून (to – tyaapaasuna --- that – from that)
ज्यापासून काल सुरुवात केली, त्यापासून आज नक्कीच सुरुवात करायला नको. – jyaapaasuna kaala suruvaata keli, tyaapaasuna aaja nakkicha suruvaata karaayalaa nako_One should not start from the (same point) from which it was started yesterday.

Pronouns
• The number of inflected forms of a pronoun and the number of rules needed to describe that inflection are almost equal.
• The pronouns and their respective inflected forms are finite in number and few compared to those of verbs and nouns.
• All inflected forms of the pronouns will be stored to produce the tag for pronouns.
• Derivational morphology of pronouns is handled with rules.

Pronouns (continued)
Inflectional forms of pronouns act either as adjectives (माझा – maajhaa_my), or as adverbs (मला – malaa_to me), or as conjunctions (जो – ज्यापासून (jo – jyaapaasuna --- which – from which), तो – त्यापासून (to – tyaapaasuna --- that – from that)).
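Since the inflected pronoun forms are stored rather than generated, tagging them reduces to a table lookup. The sketch below uses only the forms quoted on this slide, in transliteration; the tag names and the function name `tag_pronoun_form` are hypothetical placeholders, not the tagset of the actual system.

```python
# Stored pronoun forms from the slide, mapped to the role the slide assigns them.
# The tag names are hypothetical placeholders.
PRONOUN_FORMS = {
    "maajhaa": "PRON_AS_ADJECTIVE",        # my
    "malaa": "PRON_AS_ADVERB",             # to me
    "jyaapaasuna": "PRON_AS_CONJUNCTION",  # from which
    "tyaapaasuna": "PRON_AS_CONJUNCTION",  # from that
}

def tag_pronoun_form(word):
    """Return the stored tag for a pronoun form, or None if the form is not stored."""
    return PRONOUN_FORMS.get(word)

tag = tag_pronoun_form("malaa")
```

A full table built this way would hold every stored form; anything not found in it falls through to the other word classes.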

Pronouns (continued)
• Altogether, 29 pronouns have 526 inflectional forms, which are either words or stems.
• 21 paradigms are identified, generating several rules.

Adjectives
• Adjectives are mainly of two kinds: inflectional and non-inflectional.
• Adjectives inflect for gender, number, and the attachment of a postposition to the noun they modify.
• Adjectives in Marathi agree in gender and number with the nouns they modify.

Adjectives (continued)
• All inflectional adjectives belong to one paradigm, which corresponds to several rules for generating inflectional and derivational forms from an adjective.
• Most ‘aa’-ending adjectives agree with masculine nouns and are further inflected according to the gender and number of the noun they modify. (मोकळा / मोकळी / मोकळे / मोकळ्या – mokaLaa / mokaLi / mokaLe / mokaLyaa_empty)
• There are some exceptions to this rule, such as जादा – jaadaa_extra, नाना – naanaa_different, वाया – vaayaa_wasted.
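The single adjective paradigm can be sketched as a small table of endings applied to the stem, with the exceptions kept in a separate list. This is an illustration on the slide's transliteration only: the endings are read off the one example given (mokaLaa), the exception set contains just two of the words quoted above, and the function name `adjective_forms` is hypothetical.

```python
# Endings read off the slide's example: mokaLaa / mokaLi / mokaLe / mokaLyaa.
ENDINGS = ("aa", "i", "e", "yaa")

# Two of the 'aa'-ending exceptions quoted on the slide; they do not inflect.
EXCEPTIONS = {"naanaa", "vaayaa"}

def adjective_forms(adjective):
    """Generate the paradigm forms of an 'aa'-ending adjective.

    Exceptions and adjectives that do not end in 'aa' are returned unchanged.
    """
    if adjective in EXCEPTIONS or not adjective.endswith("aa"):
        return [adjective]
    stem = adjective[: -len("aa")]
    return [stem + ending for ending in ENDINGS]

forms = adjective_forms("mokaLaa")
```

For `mokaLaa` this yields the four forms listed on the slide, while `naanaa` and `vaayaa` come back as single-item lists. Keeping the endings in a table mirrors the tabular rule format mentioned earlier in the deck.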

Adverbs
• Adverbs are mainly of two kinds: inflectional and non-inflectional.
• Adverbs inflect for the attachment of postpositions. (खाली – khaali_under --- खालपासून – khaalapaasuna_from underneath)

Verbs and Nouns will be discussed in the next sessions.
Thank you.
Veena Dixit
11/10/2005