Morphological Analysis Chapter 3

Morphology
• Morpheme = "minimal meaning-bearing unit in a language"
• Morphology handles the formation of words by using morphemes
  – base form (stem, lemma), e.g., believe
  – affixes (suffixes, prefixes, infixes), e.g., un-, -able, -ly
• Morphological parsing = the task of recognizing the morphemes inside a word
  – e.g., hands, foxes, children
• Important for many tasks
  – machine translation, information retrieval, etc.
  – parsing, text simplification, etc.

Morphemes and Words
• Combine morphemes to create words
  – Inflection
    – combination of a word stem with a grammatical morpheme
    – same word class, e.g., clean (verb), clean-ing (verb)
  – Derivation
    – combination of a word stem with a grammatical morpheme
    – yields a different word class, e.g., delight (verb), delight-ful (adj)
  – Compounding
    – combination of multiple word stems
  – Cliticization
    – combination of a word stem with a clitic
    – different words from different syntactic categories, e.g., I’ve = I + have

Inflectional Morphology
• word stem + grammatical morpheme, e.g., cat + s
• only for nouns, verbs, and some adjectives
• Nouns
  – plural
    – regular: +s, +es
    – irregular: mouse - mice; ox - oxen
    – many spelling rules, e.g., -y -> -ies, as in butterfly - butterflies
  – possessive: +'s, +'
• Verbs
  – main verbs (sleep, eat, walk)
  – modal verbs (can, will, should)
  – primary verbs (be, have, do)
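
The noun plural rules above can be sketched in a few lines of Python. This is a minimal illustration, not a full treatment of English: the irregular list, the set of sibilant endings, and the function name `pluralize` are assumptions made for the example.

```python
# Hypothetical mini-lexicon of irregular plurals (illustrative only).
IRREGULAR_PLURALS = {"mouse": "mice", "ox": "oxen", "child": "children"}

VOWELS = "aeiou"


def pluralize(noun: str) -> str:
    """Return a plural for `noun` using a few common spelling rules."""
    # Irregular forms are looked up, not computed.
    if noun in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[noun]
    # Consonant + y -> ies (butterfly -> butterflies); vowel + y keeps the y.
    if len(noun) >= 2 and noun.endswith("y") and noun[-2] not in VOWELS:
        return noun[:-1] + "ies"
    # Sibilant endings take +es (fox -> foxes).
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"
    # Default regular plural: +s.
    return noun + "s"


for word in ["cat", "fox", "butterfly", "mouse", "ox"]:
    print(word, "->", pluralize(word))
```

A hand-written function like this works for generation only; the finite-state machinery in the rest of the chapter is meant to handle both generation and analysis.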

Inflectional Morphology (verbs)
• Verb inflections for: main verbs (sleep, eat, walk); primary verbs (be, have, do)

Regularly inflected forms
Morphological form      Forms
stem                    walk      merge     try       map
-s form                 walks     merges    tries     maps
-ing participle         walking   merging   trying    mapping
past; -ed participle    walked    merged    tried     mapped

Irregularly inflected forms
Morphological form      Forms
stem                    eat       catch     cut
-s form                 eats      catches   cuts
-ing participle         eating    catching  cutting
-ed past                ate       caught    cut
-ed participle          eaten     caught    cut

Inflectional Morphology (nouns)
• Noun inflections for: regular nouns (cat, hand); irregular nouns (child, ox)

Regularly inflected forms
Morphological form      Forms
stem                    cat       hand
plural form             cats      hands

Irregularly inflected forms
Morphological form      Forms
stem                    child     ox
plural form             children  oxen

Inflectional and Derivational Morphology (adjectives)
• Adjective inflections and derivations:

Affix type   Affix         Example              Result
prefix       un-           unhappy              adjective, negation
suffix       -ly           happily              adverb, manner
suffix       -ier, -iest   happier, happiest    comparative, superlative
suffix       -ness         happiness            noun

• plus combinations, like unhappiest, unhappiness
• Distinguish different adjective classes, which can or cannot take certain inflectional or derivational forms, e.g., no negation for big

Derivational Morphology (nouns)

Derivational Morphology (adjectives)

Verb Clitics

Morphology and FSAs
• We’d like to use the machinery provided by FSAs to capture these facts about morphology
  – Recognition:
    – Accept strings that are in the language
    – Reject strings that are not
  – In a way that doesn’t require us to, in effect, list all the words in the language

Computational Lexicons
• Depending on the purpose, computational lexicons have various types of information
  – Between FrameNet and WordNet, we saw POS, word sense, subcategorization, semantic roles, and lexical semantic relations
  – For our purposes now, we care about stems, irregular forms, and information about affixes

Starting Simply
• Let’s start simply:
  – Regular singular nouns listed explicitly in lexicon
  – Regular plural nouns have an -s on the end
  – Irregulars listed explicitly too

Simple Rules

Now Plug in the Words
• Recognition of valid words
• But “foxs” isn’t right; we’ll see how to fix that
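
The simple noun machine described above can be sketched as a small recognizer. This is a rough illustration under assumptions: the state names, arc labels, and word lists are hypothetical stand-ins for the figure, which is not reproduced in this transcript.

```python
# Morpheme-level arcs: (state, morpheme class) -> next state.
# State names and class labels are hypothetical.
ARCS = {
    ("q0", "reg-noun"): "q1",
    ("q0", "irreg-sg-noun"): "q2",
    ("q0", "irreg-pl-noun"): "q2",
    ("q1", "plural-s"): "q2",
}
ACCEPT = {"q1", "q2"}

# Toy lexicon plugged into each arc label (illustrative word lists).
LEXICON = {
    "reg-noun": {"cat", "fox", "hand"},
    "irreg-sg-noun": {"goose", "mouse", "ox"},
    "irreg-pl-noun": {"geese", "mice", "oxen"},
    "plural-s": {"s"},
}


def recognize(word: str, state: str = "q0", pos: int = 0) -> bool:
    """Return True if some path through the machine consumes all of `word`."""
    if pos == len(word) and state in ACCEPT:
        return True
    for (src, label), dst in ARCS.items():
        if src != state:
            continue
        for morph in LEXICON[label]:
            # Follow the arc only if this morph matches at the current position.
            if word.startswith(morph, pos) and recognize(word, dst, pos + len(morph)):
                return True
    return False


for w in ["cat", "cats", "geese", "gooses", "foxs", "foxes"]:
    print(w, recognize(w))
```

With this lexicon the sketch accepts "foxs" and rejects "foxes", which is exactly the shortcoming the spelling rules later in the chapter are meant to repair.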

Parsing/Generation vs. Recognition
• We can now run strings through these machines to recognize strings in the language
• But recognition is usually not quite what we need
  – Often if we find some string in the language we might like to assign a structure to it (parsing)
  – Or we might have some structure and we want to produce a surface form for it (production/generation)
• Example
  – From “cats” to “cat +N +PL”

Finite State Transducers
• Add another tape
• Add extra symbols to the transitions
• On one tape we read “cats”, on the other we write “cat +N +PL”

FSTs

Applications
• The kind of parsing we’re talking about is normally called morphological analysis
• It can either be
  – an important stand-alone component of many applications (spelling correction, information retrieval)
  – or simply a link in a chain of further linguistic analysis

Transitions
c:c   a:a   t:t   +N:ε   +PL:s
• c:c means read a c on one tape and write a c on the other
• +N:ε means read a +N symbol on one tape and write nothing on the other
• +PL:s means read +PL and write an s
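
The five transitions above can be written down directly as a tiny transducer and run in both directions. This is a minimal sketch: the state numbers, the single accept state, and the helper names `generate`, `analyze`, and `show` are assumptions made for the example.

```python
EPS = ""  # epsilon: write (or read) nothing

# Arcs as (source, lexical symbol, surface symbol, target).
# The symbol pairs come from the slide; the state numbers are hypothetical.
ARCS = [
    (0, "c", "c", 1),
    (1, "a", "a", 2),
    (2, "t", "t", 3),
    (3, "+N", EPS, 4),
    (4, "+PL", "s", 5),
]
ACCEPT = {5}


def generate(lexical):
    """Lexical -> surface: read lexical symbols, collect surface output."""
    results = []

    def walk(state, i, out):
        if i == len(lexical) and state in ACCEPT:
            results.append(out)
        for src, lex, surf, dst in ARCS:
            if src == state and i < len(lexical) and lexical[i] == lex:
                walk(dst, i + 1, out + surf)

    walk(0, 0, "")
    return results


def analyze(surface):
    """Surface -> lexical: read surface characters, collect lexical symbols."""
    results = []

    def walk(state, i, out):
        if i == len(surface) and state in ACCEPT:
            results.append(out)
        for src, lex, surf, dst in ARCS:
            if src != state:
                continue
            if surf == EPS:
                # Epsilon on the surface side consumes no input.
                walk(dst, i, out + [lex])
            elif surface.startswith(surf, i):
                walk(dst, i + len(surf), out + [lex])

    walk(0, 0, [])
    return results


def show(symbols):
    """Format lexical symbols as in the slides, e.g. 'cat +N +PL'."""
    return "".join(" " + s if s.startswith("+") else s for s in symbols)


print(generate(["c", "a", "t", "+N", "+PL"]))
print([show(p) for p in analyze("cats")])
```

Both functions return lists because, in general, a transducer can relate one input to several outputs; that becomes relevant once ambiguity enters.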

Ambiguity
• Recall that in non-deterministic recognition multiple paths through a machine may lead to an accept state
  – For recognition, it didn’t matter which path was actually traversed
• In FSTs the path to an accept state does matter, since different paths represent different parses and different outputs will result

Ambiguity
• What’s the right parse (segmentation) for unionizable?
  – union-ize-able
  – un-ion-ize-able
• Each represents a valid path through the morphology machine

Ambiguity
• There are a number of ways to deal with this problem
  – Simply take the first output found
  – Find all the possible outputs (all paths) and return them all (without choosing)
  – Bias the search so that only one or a few likely paths are explored
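
The three strategies above can be illustrated on "unionizable" with a toy segmenter. This is a sketch under assumptions: the morph table is hypothetical (it lists the surface spelling "iz" for -ize before -able), and "prefer the parse with the fewest morphemes" is just one possible bias, not a method taken from the slides.

```python
# Hypothetical surface-morph table: surface spelling -> morpheme label.
MORPHS = {
    "un": "un-",
    "ion": "ion",
    "union": "union",
    "iz": "-ize",   # surface form of -ize before -able
    "able": "-able",
}


def all_parses(word, start=0):
    """Return every way to split word[start:] into known morphs."""
    if start == len(word):
        return [[]]
    parses = []
    for end in range(start + 1, len(word) + 1):
        piece = word[start:end]
        if piece in MORPHS:
            for rest in all_parses(word, end):
                parses.append([MORPHS[piece]] + rest)
    return parses


parses = all_parses("unionizable")

first_found = parses[0]             # strategy 1: take the first output found
everything = parses                 # strategy 2: return all outputs
preferred = min(parses, key=len)    # strategy 3: a simple (assumed) bias

print(everything)
print(first_found)
print(preferred)
```

Here the first path found is un-ion-ize-able while the length bias selects union-ize-able, which shows that the choice of strategy changes the answer.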

The Gory Details
• Of course, it’s not as easy as
  – “cat +N +PL” <-> “cats”
• As we saw earlier, there are geese, mice and oxen
• But there are also a whole host of spelling/pronunciation changes that go along with inflectional changes
  – fox and foxes vs. cat and cats

Multi-Tape Machines
• To deal with these complications, we will add more tapes and use the output of one tape machine as the input to the next
• So to handle irregular spelling changes we’ll add intermediate tapes with intermediate symbols

Multi-Level Tape Machines
• We use one machine to transduce between the lexical and the intermediate level, and another to handle the spelling changes to the surface tape

Intermediate to Surface
• The “add an e” rule, as in fox^s# --> foxes#
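
The e-insertion step can be approximated with a regular-expression rewrite. This is only a sketch, not the transducer itself: it assumes the rule fires after stems ending in x, s, or z, that ^ marks a morpheme boundary and # a word boundary, and that any ^ left over is simply deleted. The function name `intermediate_to_surface` is hypothetical.

```python
import re


def intermediate_to_surface(intermediate: str) -> str:
    """Apply e-insertion, then drop any remaining morpheme boundaries."""
    # Insert e between a stem-final x/s/z and the plural s: fox^s# -> foxes#
    with_e = re.sub(r"([xsz])\^(s#)", r"\1e\2", intermediate)
    # Where the rule does not apply, the boundary just disappears: cat^s# -> cats#
    return with_e.replace("^", "")


for form in ["fox^s#", "cat^s#", "fox#"]:
    print(form, "->", intermediate_to_surface(form))
```

A real implementation would encode the same context as states and arcs so that the rule can also be run from surface back to intermediate.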

Lexical to Intermediate Level

Foxes
• (Figure note) This arrow should point straight down

Notes
• The transducers may be run in the other direction too (examples in lecture)
• The transducers are cascaded: the output of one layer serves as the input to the next
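
The cascade can be sketched as two functions composed in sequence. This is an illustration under assumptions: the stem lists, the +SG tag, and all function names are hypothetical, and the "other direction" is simulated here by brute-force generate-and-compare over the toy lexicon rather than by truly inverting a transducer.

```python
import re

# Hypothetical toy lexicon.
REGULAR = {"cat", "fox"}
IRREGULAR_PL = {"goose": "geese", "mouse": "mice"}


def lexical_to_intermediate(lexical: str) -> str:
    """'fox +N +PL' -> 'fox^s#'; irregular plurals are looked up whole."""
    stem, *tags = lexical.split()
    if tags == ["+N", "+SG"]:
        return stem + "#"
    if tags == ["+N", "+PL"]:
        if stem in IRREGULAR_PL:
            return IRREGULAR_PL[stem] + "#"
        return stem + "^s#"
    raise ValueError(f"unsupported tags: {tags}")


def intermediate_to_surface(intermediate: str) -> str:
    """Apply e-insertion, then remove boundary symbols."""
    with_e = re.sub(r"([xsz])\^(s#)", r"\1e\2", intermediate)
    return with_e.replace("^", "").rstrip("#")


def generate(lexical: str) -> str:
    """Cascade: the output of the first layer is the input to the second."""
    return intermediate_to_surface(lexical_to_intermediate(lexical))


def analyze(surface: str):
    """Other direction, simulated by generating every candidate and comparing."""
    stems = sorted(REGULAR | set(IRREGULAR_PL))
    candidates = [f"{stem} +N {num}" for stem in stems for num in ("+SG", "+PL")]
    return [c for c in candidates if generate(c) == surface]


print(generate("fox +N +PL"))
print(generate("cat +N +PL"))
print(analyze("geese"))
print(analyze("foxs"))
```

With the spelling layer in place, "foxes" is produced and analyzed correctly, and "foxs" no longer receives any analysis.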

Overall Scheme
• We aren’t covering the overall scheme in any more detail than this