Morphology What is morphology Finite State Transducers Two

  • Slides: 50
Download presentation
Morphology What is morphology? Finite State Transducers Two Level Morphology

Morphology What is morphology? Finite State Transducers Two Level Morphology

What is morphology? • Decomposition of words into meaningful units: anti dis establish ment

What is morphology? • Decomposition of words into meaningful units: anti dis establish ment arian ism • Interacts with- syntax( categories and word order) [establish] = verb + ment = noun phonology: divine divinity obscene obscenity • Interacts with semantics: boys Peterchen

Phonological String morphological analyzer dictionary lookup syntactic analyzer lexical- semantic analysis discourse processing

Phonological String morphological analyzer dictionary lookup syntactic analyzer lexical- semantic analysis discourse processing

Why store all words as morphemes rather than all Morphological combinations as words? What

Why store all words as morphemes rather than all Morphological combinations as words? What does the morphological analyzer have to output?

The what and the how: • Efficient and effective algorithm to decompose categories into,

The what and the how: • Efficient and effective algorithm to decompose categories into, or build categories from, component morphemes. What this algorithm will be depends on problems it has to solve. In turn depends on representations computed. Given stem /lemma ( e. g. ‘jump’ add material to change category Or grammatical properties of word ‘jumped’, ‘jumpable’ • order of composition matters: ride/ riding enoble/ enobling/*nobling Adj ---> V, trance/*trancing/entrance/entrancing V===> V+ing

CONCATENATIVE MORPHOLOGICAL PROCESSES: COMPOUNDING: firefighter PREFIXATION: Un+ well INFIXATION: ( TAGALOG) fikas - strong

CONCATENATIVE MORPHOLOGICAL PROCESSES: COMPOUNDING: firefighter PREFIXATION: Un+ well INFIXATION: ( TAGALOG) fikas - strong fumikas - be strong SUFFIXATION: Kick + er CIRCUMFIXATION: ( German) ge [sag] t past prefix [say] past suffix

Inflectional Morphology • non category changing, required by syntax • Agreement: person/number: Je parle

Inflectional Morphology • non category changing, required by syntax • Agreement: person/number: Je parle Nous parlons Ils parlent • Gender: la petite ( the little one (fem)) le petit ( the little one (masc)) la squelette ( the skeleton)

Derivational Morphology • changes category. Not required by syntax Deverbal Nominal: bak+er catch+ er

Derivational Morphology • changes category. Not required by syntax Deverbal Nominal: bak+er catch+ er tion: destroy/destruction Roman's destruction of the city 'er' = agent of action Catcher of the ball John’s catcher of the ball 'John" ~= one who caught

Regular vs Irregular Jump/jumped hit/hit bring/brought sing/sang Productive/Non-Productive adore/adorable, kick/kickable, fax/faxable produce/production destroy/destruction *graft/graftuction

Regular vs Irregular Jump/jumped hit/hit bring/brought sing/sang Productive/Non-Productive adore/adorable, kick/kickable, fax/faxable produce/production destroy/destruction *graft/graftuction Bring/ brought

Regular (English) Verbs Morphological Form Classes Regularly Inflected Verbs Stem walk merge try map

Regular (English) Verbs Morphological Form Classes Regularly Inflected Verbs Stem walk merge try map -s form walks merges tries maps -ing form walking merging trying mapping Past form or –ed participle walked merged tried mapped

Irregular (English) Verbs Morphological Form Classes Irregularly Inflected Verbs Stem eat catch cut -s

Irregular (English) Verbs Morphological Form Classes Irregularly Inflected Verbs Stem eat catch cut -s form eats catches cuts -ing form eating catching cutting Past form ate caught cut -ed participle eaten caught cut

“To love” in Spanish

“To love” in Spanish

 • Productive and rule governed: fax +er ? ? ? Crudoy cruduction •

• Productive and rule governed: fax +er ? ? ? Crudoy cruduction • Category sensitivity: breakable/* manable sensitivity/ *hittivity • Semantic sensitivity: un + well un + happy *un + ill *un+ sad

Store morphemes or words? lebensversicherungsgesellschaftsangesteller leben+ versicherung + gesellschaft+s+angesteller life insurance company +Poss employee

Store morphemes or words? lebensversicherungsgesellschaftsangesteller leben+ versicherung + gesellschaft+s+angesteller life insurance company +Poss employee Turkish: Turkish verns have 40 k forms

Non- concatenative Morphology • Templatic morphology (Semitic languages): lmd (learn), lamad (he studied), limed

Non- concatenative Morphology • Templatic morphology (Semitic languages): lmd (learn), lamad (he studied), limed (he taught), lumad (he was taught)

Concatenation: Beads on a string Agglutinative ( concatenative) languages are well behaved for FSAs

Concatenation: Beads on a string Agglutinative ( concatenative) languages are well behaved for FSAs as long as we don’t include phonological or spelling changes Verb Lexicon: jump kiss stream hop, ? ? ? jump+ed kiss+ed stream+ed *hopp+ed q q 0 verb qq 11 ed q 2

Pieces of a Morphological Analyzer un q 0 adj-root q 1 q 2 -er,

Pieces of a Morphological Analyzer un q 0 adj-root q 1 q 2 -er, est, ly q 3 The lexicon stores the lemmas, and divides them into adjective classes really/clearly *bigly/redly Morphotactics: State sequence indicates order of morpheme composition e. g. comparative or adverb formation is by suffixation

Lexicon • Arranged as TRIE ( letter strings in common relative to position n-k-e-y

Lexicon • Arranged as TRIE ( letter strings in common relative to position n-k-e-y D-o -g • Classed by part of speech category ( noun, verb) and morphotactic (which other affixes can precede or follow) or orthographic considerations.

Orthography • spelling rules- handle phonological or spelling variation in orthographic a morpheme Try

Orthography • spelling rules- handle phonological or spelling variation in orthographic a morpheme Try /trying/tries Cringe/cringing/cringes

FSA for Inflectional Morphology: English Nouns

FSA for Inflectional Morphology: English Nouns

FSA for Inflectional Morphology: English Verbs

FSA for Inflectional Morphology: English Verbs

FSA for Derivational Morphology: Adjectival Formation

FSA for Derivational Morphology: Adjectival Formation

More Complex Derivational Morphology

More Complex Derivational Morphology

Using FSAs for Recognition: English Nouns and their Inflection

Using FSAs for Recognition: English Nouns and their Inflection

Orthographic • Want association between morpheme and semantic function • Want association between allographs

Orthographic • Want association between morpheme and semantic function • Want association between allographs or allophones of the same phoneme Allographs: city -cities bake- baking divine-divinity tried

Finite State Transducers (FSTs)- the Big Idea Need to relate lexical level, the level

Finite State Transducers (FSTs)- the Big Idea Need to relate lexical level, the level that gives us the morphological analysis (+plural, +able to the surface level that keeps track of phonological/ or graphological (spelling_ changes)

Parsing vs recognition • An FSA can give you the string composition of a

Parsing vs recognition • An FSA can give you the string composition of a morphological sequence, and can tell you whether a given morphological string is or is not in the language. It recognizes the string • An FST parses the string. It tells you the morphological structure associated with the string. Other instances of parsing?

Formal definition • An FST defines a relation between sets of pairs of strings:

Formal definition • An FST defines a relation between sets of pairs of strings: • It contains at least a lexical level that is a concatenation of morphemes and a surface level that shows the correct spelling for each morpheme in a given context cat/sheep ^ s e. g. noun (instanciated from lexicon) + plural E s cats/sheep

Q= finite set of states q 0 to qn finite alphabet of complex symbols

Q= finite set of states q 0 to qn finite alphabet of complex symbols (feasible pairs) i: o with one symbol from the input alphabet Q 0 = the start state F= set of final states = (q, i: o) the transition function or matrix between states. Takes a state from Q and a complex symbol i: o from and returns a new state. feasible pair: a relation of a symbol on one tape to a symbol on the other tape. e. g. can + [pl: ^s]

 • default pair- the upper tape is the same as the lower tape

• default pair- the upper tape is the same as the lower tape same input as output : c*a*t/c: c*a: a*t: t*pl: ^s • feasible pairs either stated in lexicon if irregular g: g*o: e*s: s*e: e goose: geese or by an automaton that stipulates correspondence in rule governed way if the relation is regular. If regular, indicated as Default paris and usually represented by one symbol. • FSTs are closed under: inversion: switches i/o labels composition: union of two transducers one after the other.

trie: in lexicon, categories arranged by letter one at a time with class at

trie: in lexicon, categories arranged by letter one at a time with class at end. Allows parallel search as long as things match e. g. m*e*t*a*l <N> m*e*t*a <root> metal, meta-language

Kimmo-Based Morphological Parsing • Two-level morphology: lexical level + surface level (Koskenniemi 83) •

Kimmo-Based Morphological Parsing • Two-level morphology: lexical level + surface level (Koskenniemi 83) • Finite-state transducers (FST): input-output pair

Four-Fold View of FSTs • • As a recognizer As a generator As a

Four-Fold View of FSTs • • As a recognizer As a generator As a translator As a set relater

Terminology for Kimmo • • Upper = lexical tape Lower = surface tape Characters

Terminology for Kimmo • • Upper = lexical tape Lower = surface tape Characters correspond to pairs, written a: b If “a=b”, write “a” for shorthand Two-level lexical entries # = word boundary ^ = morpheme boundary Other = “any feasible pair that is not in this tranducer”

Nominal Inflection FST

Nominal Inflection FST

Lexical and Intermediate Tapes

Lexical and Intermediate Tapes

Spelling Rules Name Rule Description Example Consonant Doubling 1 -letter consonant doubled before -ing/-ed

Spelling Rules Name Rule Description Example Consonant Doubling 1 -letter consonant doubled before -ing/-ed beg/begging E-deletion Silent e dropped before -ing and -ed make/making E-insertion e added after s, z, x, ch, sh before s watch/watches Y-replacement -y changes to -ie before -s, -i before -ed try/tries K-insertion verbs ending with vowel + -c add -k panic/panicked

Notation x e --> e / s z ^ __ s #

Notation x e --> e / s z ^ __ s #

Intermediate-to-Surface Transducer

Intermediate-to-Surface Transducer

Two-Level Morphology

Two-Level Morphology

Sample Run

Sample Run

FSTs and ambiguity Parse Example 1: unionizable Parse Example 2: assess

FSTs and ambiguity Parse Example 1: unionizable Parse Example 2: assess

What to do about Global Ambiguity? • Accept first successful structure • Run parser

What to do about Global Ambiguity? • Accept first successful structure • Run parser through all possible paths • Bias the search in some manner

Some Limitations

Some Limitations

Stemming • For some applications, don’t need full morphological analysis. • IR- don’t care

Stemming • For some applications, don’t need full morphological analysis. • IR- don’t care that e. g ‘logician’ is related to ‘logical’ Just want to know that if you are interested in articles about ‘logic’ may want former two classes as well. So just want to ‘get back to root list. • Relate two forms by having a literal relation rule. E. g al#---> 0 • Is it useful: in a big document may not be necessary because the will appear in many forms including form in query

 • stemming is morphologically impoverished so error driven - can’t distinguish rules that

• stemming is morphologically impoverished so error driven - can’t distinguish rules that apply at morpheme boundaries versus internal to root: patronization = patron + ize + ation organization = organize+ ation But the stemmer will treat these as a single class and derive “organ” as an underlying root. -’adverse’/’adversity ‘universe / university

Psycholinguistics • Is the human lexicon efficient in the way computational lexica are? -Stanners

Psycholinguistics • Is the human lexicon efficient in the way computational lexica are? -Stanners et al (1979) : where two words are related inflectionally, then root stored and other forms rule derived. Where there is a derivational relationship, then both forms are stored paradigm = repetition priming ‘great, happy, peachy, adorable , round, short, great small Repetition priming for ‘turns’ given ‘turning’ but not ‘select’, ‘selective’

 • Marslen- Wilson et al (1994): May have priming for Semantically similar derivationally

• Marslen- Wilson et al (1994): May have priming for Semantically similar derivationally related words: permit/permission * create/creativity On-line versus long term storage lexicon: Speech errors: ‘we have screw looses’