CSA 3050: Natural Language Algorithms
Morphological Parsing
October 2004

Morphology
• Morphemes are the smallest units in a word that bear some meaning, such as rabbit and s.
• Morphology describes how morphemes combine to form words that are legal in some language.
• Two kinds of morphology
  – Inflectional
  – Derivational

Inflectional/Derivational Morphology
• Inflectional: +s plural, +ed past
  – category preserving
  – productive: always applies (esp. to new words, e.g. fax)
  – systematic: same semantic effect
• Derivational: +ment
  – category changing: escape+ment
  – not completely productive: detractment*
  – not completely systematic: apartment

Noun Inflections
            Regular            Irregular
Singular    cat     church     mouse   ox
Plural      cats    churches   mice    oxen

Morphological Parsing
Input Word: cats → Morphological Parser → Output Analysis: cat N PL
• Output is a string of morphemes
• Reversibility?

Morphological Parsing
• The goal of morphological parsing is to find out what morphemes a given word is built from.
  mouse → mouse N SG
  mice → mouse N PL
  foxes → fox N PL

2 Steps
1. Split the word up into its possible components, using + to indicate possible morpheme boundaries.
   cats → cat + s
   foxes → foxe + s or fox + s
2. Look up the categories of the stems and the meaning of the affixes, using a lexicon of stems and affixes.
   cat + s → cat + N + PL
   fox + s → fox + N + PL

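The two steps can be sketched in Python as below. This is only an illustration: the lexicon entries, the function names, and the simple e-handling in step 1 are assumptions made for the example, not the course's implementation.

```python
# Hypothetical toy lexicon; the entries are illustrative only.
STEMS = {"cat": "N", "fox": "N"}
AFFIXES = {"s": "PL"}


def split_candidates(word):
    """Step 1: propose possible stem + affix splits for a surface word."""
    candidates = [word]  # the bare word with no affix
    for affix in AFFIXES:
        if word.endswith(affix):
            stem = word[: -len(affix)]
            candidates.append(f"{stem} + {affix}")
            if stem.endswith("e"):
                # The e may belong to the stem (foxe) or be inserted (fox).
                candidates.append(f"{stem[:-1]} + {affix}")
    return candidates


def lookup(candidate):
    """Step 2: replace stem and affix by category and feature, or None."""
    parts = candidate.split(" + ")
    stem = parts[0]
    if stem not in STEMS:
        return None  # not a known stem, so this split is rejected
    if len(parts) == 1:
        return f"{stem} + {STEMS[stem]} + SG"
    return f"{stem} + {STEMS[stem]} + {AFFIXES[parts[1]]}"


def parse(word):
    """Run step 1, then keep only the splits that step 2 accepts."""
    analyses = (lookup(c) for c in split_candidates(word))
    return [a for a in analyses if a is not None]


print(parse("cats"))
print(parse("foxes"))
```

Step 1 deliberately over-generates (it proposes both foxe + s and fox + s); the lexicon lookup in step 2 is what filters out the split with the unknown stem.
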
Step 1: Surface → Intermediate FST

Step 1: Surface → Intermediate Operation

2. Intermediate → Morphemes
Possible inputs to the transducer are:
• Regular noun stem: cat
• Regular noun stem + s: cat+s
• Singular irregular noun stem: mouse
• Plural irregular noun stem: mice

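The four kinds of input listed above can be handled by a small lookup, sketched here in Python. The stem sets and the function name `analyse` are hypothetical, chosen only to mirror the four cases; a real transducer would encode the same choices as arcs.

```python
# Hypothetical stem lists for illustration only.
REGULAR_STEMS = {"cat", "fox"}
IRREGULAR_SG = {"mouse"}
IRREGULAR_PL = {"mice": "mouse"}  # plural form -> underlying stem


def analyse(intermediate):
    """Map an intermediate form to a morpheme string, or None if rejected."""
    # Regular noun stem or singular irregular noun stem.
    if intermediate in REGULAR_STEMS or intermediate in IRREGULAR_SG:
        return f"{intermediate} N SG"
    # Plural irregular noun stem.
    if intermediate in IRREGULAR_PL:
        return f"{IRREGULAR_PL[intermediate]} N PL"
    # Regular noun stem + s.
    stem, boundary, affix = intermediate.partition("+")
    if boundary and affix == "s" and stem in REGULAR_STEMS:
        return f"{stem} N PL"
    return None


for form in ["cat", "cat+s", "mouse", "mice", "mouse+s"]:
    print(form, "->", analyse(form))
```

Note that mouse+s is rejected: the plural affix is only accepted after a regular stem.
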
2. Intermediate → Morphemes Transducer

Handling Stems
cat/cat
mice/mouse

Completed Stage 2

Joining Stages 1 and 2
• If the two transducers run in a cascade (i.e. we let the second transducer run on the output of the first one), we can do a morphological parse of (some) English noun phrases.
• We can also change the direction of translation (in translation mode).
• The same transducer can therefore be used to generate a surface form from an underlying form.

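One way to picture the cascade and its reversibility is to treat each stage as a finite set of input/output pairs. The Python sketch below does this; the pairs are hand-written examples and the helper names (`compose`, `invert`, `translate`) are assumptions for illustration, not the transducers from the slides.

```python
# Each stage as a finite relation: a set of (input, output) pairs.
# These pairs are hand-written illustrations.
STAGE1 = {("cats", "cat+s"), ("foxes", "fox+s"), ("mice", "mice")}
STAGE2 = {("cat+s", "cat N PL"), ("fox+s", "fox N PL"), ("mice", "mouse N PL")}


def compose(first, second):
    """Cascade: feed every output of `first` into `second`."""
    return {(a, c) for (a, b) in first for (b2, c) in second if b == b2}


def invert(relation):
    """Reverse the direction of translation by swapping each pair."""
    return {(b, a) for (a, b) in relation}


def translate(relation, item):
    """All outputs that the relation pairs with the given input."""
    return sorted(out for (inp, out) in relation if inp == item)


PARSER = compose(STAGE1, STAGE2)   # surface form -> analysis
GENERATOR = invert(PARSER)         # analysis -> surface form

print(translate(PARSER, "mice"))
print(translate(GENERATOR, "fox N PL"))
```

Because a relation is just a set of pairs, generation comes for free once parsing is defined: the generator is the parser with every pair swapped.
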
Prolog
• The transducer specifications we have seen translate easily into Prolog format, except for the “other” transition.
  arc(1, 3, z:z).
  arc(1, 3, s:s).
  arc(1, 3, x:x).
  arc(1, 2, '#':'+').
  arc(1, 3, <other>).

Handling other arcs
  arc(1, 3, z:z) :- !.
  arc(1, 3, s:s) :- !.
  arc(1, 3, x:x) :- !.
  arc(1, 2, '#':'+') :- !.
  arc(1, 3, X:X).
• The cut (!) commits to a specific arc once it matches, so the catch-all X:X clause is tried only for other symbols.

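The same "specific arcs first, otherwise fall back" idea can be sketched outside Prolog. The Python fragment below is an assumed encoding: the table names and the `step` function are hypothetical, and only the arcs leaving state 1 (the ones shown above) are included.

```python
# Specific arcs leaving state 1: (state, input) -> (next state, output).
SPECIFIC_ARCS = {
    (1, "z"): (3, "z"),
    (1, "s"): (3, "s"),
    (1, "x"): (3, "x"),
    (1, "#"): (2, "+"),
}

# "other" arcs: state -> next state, copying the input symbol unchanged.
OTHER_ARCS = {1: 3}


def step(state, symbol):
    """Follow one arc; specific arcs take priority over the 'other' arc."""
    if (state, symbol) in SPECIFIC_ARCS:
        return SPECIFIC_ARCS[(state, symbol)]
    if state in OTHER_ARCS:
        return OTHER_ARCS[state], symbol
    return None  # no arc available from this state


print(step(1, "#"))
print(step(1, "a"))
```

Checking the specific table before the fallback plays the role of the cut: a symbol such as # never reaches the copy-through arc.
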
Combining Rules
• Consider the word “berries”.
• Two rules are involved
  – berry + s
  – y → ie under certain circumstances.
• Combinations of such rules can be handled in two ways
  – Cascade, i.e. sequentially
  – Parallel
• Algorithms exist for combining transducers together in series or in parallel.
• Such algorithms involve computations over regular relations.

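The cascade option can be sketched for “berries” as follows. Regular-expression substitutions stand in for the transducers here, and the condition chosen for y → ie (a consonant before y, with + s following) is an assumption made for the example; the slides only say "under certain circumstances".

```python
import re


def y_to_ie(form):
    """Spelling rule: consonant + y before the boundary and s becomes ie."""
    return re.sub(r"(?<=[^aeiou])y(?=\+s$)", "ie", form)


def drop_boundary(form):
    """Final rule: remove the morpheme boundary marker."""
    return form.replace("+", "")


def cascade(form, rules):
    """Apply the rules sequentially, each on the output of the previous one."""
    for rule in rules:
        form = rule(form)
    return form


RULES = [y_to_ie, drop_boundary]
for underlying in ["berry+s", "boy+s", "cat+s"]:
    print(underlying, "->", cascade(underlying, RULES))
```

The order matters in a cascade: the spelling rule needs to see the + boundary, so it has to run before the rule that deletes it.
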
3 Related Frameworks
• REGULAR LANGUAGES
• REGULAR EXPRESSIONS
• FSA

• REGULAR RELATIONS
• AUGMENTED REGULAR EXPRESSIONS
• FINITE STATE TRANSDUCERS

Putting it all together
• Execution of the FSTi takes place in parallel

Kaplan and Kay The Xerox View
• FSTi are aligned but separate
• FSTi intersected together

Summary
• Morphological processing can be handled by finite state machinery.
• Finite State Transducers are formally very similar to Finite State Automata.
• They are formally equivalent to regular relations, i.e. sets of pairings of sentences of regular languages.