Morphology & Finite-State Transducers


Morphology & Finite-State Transducers
Jing-Shin Chang

• Morphology: the study of the constituents of words
• Word = {a set of morphemes, combined in language-dependent ways}
  - morpheme: small meaning-bearing unit
  - e.g., books = book + s, cats = cat + s
• Classes of morphemes
  - stem (root)
  - affixes (詞綴)
• Morphological parsing (or analysis):
  - breaking down surface forms (or input forms) into stem and affixes
  - e.g., foxes = “fox” + “-es” (+N, +PL)
  - stemming: mapping a surface form to its stem (extracting the stem from the surface form)
• Morphological generation:
  - generating surface forms from a stem and morphological features
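The parsing and generation directions described above can be sketched in a few lines. This is a minimal illustration, not the method from the slides: the toy lexicon `NOUN_STEMS`, the function names, the `+SG` feature, and the simplified "-es after sibilants" rule are all assumptions made for the example.

```python
# Toy lexicon of noun stems; the entries are illustrative assumptions.
NOUN_STEMS = {"book", "cat", "fox"}


def generate(stem, features):
    """Morphological generation: stem + features -> surface form."""
    if "+PL" not in features:
        return stem
    # Simplified orthographic rule: sibilant-final stems take -es.
    if stem.endswith(("s", "x", "z", "ch", "sh")):
        return stem + "es"
    return stem + "s"


def parse(surface):
    """Morphological parsing: surface form -> list of (stem, features)."""
    analyses = []
    if surface in NOUN_STEMS:
        analyses.append((surface, ("+N", "+SG")))
    for suffix in ("es", "s"):
        if surface.endswith(suffix):
            stem = surface[: -len(suffix)]
            # Accept the split only if the stem is in the lexicon and
            # regenerating the plural gives back exactly the input.
            if stem in NOUN_STEMS and generate(stem, ("+N", "+PL")) == surface:
                analyses.append((stem, ("+N", "+PL")))
    return analyses


def stems(surface):
    """Stemming: keep only the stem of each analysis."""
    return [stem for stem, _ in parse(surface)]


print(parse("foxes"))   # [('fox', ('+N', '+PL'))]
print(stems("cats"))    # ['cat']
```

Checking each candidate split by regenerating the surface form keeps ill-formed inputs such as "foxs" from being analyzed.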


Morphology & Finite-State Transducers

• Applications:
  - spelling check, tokenization for parsing
• Knowledge for morphological analysis
  - morphological rules (morphotactics): the constituents of words and their order
  - spelling rules (orthographic rules): spelling changes
• Dictionary/lexicon:
  - list of stems and affixes
  - stems of regular words (plus irregular variants) as indexing keys
  - enumerating all morphological variants is not efficient
    * some morphemes are productive: they can be applied to all words or to new words, so it is impossible to list all the resulting forms
    * morphological variants depend on spelling as well as pronunciation
    * morphologically complex languages (e.g., Turkish) may have a large number of morphological variants


Morphology & Finite-State Transducers

• Models for morphological analysis/generation
  - generate-and-test: enumerate all possibilities and test them against constraints
  - FSA / two-level FST model: model the lexicon, morphological rules, and orthographic rules as finite-state automata or transducers
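To make the transducer idea concrete, here is a small sketch of a finite-state transducer that maps a lexical level (stem letters plus feature symbols) to a surface level, and that can be inverted to parse. It is a simplification under stated assumptions: the `FST` class, `noun_paths` helper, `+SG` feature, and two-word lexicon are hypothetical, and the e-insertion spelling change is written directly into the lexicon paths instead of being a separate rule transducer as in a full two-level model.

```python
from collections import defaultdict

EPS = ""  # epsilon: consume no input / emit no output


class FST:
    """Tiny nondeterministic transducer; state 0 is the start state."""

    def __init__(self):
        self.arcs = defaultdict(list)  # state -> [(in_sym, out_sym, next_state)]
        self.finals = set()
        self.n_states = 1

    def add_path(self, pairs):
        """Add a fresh path of (input, output) pairs ending in a final state."""
        state = 0
        for in_sym, out_sym in pairs:
            nxt = self.n_states
            self.n_states += 1
            self.arcs[state].append((in_sym, out_sym, nxt))
            state = nxt
        self.finals.add(state)

    def invert(self):
        """Swap the input and output sides of every arc."""
        inv = FST()
        inv.n_states = self.n_states
        inv.finals = set(self.finals)
        for state, arcs in self.arcs.items():
            inv.arcs[state] = [(o, i, nxt) for i, o, nxt in arcs]
        return inv

    def transduce(self, symbols):
        """Return every output string for the given input symbol sequence."""
        results = []
        stack = [(0, 0, "")]  # (state, input position, output so far)
        while stack:
            state, pos, out = stack.pop()
            if pos == len(symbols) and state in self.finals:
                results.append(out)
            for in_sym, out_sym, nxt in self.arcs[state]:
                if in_sym == EPS:
                    stack.append((nxt, pos, out + out_sym))
                elif pos < len(symbols) and symbols[pos] == in_sym:
                    stack.append((nxt, pos + 1, out + out_sym))
        return results


def noun_paths(stem, plural_suffix):
    """Build lexical->surface paths for the singular and plural of a noun."""
    letters = [(ch, ch) for ch in stem]
    singular = letters + [("+N", EPS), ("+SG", EPS)]
    plural = (letters + [("+N", EPS)]
              + [(EPS, ch) for ch in plural_suffix[:-1]]
              + [("+PL", plural_suffix[-1])])
    return singular, plural


generator = FST()
for stem, suffix in (("fox", "es"), ("cat", "s")):
    for path in noun_paths(stem, suffix):
        generator.add_path(path)
parser = generator.invert()

print(generator.transduce(list("fox") + ["+N", "+PL"]))  # ['foxes']
print(parser.transduce(list("foxes")))                   # ['fox+N+PL']
```

The same machine serves both directions: running it as built performs generation, and running its inverse performs analysis, which is the main practical appeal of the transducer model over generate-and-test.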


English Morphology

• Morphology:
  - the study of the way words are built up from smaller meaning-bearing units (morphemes)
  - morpheme: the minimal meaning-bearing unit in a language
• Classes of morphemes
  - stem (root): the main morpheme of the word, supplying the main meaning
  - affixes (詞綴): additional meanings
• Affixes:
  - prefixes: un-happy
  - suffixes: eat-s
  - infixes: inserted inside the stem
    * Tagalog (a Philippine language): hingi (“borrow”) => h-um-ingi (agent of borrow)
  - circumfixes:
    * German: sagen (“to say”) => ge-sag-t (“said”) [past participle]


English Morphology

• Affixes:
  - concatenative: prefixes and suffixes
  - non-concatenative: infixes and templatic morphology
• Templatic: root-and-pattern
  - Arabic, Hebrew, and other Semitic languages
  - Hebrew: lmd (“learn”, “study”) (tri-consonantal root)
    * active voice template CaCaC => lamad (‘he studied’)
    * intensive template CiCeC => limed (‘he taught’)
    * intensive passive template CuCaC => lumad (‘he was taught’)
• Multiple affixes: un-believabl-y
• Agglutinative languages:
  - languages that tend to string affixes together (Turkish, Japanese, Korean)
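Root-and-pattern morphology can be illustrated by interleaving the root consonants into the C slots of a template. The function name `apply_template` and the convention that an uppercase "C" marks a consonant slot are assumptions for this sketch; the root and templates are the Hebrew examples above.

```python
def apply_template(root, template):
    """Fill each 'C' slot in the template with the next root consonant."""
    if template.count("C") != len(root):
        raise ValueError("template slots and root consonants do not match")
    consonants = iter(root)
    return "".join(next(consonants) if ch == "C" else ch for ch in template)


for template in ("CaCaC", "CiCeC", "CuCaC"):
    print(template, "=>", apply_template("lmd", template))
# CaCaC => lamad
# CiCeC => limed
# CuCaC => lumad
```

Because the vowels are interleaved with the root, this kind of morphology cannot be handled by simply stripping a prefix or suffix, which is why it is classed as non-concatenative.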


English Morphology

• Inflection:
  - stem + morphemes => same class
  - e.g., book + s => books (same meaning, same part of speech (詞類))
• Derivation:
  - stem + morphemes => different class
  - e.g., computerize + ation => computerization [verb => noun]


English Morphology

• Inflectional morphology
  - only nouns, verbs, adjectives, and adverbs can be inflected
• Noun: plural, possessive
  - Regular: plural (+s/+es/+ies), possessive (+’s, +s’)
  - Irregular: ox-en, mouse => mice
• Verb (main (ordinary), modal (auxiliary), primary (be)):
  - Forms: stem (present/infinitive), -s (present, 3rd person singular), -ing (gerund/present participle), -ed (past/past participle/perfect)
  - Regular: (+s/+es, -y+ies), -e+ing/+.ing (consonant doubling), +d/+ed/+.ed
  - Irregular: e.g., eat => ate, eaten (+en), catch => caught
  - Consonant doubling: (short vowel) + single consonant => double the consonant; -c => -ck (picnicked)
• Adjective/Adverb: comparative/superlative
  - happy => happier, happiest, happily
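The regular spelling rules listed above can be approximated with a handful of string tests. This is a rough sketch under stated assumptions: the function names are hypothetical, "short vowel + single consonant" is approximated as a consonant-vowel-consonant ending, and the code ignores stress (so it would wrongly double in "visit"), irregular forms, and special cases such as "be" or "die".

```python
VOWELS = "aeiou"


def plural(noun):
    """Regular noun plural: +es after sibilants, -y => -ies, otherwise +s."""
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"
    if len(noun) > 1 and noun.endswith("y") and noun[-2] not in VOWELS:
        return noun[:-1] + "ies"
    return noun + "s"


def needs_doubling(verb):
    """Approximate 'short vowel + single consonant' as a C-V-C ending."""
    return (len(verb) >= 3
            and verb[-1] not in VOWELS + "wxy"
            and verb[-2] in VOWELS
            and verb[-3] not in VOWELS)


def add_suffix(verb, suffix):
    """Attach 'ing' or 'ed' to a regular verb stem."""
    if verb.endswith("c"):
        return verb + "k" + suffix            # picnic => picnicked
    if len(verb) > 2 and verb.endswith("e") and not verb.endswith("ee"):
        return verb[:-1] + suffix             # make => making, move => moved
    if suffix == "ed" and len(verb) > 1 and verb.endswith("y") and verb[-2] not in VOWELS:
        return verb[:-1] + "ied"              # try => tried
    if needs_doubling(verb):
        return verb + verb[-1] + suffix       # stop => stopping
    return verb + suffix                      # walk => walked


print(plural("fox"), plural("city"), plural("book"))  # foxes cities books
print(add_suffix("stop", "ing"), add_suffix("picnic", "ed"))  # stopping picnicked
```

Irregular forms such as mouse => mice or eat => ate fall outside rules like these and would have to be listed in the lexicon.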


English Morphology

• Derivational morphology
  - usually results in a different word class
  - the correct part of speech (POS) of the result must be computed from the root POS and the affixes
• Nominalization: V/A => N
  - computerize => computerization
  - more examples …
• N/V => A
  - computation => computational
  - more examples …
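The POS conversion step can be sketched as a table lookup keyed by the derivational suffix. The table `DERIVATIONS`, the function `derive`, the POS labels, and the final-"e" deletion rule are illustrative assumptions covering only the two examples above, not a complete account of English derivation.

```python
# suffix -> (required POS of the stem, POS of the derived word); illustrative only.
DERIVATIONS = {
    "ation": ("V", "N"),  # computerize => computerization
    "al": ("N", "A"),     # computation => computational
}


def derive(stem, pos, suffix):
    """Return (derived word, derived POS) for a stem, its POS, and a suffix."""
    required_pos, result_pos = DERIVATIONS[suffix]
    if pos != required_pos:
        raise ValueError(f"-{suffix} attaches to {required_pos}, not {pos}")
    # Assumed spelling rule: drop a final 'e' before a vowel-initial suffix.
    if stem.endswith("e") and suffix[0] in "aeiou":
        stem = stem[:-1]
    return stem + suffix, result_pos


print(derive("computerize", "V", "ation"))  # ('computerization', 'N')
print(derive("computation", "N", "al"))     # ('computational', 'A')
```

Chaining calls models multiple derivations: the output POS of one step is the input POS that licenses the next suffix.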


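Chinese Morphology

• Chinese morphemes
  - hard to distinguish from characters, words, and compound words
  - free morphemes
  - bound morphemes
• Examples
  - prefixes 副- (“vice-”), 前- (“former, ex-”), 非- (“non-”): 副-總統, 前-妻, 非-經濟(因素)
  - plural suffix -們: 學生-們
  - suffix -族 (“group, tribe”): 哈日-族, 銀髮-族
  - suffix -化 (“-ize, -ization”): 業-化, 綠-化, 藍-化, 腐-化, 石-化, 神-化
  - suffix -員 (“member, -er”): 公務-員, 業務-員, 推銷-員, 運動-員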
