Morphology: Words and their Parts CS 4705
Basic Uses of Morphology
• Morphology is the study of how words are composed from smaller, meaning-bearing units (morphemes)
• Applications:
  – Spelling correction: referece → reference
  – Hyphenation algorithms: refer-ence
  – Part-of-speech analysis: googler
  – Text-to-speech: grapheme-to-phoneme conversion
    • hothouse: the t and h belong to different morphemes (hot + house), so th is not pronounced /T/ or /D/
  – Speech recognition: phoneme-to-grapheme conversion
  – Artificial languages in standardized tests
    • ‘Twas brillig and the slithy toves…
    • Muggles moogled migwiches
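The hothouse example depends on knowing where the morpheme boundary falls. As a minimal sketch, assuming a tiny hand-written morpheme list, the Python below enumerates every way to split a word into known morphemes. `MORPHEMES` and `segmentations` are hypothetical names used only for illustration; this is plain dictionary lookup, not a full morphological analyzer.

```python
from functools import lru_cache

# Hypothetical toy lexicon of morphemes; a real analyzer needs far more.
MORPHEMES = {"hot", "house", "text", "ed", "car", "s", "google"}


def segmentations(word: str) -> list[list[str]]:
    """Return every way to split `word` into a sequence of known morphemes."""

    @lru_cache(maxsize=None)
    def split_from(i: int) -> tuple[tuple[str, ...], ...]:
        # All segmentations of word[i:]; the empty suffix has one (empty) split.
        if i == len(word):
            return ((),)
        results = []
        for j in range(i + 1, len(word) + 1):
            piece = word[i:j]
            if piece in MORPHEMES:
                for rest in split_from(j):
                    results.append((piece,) + rest)
        return tuple(results)

    return [list(seg) for seg in split_from(0)]


print(segmentations("hothouse"))  # [['hot', 'house']]
print(segmentations("texted"))    # [['text', 'ed']]
print(segmentations("googled"))   # []
```

The last call finds nothing because googled is google + -ed with the final e of google dropped; naive concatenation cannot see through such spelling changes, which is one reason practical analyzers add spelling rules.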
What is a word?
• In formal languages, words are arbitrary strings
• In natural languages, words are made up of meaningful subunits called morphemes
  – This allows for productivity: googled, texted
  – Subword units are of two kinds:
    • Roots, which express concepts denoting entities or relationships in the world
    • Syntactic or grammatical elements
  – Realizations of morphemes are called morphs
    • door realizes door; take and took realize take
• Allomorphs are classes of related morphs that realize a given morpheme
  – Allomorphs of the English plural morpheme -s include -es (boxes), -en (oxen) and the vowel change in men
  – take and took are allomorphs of take
• Syntactic or grammatical morphemes can convey many things
  – In Italian, nouns are marked for gender and number:

              Singular    Plural
    Masc      pomodoro    pomodori
    Fem       cipolla     cipolle

  – pomodor- and cipoll- are called stems, which may or may not occur on their own as words
  – The stem may not occur as a word: derivative/deriv-
  – The base form (lemma) does occur as a word: derivative/derive
  – Sometimes the two are the same: cars has stem car and base form (lemma) car
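The Italian table can be read as a stem plus one ending selected by gender and number. A minimal sketch, assuming only the two noun patterns shown in the table; `ENDINGS` and `inflect` are hypothetical names, and Italian nouns of other classes (such as those ending in -e) are ignored.

```python
# Gender/number endings taken from the pomodoro/cipolla table only.
ENDINGS = {
    ("masc", "sg"): "o",
    ("masc", "pl"): "i",
    ("fem", "sg"): "a",
    ("fem", "pl"): "e",
}


def inflect(stem: str, gender: str, number: str) -> str:
    """Attach the gender/number ending to a bound stem such as 'pomodor'."""
    return stem + ENDINGS[(gender, number)]


print(inflect("pomodor", "masc", "pl"))  # pomodori
print(inflect("cipoll", "fem", "sg"))    # cipolla
```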
What information does morphology give us?
• It differs by language
  – Spanish: hablo, hablaré / English: I speak, I will speak (Spanish marks person and tense on the verb itself)
  – English: book, books / Japanese: hon, hon (Japanese does not mark number on the noun)
• Languages also differ in how they encode information
  – Isolating languages (e.g., Mandarin) have few or no bound forms (affixes) that attach to a word
  – Agglutinative languages (e.g., Finnish, Turkish) build words from prefixes and suffixes added to a stem like beads on a string; each feature is expressed by a single affix
  – Inflectional (fusional) languages (e.g., English) merge different features into a single affix (e.g., person and tense of verbs), and the same feature can be realized by different affixes
  – Polysynthetic languages (e.g., Inuit languages) express much of their syntax in their morphology, incorporating a verb’s arguments into the verb
  – So different languages may require very different morphological analyzers
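The contrast between agglutinative and inflectional (fusional) encoding can be sketched as two lookup strategies. Assumptions: the Turkish example ev-ler-im-de ‘in my houses’ (ev ‘house’, -ler plural, -im ‘my’, -de locative) is hard-coded with the vowel-harmony variants that fit ev rather than computing harmony; the English table covers only two feature bundles; all names are hypothetical.

```python
# Agglutinative: one affix per feature, concatenated in a fixed order.
# These are the front-vowel forms that vowel harmony selects after 'ev' (house).
TURKISH_AFFIXES = {"plural": "ler", "poss_1sg": "im", "locative": "de"}
TURKISH_ORDER = ["plural", "poss_1sg", "locative"]


def agglutinate(root: str, features: set[str]) -> str:
    """Add one suffix per requested feature, in the fixed slot order."""
    return root + "".join(TURKISH_AFFIXES[f] for f in TURKISH_ORDER if f in features)


# Fusional: one English suffix realizes a whole bundle of features at once.
ENGLISH_VERB_SUFFIX = {("3", "sg", "present"): "s", ("past",): "ed"}


def fuse(stem: str, bundle: tuple[str, ...]) -> str:
    """Add the single suffix for the feature bundle, or nothing if none is listed."""
    return stem + ENGLISH_VERB_SUFFIX.get(bundle, "")


print(agglutinate("ev", {"plural", "poss_1sg", "locative"}))  # evlerimde
print(fuse("speak", ("3", "sg", "present")))                  # speaks
```

In the agglutinative case each feature can be read off its own affix; in the fusional case the -s of speaks cannot be split into separate person, number and tense pieces.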
Morphology Helps Define Word Classes
• AKA morphological classes, parts of speech
• Closed vs. open (function vs. content) class words
  – Closed class: pronoun, preposition, conjunction, determiner, …
  – Open class: noun, verb, adjective, …
• Identifying word classes is useful for almost any task in NLP, from translation to speech recognition to topic detection
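Morphology gives cheap cues to word class: closed-class words can simply be listed, while many open-class words carry telltale suffixes. The sketch below is a crude heuristic guesser, not a real part-of-speech tagger; `CLOSED_CLASS`, `SUFFIX_CUES` and `guess_class` are hypothetical and cover only a handful of cases.

```python
# Tiny hypothetical sample of closed-class words; a real list is longer but finite.
CLOSED_CLASS = {
    "he": "pronoun", "she": "pronoun",
    "to": "preposition", "from": "preposition",
    "and": "conjunction", "but": "conjunction",
    "the": "determiner", "a": "determiner",
}

# Hypothetical suffix cues for open-class words, checked in this order.
SUFFIX_CUES = [
    ("ation", "noun"), ("ness", "noun"),
    ("ize", "verb"),
    ("able", "adjective"), ("less", "adjective"),
    ("ly", "adverb"),
]


def guess_class(word: str) -> str:
    """Guess a word class from a closed-class lookup, then from suffix cues."""
    w = word.lower()
    if w in CLOSED_CLASS:
        return CLOSED_CLASS[w]
    for suffix, word_class in SUFFIX_CUES:
        if w.endswith(suffix):
            return word_class
    return "unknown"


for w in ["the", "realization", "hospitalize", "careless", "happily", "family"]:
    print(w, guess_class(w))
```

Suffix cues misfire easily: family ends in -ly but is a noun, which is why practical taggers combine morphological cues with context.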
Inflectional Morphology
• Word stem + grammatical morpheme → different forms of the same word
  – Usually produces a word of the same class
  – Usually serves a syntactic or grammatical function (e.g., agreement): like → likes or liked; bird → birds
• Nominal morphology
  – Plural forms
    • -s or -es
    • Irregular forms (goose/geese)
    • Mass vs. count nouns (fish/fish(es); email or emails?)
  – Possessives (cat’s, cats’)
• Verbal inflection
  – Main verbs (sleep, like, fear) are relatively regular
    • -s, -ing, -ed
    • And productive: emailed, instant-messaged, faxed, homered
    • But some are irregular:
      – eat/ate/eaten, catch/caught
  – Primary verbs (be, have, do) and modal verbs (can, will, must) are often irregular and not productive
    » be: am/is/are/were/was/been/being
  – Irregular verbs are few (~250) but occur frequently
• Particles occur in only one form in English
  – Prepositions: to, from
  – Adverbs: happily, quickly
  – Conjunctions: but, and
  – Articles: the, a, an
• So English inflectional morphology is fairly easy to model, with some special cases
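Because irregular forms are few but frequent, one simple way to model English inflection is to list the exceptions and apply regular rules to everything else. A minimal sketch under that assumption; the exception tables are hypothetical samples, and the regular rules deliberately ignore y → i (city, try) and consonant doubling (stop → stopped).

```python
# Hypothetical exception lists; anything not listed goes through the regular rules.
IRREGULAR_PLURAL = {"goose": "geese", "man": "men", "ox": "oxen", "fish": "fish"}
IRREGULAR_PAST = {"eat": "ate", "catch": "caught", "have": "had", "do": "did"}


def plural(noun: str) -> str:
    """Listed irregular plural, else -es after sibilant spellings, else -s."""
    if noun in IRREGULAR_PLURAL:
        return IRREGULAR_PLURAL[noun]
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"
    return noun + "s"


def past(verb: str) -> str:
    """Listed irregular past tense, else regular -ed (just -d after a final e)."""
    if verb in IRREGULAR_PAST:
        return IRREGULAR_PAST[verb]
    if verb.endswith("e"):
        return verb + "d"
    return verb + "ed"


print(plural("box"), plural("goose"), past("email"), past("catch"))
# boxes geese emailed caught
```

Listing fish with an unchanged plural is one choice for the mass/count question on the slide; fishes would need a separate, count-noun entry.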
Derivational Morphology
• Word stem + syntactic/grammatical morpheme → new words
  – Usually produces a word of a different class
  – Incomplete process: derivational morphs cannot be applied to just any member of a class
• Verbs → nouns
  – -ize verbs → -ation nouns
  – generalize, realize → generalization, realization
• Verbs, nouns → adjectives
  – embrace, pity → embraceable, pitiable
  – care, wit → careless, witless
• Adjective → adverb
  – happy → happily
• But the process is selective in unpredictable ways
  – Less productive: nerveless/*evidence-less, malleable/*sleep-able, rarity/*rareness
  – Meanings of derived terms are harder to predict by rule
    • clueless, careless, nerveless, sleepless
• Derivation can be applied recursively:
  – hospital → hospitalize → hospitalization → prehospitalization …
  – Morphological analysis identifies the order in which affixes were concatenated, as well as the morphemes: [pre[[[hospital]ize]ation]]
  – Bracketing paradoxes: unhappier
    • [un[happier]]: not happier
    • [[unhappy]er]: more unhappy
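The derivational patterns above can be sketched as two small functions. `derive` checks that a suffix attaches to the right input class, returns the output class, and applies two common English spelling adjustments (drop a silent e before a vowel, except after c/g before a or o; change a final consonant + y to i). `bracket` records the order of affixation as nested brackets. The rule table and both functions are hypothetical illustrations, and class checking alone cannot capture the selectivity the slides describe: this sketch happily produces *sleepable.

```python
# Hypothetical rule table: suffix -> (class it attaches to, class it produces).
DERIVATIONAL_SUFFIXES = {
    "ize": ("N", "V"),
    "ation": ("V", "N"),
    "able": ("V", "A"),
    "less": ("N", "A"),
    "ly": ("A", "Adv"),
}
VOWELS = "aeiou"


def derive(stem: str, stem_class: str, suffix: str) -> tuple[str, str]:
    """Attach a derivational suffix after checking the input word class."""
    needed, produced = DERIVATIONAL_SUFFIXES[suffix]
    if stem_class != needed:
        raise ValueError(f"-{suffix} attaches to {needed}, not {stem_class}")
    if len(stem) > 1 and stem.endswith("e") and suffix[0] in VOWELS:
        # Drop silent e before a vowel (realize -> realization), but keep it
        # after c/g before a or o so the consonant stays soft (embraceable).
        if not (stem[-2] in "cg" and suffix[0] in "ao"):
            stem = stem[:-1]
    elif (len(stem) > 1 and stem.endswith("y")
          and stem[-2] not in VOWELS and suffix[0] != "i"):
        stem = stem[:-1] + "i"  # happy -> happily, pity -> pitiable
    return stem + suffix, produced


def bracket(stem: str, affixes: list[tuple[str, str]]) -> str:
    """Record the order of affixation as nested brackets, innermost first."""
    form = f"[{stem}]"
    for affix, kind in affixes:
        form = f"[{affix}{form}]" if kind == "prefix" else f"[{form}{affix}]"
    return form


print(derive("realize", "V", "ation"))  # ('realization', 'N')
print(derive("embrace", "V", "able"))   # ('embraceable', 'A')
print(bracket("hospital", [("ize", "suffix"), ("ation", "suffix"), ("pre", "prefix")]))
# [pre[[[hospital]ize]ation]]
```

Applying -er and un- to happy in the two possible orders gives [un[[happy]er]] and [[un[happy]]er], the two readings of unhappier. The sketch brackets the stem itself, as in the hospital example, and does not compute surface spelling inside `bracket`.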
Compounding
• Two base forms join to form a new word
  – bedtime, Wienerschnitzel, Rotwein
  – Careful: compound or derivation?
Affixes can be attached to stems in different ways
  – Prefixation
    • immaterial
  – Suffixation: more common across languages than prefixation
    • trying
  – Circumfixation: combines prefixation and suffixation
    • gesagt (German ‘said’: ge- + sag + -t)
  – Infixation
    • English: absobl**dylutely
    • Bontoc: the infix -um- turns adjectives and nouns into verbs: kilad ‘red’ → kumilad ‘to be red’
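Each affixation type above amounts to a different way of combining strings. A minimal sketch with hypothetical function names; where an infix goes is language-specific, so the sketch simply takes the insertion point as a parameter, and sound adjustments such as in- surfacing as im- are not modeled.

```python
def prefix(stem: str, affix: str) -> str:
    """Affix goes before the stem."""
    return affix + stem


def suffix(stem: str, affix: str) -> str:
    """Affix goes after the stem."""
    return stem + affix


def circumfix(stem: str, before: str, after: str) -> str:
    """Two pieces added at once around the stem, together realizing one morpheme."""
    return before + stem + after


def infix(stem: str, affix: str, position: int) -> str:
    """Affix goes inside the stem; `position` stands in for language-specific rules."""
    return stem[:position] + affix + stem[position:]


print(prefix("material", "im"), suffix("try", "ing"))  # immaterial trying
print(circumfix("sag", "ge", "t"), infix("kilad", "um", 1))  # gesagt kumilad
```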
Concatenative vs. non-concatenative morphology
• Semitic root-and-pattern morphology
  – Root (2-4 consonants) conveys basic semantics (e.g., Arabic /ktb/)
  – Vowel pattern conveys voice and aspect
  – Derivational template (binyan) identifies word class
Template     Active     Passive     Meaning
CVCVC        katab      kutib       write
CVCCVC       kattab     kuttib      cause to write
CVVCVC       ka:tab     ku:tib      correspond
tVCVVCVC     taka:tab   tuku:tib    write each other
nCVVCVC      nka:tab    nku:tib     subscribe
CtVCVC       ktatab     ktutib      write
stVCCVC      staktab    stuktib     dictate
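Root-and-pattern morphology can be sketched as interdigitation: the root supplies the consonants, the template supplies the slots and any affix consonants, and the vowel pattern fills the vowel slots. The encoding is a hypothetical one chosen for this illustration: in `TEMPLATES`, digits 1-3 mark root-consonant slots, V a vowel slot (VV a long vowel, written with : as in the table), and other letters are literal affix consonants. The vowel-filling rule (the last vowel slot takes the last vowel of the pattern, all earlier slots take the first) is an assumption that reproduces every row of the table for /ktb/; it is not a general account of Arabic.

```python
import re

ROOT = "ktb"

# Hypothetical slot encoding of the table's templates.
TEMPLATES = {
    "CVCVC": "1V2V3",
    "CVCCVC": "1V22V3",
    "CVVCVC": "1VV2V3",
    "tVCVVCVC": "tV1VV2V3",
    "nCVVCVC": "n1VV2V3",
    "CtVCVC": "1tV2V3",
    "stVCCVC": "stV12V3",
}

ACTIVE = ["a"]        # a in every vowel slot
PASSIVE = ["u", "i"]  # u in every vowel slot except the last, which gets i


def interdigitate(root: str, template: str, melody: list[str]) -> str:
    """Fill root-consonant slots from `root` and vowel slots from `melody`."""
    n_slots = len(re.findall("V+", template))  # each V or VV run is one vowel slot
    out, slot, i = [], 0, 0
    while i < len(template):
        ch = template[i]
        if ch.isdigit():
            out.append(root[int(ch) - 1])
            i += 1
        elif ch == "V":
            is_long = template[i + 1:i + 2] == "V"
            vowel = melody[-1] if slot == n_slots - 1 else melody[0]
            out.append(vowel + (":" if is_long else ""))
            slot += 1
            i += 2 if is_long else 1
        else:  # literal affix consonant such as t, n or s
            out.append(ch)
            i += 1
    return "".join(out)


for name, slots in TEMPLATES.items():
    print(name, interdigitate(ROOT, slots, ACTIVE), interdigitate(ROOT, slots, PASSIVE))
```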
Morphotactics
• What are the ‘rules’ for word construction in a language?
  – pseudointellectual vs. *intellectualpseudo
  – rationalize vs. *izerational
  – cretinous vs. *cretinly vs. *cretinacious
• Possible ‘rules’
  – Suffixes are suffixes and prefixes are prefixes
  – Certain affixes attach to certain types of stems (nouns, verbs, etc.)
  – Certain stems can/cannot take certain affixes, e.g.:
    • Semantics: in English, un- cannot attach to adjectives that already have a negative connotation:
      – unhappy vs. *unsad
      – unhealthy vs. *unsick
      – unclean vs. *undirty
    • Phonology: in English, -er cannot attach to words of more than two syllables:
      – great, greater
      – happy, happier
      – competent, *competenter
      – elegant, *eleganter
      – unruly, ??unrulier
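The two constraints above can be written as predicates over candidate stems. This is a minimal sketch: `NEGATIVE_ADJECTIVES` is a hypothetical hand-made list, and `syllables` is a crude vowel-group count (it overcounts words with a silent e, such as lovely), not a real syllabifier.

```python
import re

# Hypothetical sample of adjectives that already carry a negative meaning.
NEGATIVE_ADJECTIVES = {"sad", "sick", "dirty"}


def syllables(word: str) -> int:
    """Crude estimate: count runs of vowel letters, treating y as a vowel."""
    return len(re.findall(r"[aeiouy]+", word.lower()))


def allows_un(adjective: str) -> bool:
    """Semantic constraint: no un- on adjectives that are already negative."""
    return adjective not in NEGATIVE_ADJECTIVES


def allows_er(adjective: str) -> bool:
    """Phonological constraint: -er only on words of at most two syllables."""
    return syllables(adjective) <= 2


for adj in ["great", "happy", "competent", "unruly"]:
    print(adj, syllables(adj), allows_er(adj))
```

The hard two-syllable cutoff rejects unrulier outright, whereas the slide marks it ?? rather than *, so a categorical predicate may be too blunt for this constraint.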
Morphological Representations: Evidence from Human Performance
• Hypotheses:
  – Full listing hypothesis: whole words are listed
  – Minimum redundancy hypothesis: only morphemes are listed
• Experimental evidence:
  – Priming experiments (does seeing/hearing one word facilitate recognition of another?) support neither hypothesis on its own
  – Regularly inflected forms (e.g., cars) prime their stems (car), but derived forms do not (e.g., management does not prime manage)
  – But spoken derived words can prime stems if they are semantically close (e.g., government/govern but not department/depart)
• Speech errors suggest affixes must be represented separately in the mental lexicon
  – ‘easy enoughly’ for ‘easily enough’
Summing Up
• Different languages have different morphological systems
  – If we can discover how to decode such a system, we can identify useful information about the word class and the semantic meaning of a word
  – Morphological rules provide the basis for morphological analyzers (computational morphology)
• Next time:
  – Read Ch. 3.2-3.8 (new version)