A Lexical Theory of Variation
Andries W. Coetzee
Workshop on Variation, Gradience and Frequency in Phonology
Stanford University, July 2007

Things that are known to influence variation
Ø Grammar
  (i) Where: where variation appears and where it does not
  (ii) Frequency: how often a process applies in a given context
Ø Lexical frequency
  Some variable processes affect frequent words more; others affect infrequent words more.
Ø Extra-grammatical factors
  Speech style, speech rate, etc.

Existing theories of variation
Ø Grammatical
  • Variable rules in the Labovian tradition (Labov 1972; Sankoff 1988)
  • Several OT models (Anttila 1997; Boersma and Hayes 2000; Coetzee 2006; Reynolds 1994)
  Reasonably successful at accounting for the grammatical influence.
Ø Usage-based/exemplar models (Bybee 2001, 2002; Pierrehumbert 2001)
  Reasonably successful at accounting for the influence of lexical frequency.
Ø Interaction between the two
  Models that incorporate both are still largely absent.

Structure of the presentation
1. Usage frequency and variation
2. The basics of the proposal
3. Phonetically motivated variation
4. Analogically motivated variation
5. Learning lexical distributions

Usage Frequency and Variation

Phonetically motivated variable process
Ø Typical phonological process
Ø Applies more often to lexical items with higher usage frequency
Ø Example: t/d deletion
  Pre-C:  west bank ~ wes bank
  Pre-V:  west end ~ wes end
  Pre-##: west ~ wes

  Chicano English (Santa Ana 1991)
              Pre-C   Pre-V   Pre-##
  n           3,693   1,574   1,024
  % deleted   62      45      37

  Influence of frequency (Bybee 2000: 70)
              High frequency   Low frequency
  n           1,650            399
  % deleted   54.4             34.4

Analogically motivated variable process
Ø Usually some kind of regularization process: an irregular plural or past tense is replaced with the regular form
Ø Applies more often to lexical items with lower usage frequency
Ø Example: Regularization of past tense verbs
  Infrequent verbs are more likely to regularize (Hooper 1976: 100; Bybee 1985: 120, 2002: 269; Bybee & Slobin 1982)

  Less likely to regularize      More likely to regularize
  Present   Raw   Log            Present   Raw   Log
  keep      348   2.54           creep     19    1.28
  leave     345   2.54           leap      20    1.30
  sleep     106   2.03           weep      22    1.34
  drive     174   2.24           dive      32    1.51

  Kučera and Francis frequencies (1982) as calculated at www.iphod.com.
Ø Also many examples from the historical literature (Phillips 1984, 2001 and references therein).

The challenge
A formal theory of variation that:
Ø Captures the role of grammar
  • Determines what kind of variation is possible
  • Influences the frequency of application
Ø Captures the role of lexical frequency
  • A variable process applies differently to different lexical items.
  • Different kinds of processes are influenced differently by lexical frequency.

The Proposal: Variation Through Lexical Indexation

Variable lexical indexation
Ø Lexically indexed constraints (Pater 1994, 2000; Itô & Mester 1995, 1999)
  • Allow a way in for lexical influence
  • Yet still keep control in the hands of grammar
Ø Variation through variable lexical class affiliation

                     MAX-L2   M        MAX-L1
  /west/L2  → west            *
              wes    *!
  /west/L1    west            *!
            → wes                      *

Ø Note that the grammar stays constant; what varies is the lexical class affiliation of lexical items. Variation is hence moved from the grammar into the lexicon.

Lexical distribution functions
Ø What determines the lexical class affiliation of a lexical item?
Ø Each lexical item is stored with a probability density function.
  • Every time a lexical item is submitted to the grammar for evaluation, a value is chosen randomly along the x-axis of the distribution function.
  • The x-axis is divided into equally sized adjacent regions, one for each indexed version of the constraint.
  • Correlation between frequency and skewness of the distribution function:
    – Frequent lexical items = left-skewed function
    – Infrequent lexical items = right-skewed function
[Figure: distribution functions for low-, average- and high-frequency items over an x-axis divided into the regions L2 and L1]
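The sampling step described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the density is taken to be a beta distribution on [0, 1] (the family proposed later in the talk), the regions are numbered so that the leftmost region is the highest-indexed class and the rightmost is L1, and the function names and shape values are hypothetical.

```python
import random

def lexical_class(x, n_classes):
    """Map a value x in [0, 1] to a lexical class index.

    Assumption: regions are equal in size and numbered downwards from
    the left edge, so the leftmost region is L{n_classes} and the
    rightmost region is L1.
    """
    region = min(int(x * n_classes), n_classes - 1)  # 0 = leftmost region
    return n_classes - region

def sample_class(alpha, beta, n_classes, rng):
    """Draw one x-value from Beta(alpha, beta) and return its class."""
    return lexical_class(rng.betavariate(alpha, beta), n_classes)

rng = random.Random(0)
# Illustrative shape values only: alpha > beta is left skewed (frequent),
# alpha < beta is right skewed (infrequent).
frequent = [sample_class(4.0, 2.0, 2, rng) for _ in range(2000)]
infrequent = [sample_class(2.0, 4.0, 2, rng) for _ in range(2000)]
```

With two classes, the left-skewed item should land in L1 far more often than the right-skewed item, which is the intended link between frequency and class affiliation.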

Example 1: Phonetically Motivated Variation

t/d-deletion again
  Context     Pre-C   Pre-V   Pre-##
  n           3,693   1,574   1,024
  % deleted   62      45      37

  Frequency   High frequency   Low frequency
  n           1,650            399
  % deleted   54.4             34.4

Grammar
Ø Markedness constraints
  *PRE-C   No t/d in the context C_#C
  *PRE-V   No t/d in the context C_#V
  *PRE-##  No t/d in the context C_##
  Contextual licensing constraints à la Steriade (1997)
Ø Four indexed versions of MAX.
Ø Ranking: MAX-L4 >> *PRE-C >> MAX-L3 >> *PRE-V >> MAX-L2 >> *PRE-## >> MAX-L1
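How this ranking picks a winner for a given context and lexical class can be sketched as a small evaluation routine. This is a minimal illustration, not the author's implementation: it assumes just two candidates per input (t/d retained or deleted), one violation each, and the names `RANKING` and `evaluate` are hypothetical.

```python
# Ranking from the slide, highest-ranked constraint first.
RANKING = ["MAX-L4", "*PRE-C", "MAX-L3", "*PRE-V", "MAX-L2", "*PRE-##", "MAX-L1"]

def evaluate(context, lex_class):
    """Return 'retain' or 'delete' for a word of class L{lex_class}.

    context is one of 'PRE-C', 'PRE-V', 'PRE-##'.
    Assumption: the faithful candidate violates only the markedness
    constraint for its context, and the deletion candidate violates
    only the MAX constraint indexed to the word's lexical class.
    """
    candidates = {
        "retain": {"*" + context},
        "delete": {f"MAX-L{lex_class}"},
    }
    remaining = dict(candidates)
    for constraint in RANKING:
        # Keep only the candidates with the fewest violations here.
        fewest = min(int(constraint in v) for v in remaining.values())
        remaining = {name: v for name, v in remaining.items()
                     if int(constraint in v) == fewest}
        if len(remaining) == 1:
            break
    return next(iter(remaining))
```

For example, `evaluate("PRE-C", 4)` gives retention and `evaluate("PRE-C", 3)` gives deletion, matching the pattern stated for the Pre-C condition.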

The grammar in Pre-C condition
Preservation if MAX-L4; deletion if MAX-L3, MAX-L2, MAX-L1

                               MAX-L4   *PRE-C   MAX-L3   *PRE-V   MAX-L2   *PRE-##  MAX-L1
  /west.L4 bank/  → west bank           *
                    wes bank   *!
  /west.L3 bank/    west bank           *!
                  → wes bank                     *
  /west.L2 bank/    west bank           *!
                  → wes bank                                       *
  /west.L1 bank/    west bank           *!
                  → wes bank                                                         *

The grammar in Pre-V condition
Preservation if MAX-L4, MAX-L3; deletion if MAX-L2, MAX-L1

                               MAX-L4   *PRE-C   MAX-L3   *PRE-V   MAX-L2   *PRE-##  MAX-L1
  /west.L4 end/   → west end                              *
                    wes end    *!
  /west.L3 end/   → west end                              *
                    wes end                      *!
  /west.L2 end/     west end                              *!
                  → wes end                                        *
  /west.L1 end/     west end                              *!
                  → wes end                                                          *

The grammar in Pre-Pause condition
Preservation if MAX-L4, MAX-L3, MAX-L2; deletion if MAX-L1

                               MAX-L4   *PRE-C   MAX-L3   *PRE-V   MAX-L2   *PRE-##  MAX-L1
  /west.L4/       → west                                                    *
                    wes        *!
  /west.L3/       → west                                                    *
                    wes                          *!
  /west.L2/       → west                                                    *
                    wes                                            *!
  /west.L1/         west                                                    *!
                  → wes                                                              *

Likelihood of deletion based on grammar alone
Grammar: MAX-L4 >> *PRE-C >> MAX-L3 >> *PRE-V >> MAX-L2 >> *PRE-## >> MAX-L1

  Context     Example     Indexation resulting   Indexation resulting   % indexations resulting
                          in retention           in deletion            in deletion
  Pre-C       west side   L4                     L3, L2, L1             75%
  Pre-V       west end    L4, L3                 L2, L1                 50%
  Pre-Pause   west        L4, L3, L2             L1                     25%

Note that grammar determines:
Ø What variation is observed: only a process that reduces markedness, that is, only a process that is grammatically motivated.
Ø How frequently the process applies in which context.
But we still need to give the lexicon its due.
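The percentages in the last column follow directly from where each markedness constraint sits in the ranking. The short sketch below recomputes them; it assumes that every indexation is equally likely (which is what "grammar alone" amounts to here), and the helper name `deletion_share` is hypothetical.

```python
# Ranking from the slide, highest-ranked constraint first.
RANKING = ["MAX-L4", "*PRE-C", "MAX-L3", "*PRE-V", "MAX-L2", "*PRE-##", "MAX-L1"]

def deletion_share(context):
    """Percentage of MAX indexations that yield deletion in a context.

    Deletion wins whenever the word's MAX-Lk is ranked below the
    markedness constraint for the context ('PRE-C', 'PRE-V', 'PRE-##').
    """
    cutoff = RANKING.index("*" + context)
    max_constraints = [c for c in RANKING if c.startswith("MAX-L")]
    deleting = [c for c in max_constraints if RANKING.index(c) > cutoff]
    return 100 * len(deleting) / len(max_constraints)

shares = {ctx: deletion_share(ctx) for ctx in ("PRE-C", "PRE-V", "PRE-##")}
```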

The influence of lexical frequency
                          Raw frequency   Log frequency   Expected deletion
  Infrequent     vest     6               0.60            Low
  Intermediate   modest   29              1.46            Medium
  Frequent       best     361             2.56            High
  Mean                    29.8            1.47
  Frequencies from Francis & Kučera (1982), calculated at www.iphod.com.
[Figure: distribution functions for vest, modest and best over the x-axis regions MAX-L4, MAX-L3, MAX-L2, MAX-L1, with *PRE-C, *PRE-V and *PRE-## marked]

Example 2: Analogically Motivated Variation

Regularization of the strong past tense in English
Ø Specific examples from Kučera and Francis (1982) (www.iphod.com)
  Base    Raw    Log    Regular past   Strong past   % regular
  speed   91     1.96   3              9             25
  dive    32     1.51   5              4             56
  leap    20     1.30   20             2             91
  mean    29.8   1.47
Ø Irregular morphology/suppletion as allomorphy
  • Two morphological options for the formation of the past tense.
  • Both options are input to the grammar, so that choosing one allomorph does not violate faithfulness relative to the other. (Anttila 1997, Bonet 2004, Itô and Mester 2006, Kager 1996, Mascaró 1996, etc.)
Ø Constraints
  • OO-FAITH: Some kind of paradigm uniformity (Benua 2000, Kenstowicz 1996, etc.)
  • USELISTED: The input of a candidate must be a single lexical entry (Zuraw 2000)
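The "% regular" column in the table on this slide is the share of regular past-tense tokens among all past-tense tokens of a verb. A quick check of that arithmetic, using the counts from the table (the function name is hypothetical):

```python
def percent_regular(regular, strong):
    """Share of regular past-tense forms, rounded to a whole percent."""
    return round(100 * regular / (regular + strong))

# (regular past, strong past) counts as listed in the table.
counts = {"speed": (3, 9), "dive": (5, 4), "leap": (20, 2)}
shares = {verb: percent_regular(*pair) for verb, pair in counts.items()}
```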

The grammar

                                          OO-FAITH-L2   USELISTED     OO-FAITH-L1
  {/leap.L1 + ed/, /leapt.L1/}    leaped                *!
  OO-Base:                      → leapt                               *
  {/leap.L2 + ed/, /leapt.L2/}  → leaped                *
  OO-Base:                        leapt   *!

And the influence from the lexicon
[Figure: distribution functions for leap, dive and speed over the x-axis regions OO-FAITH-L2 and OO-FAITH-L1]

Lexical Distribution Functions

What needs to be learned?
Grammar: Ranking between constraints
Lexicon: Lexical items, with their probabilistic distribution functions.
These are two separate learning problems, each with its own solution.
Learning the grammar
  There is a well-developed learnability literature in OT (Tesar and Smolensky 1998, 2000, etc.), and specifically on learning an indexed grammar (Pater 2006, to appear). I will therefore not dwell on this aspect here.
Learning the lexicon
  The focus here is on how the lexical distribution functions might be acquired.

General properties of lexical distribution functions
[Figure: distribution functions for infrequent, average and frequent items; x-axis partitions labelled MAX:, DEP: (L1, L2, L3) and IDENT[F]: (L1, L2, L3, L4)]

General properties of lexical distribution functions
Basic requirements
Ø Minimum and maximum value.
Ø Shape parameters that determine skewness
Beta-distribution (Evans, Hastings & Peacock 2000)
Ø α = β: symmetric
Ø α < β: right skewed
Ø α > β: left skewed
[Figure: distribution functions for infrequent, average and frequent items]
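For reference, the standard beta density on the unit interval and its skewness can be written as below. Here α and β are the two shape parameters and B(α, β) is the beta function; taking the minimum and maximum to be 0 and 1 is an assumption made for this sketch. The sign of the skewness follows the sign of β − α, which gives the three cases listed on the slide.

```latex
f(x;\alpha,\beta) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha,\beta)},
\qquad 0 \le x \le 1,\ \alpha,\beta > 0

\text{skewness} = \frac{2(\beta-\alpha)\sqrt{\alpha+\beta+1}}
                       {(\alpha+\beta+2)\sqrt{\alpha\beta}}
```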

A small scale simulation
Ø IPhOD 1.3 (www.iphod.com)
  • 33,432 words, with CMU transcriptions and Kučera-Francis (KF) frequencies
  • Multiplied KF frequencies by 10 to avoid having to work with log(1) …
Ø Calculated the following
  • Mean frequency of all words in IPhOD = 297.89. Log(mean) = 2.47.
  • Collected all words that end in [-Ct] or [-Cd], excluding past tense verbs, and took the log of the frequency for each of these.
Ø Distribution functions:
  Frequency               Skewness        α        β
  frequent (f > mean)     left (α > β)    log(f)   log(mean)
  infrequent (f < mean)   right (α < β)   log(f)   log(mean)
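One way to turn these quantities into per-context deletion probabilities is sketched below. Several things here are assumptions rather than details confirmed by the slides: the shape parameters are read as α = log(f) and β = log(mean) with base-10 logs, the x-axis is [0, 1] split into four equal regions running from L4 on the left to L1 on the right, and the cumulative probability is obtained by simple midpoint integration. All function names are hypothetical.

```python
import math

MEAN_FREQ = 297.89  # mean IPhOD frequency after multiplying KF counts by 10

def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x, for 0 < x < 1."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * x ** (a - 1) * (1 - x) ** (b - 1)

def beta_cdf(x, a, b, steps=4000):
    """P(X <= x) for X ~ Beta(a, b), by midpoint-rule integration."""
    h = x / steps
    return h * sum(beta_pdf((i + 0.5) * h, a, b) for i in range(steps))

def shape_params(freq, mean_freq=MEAN_FREQ):
    """Assumed reading of the slide: alpha = log(f), beta = log(mean)."""
    return math.log10(freq), math.log10(mean_freq)

# Deletion happens when x falls to the right of the context's boundary:
# Pre-C deletes for L3-L1, Pre-V for L2-L1, Pre-## for L1 only.
BOUNDARY = {"PRE-C": 0.25, "PRE-V": 0.50, "PRE-##": 0.75}

def deletion_probability(freq, context):
    a, b = shape_params(freq)
    return 1.0 - beta_cdf(BOUNDARY[context], a, b)
```

Under these assumptions a more frequent word always has a higher deletion probability than a less frequent one in the same context, and for any word the probability falls from Pre-C to Pre-V to Pre-##.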

A small scale simulation
              aghast   vest   modest   best   most    mean
  Frequency   10       40     290      3610   11610   297.89
  Log         1        1.60   2.46     3.56   4.07    2.47
[Figure: distribution functions for aghast, vest, modest and best over the x-axis regions MAX-L4, MAX-L3, MAX-L2, MAX-L1, with *PRE-C, *PRE-V and *PRE-## marked]

How well do the predictions line up with reality?
Ø Once the values of α and β for a word are known, it is easy to calculate the likelihood of an x-value falling in a specific range along the x-axis, and hence the likelihood of deletion in each of the three contexts for each word.
Ø Using this, I ran a simulation, feeding each [-Ct] and [-Cd] word through the grammar, according to its frequency in IPhOD.

  Phonological context (value in brackets is ratio to Pre-C) (Santa Ana 1991)
                       Pre-C   Pre-V      Pre-##
  Chicano English      62      45 (.73)   37 (.60)
  Predictions of LTV   90      62 (.69)   27 (.30)

  Frequency (value in brackets is ratio to > 35/million) (Bybee 2000)
                       > 35/million   < 35/million
  Chicano English      54             34 (.63)
  Predictions of LTV   65             43 (.66)

How can this be refined further?
Ø Currently, the lexical distribution functions are determined purely by lexical frequency. But we know that different dialects show different deletion rates.
  • Either different dialects have different lexical frequencies.
  • Or there are other parameters that can be set independently of lexical frequency.
Ø Maybe some constant is added to or subtracted from the mean?
  • Added = more words become “infrequent” = more conservative dialect.
  • Subtracted = more words become “frequent” = more deletion.
Ø Maybe the lexical space can be warped, i.e. the regions along the x-axis that correspond to lexical classes are not of equal size.
Ø Maybe lexical distribution functions are best-fit functions, i.e. learn a function that would result in the correct deletion rate … but then we lose the connection between usage frequency and deletion rates.

Conclusion

Ø Existing grammatical models of variation do not allow the lexicon enough opportunity to play a role.
  Pierrehumbert (2001):
  p. 138: A second challenge arises from the fact that the differential phonetic outcomes relate specifically to word frequency. Standard generative models do not encode word frequency. They treat the word frequency effects … as matters of linguistic performance rather than linguistic competence. Thus the intrusion of word frequency into a traditional area of linguistics, namely the conditioning of allophony, is not readily accommodated in the classical generative viewpoint.
  p. 148: The exemplar model is the only current model which has these properties.
Ø Purely usage-based models probably do not allow the grammar enough say.
  Bybee (2000: 73): … it does mean that there is no variable rule of t/d-deletion. Rather there is a gradual process of shortening or reducing the lingual gesture …
  Bybee (2002: 268): If we take linguistic behavior to be highly practiced neuromotor activity … then we can view reductive sound changes as the result of the automation of linguistic production. It is well known that repeated neuromotor patterns become more efficient as they are practiced; transitions are smoothed by the anticipatory overlap of gestures, and unnecessary or extreme gestures decrease in magnitude or are omitted.
Ø LTV is an attempt to do both. Does it succeed?

References
Anttila, Arto. 1997. Deriving variation from grammar. In Frans Hinskens, Roeland van Hout and Leo Wetzels, eds. Variation, Change and Phonological Theory. Amsterdam: John Benjamins. p. 35-68.
Benua, Laura. 2000. Phonological Relations Between Words. New York: Garland.
Boersma, Paul and Bruce Hayes. 2000. Empirical tests of the Gradual Learning Algorithm. Linguistic Inquiry, 32: 45-86.
Bonet, Eulàlia. 2004. Morph insertion and allomorphy in Optimality Theory. International Journal of English Studies, 4: 73-104.
Bybee, Joan L. 1985. Morphology: A Study of the Relation Between Meaning and Form. Amsterdam: Benjamins.
Bybee, Joan L. 2000. The phonology of the lexicon: evidence from lexical diffusion. In Michael Barlow and Suzanne Kemmer, eds. Usage-Based Models of Language. Stanford: CSLI Publications. p. 65-85.
Bybee, Joan. 2001. Phonology and Language Use. Cambridge: Cambridge University Press.
Bybee, Joan. 2002. Word frequency and context of use in the lexical diffusion of phonetically conditioned sound change. Language Variation and Change, 14: 261-290.
Bybee, Joan L. and Dan I. Slobin. 1982. Rules and schemas in the development and use of the English past tense. Language, 58: 265-289.
Coetzee, Andries W. 2006. Variation as accessing “nonoptimal” candidates. Phonology, 23: 337-385.
Hooper, Joan B. 1976. Word frequency in lexical diffusion and the source of morphological change. In William M. Christie, ed. Current Progress in Historical Linguistics. Amsterdam: North-Holland Publishing Co. p. 95-105.
Itô, Junko and Armin Mester. 1995. The core-periphery structure of the lexicon and constraints on reranking. In J. Beckman, S. Urbanczyk, and L. Walsh, eds. University of Massachusetts Occasional Papers in Linguistics 18: Papers in Optimality Theory. Amherst: GLSA. p. 181-209.
Itô, Junko and Armin Mester. 1999. The structure of the phonological lexicon. In Natsuko Tsujimura, ed. The Handbook of Japanese Linguistics. Malden: Blackwell. p. 62-100.
Itô, Junko and Armin Mester. 2006. Indulgentia parentum filiorum pernicies: Lexical allomorphy in Latin and Japanese. In Eric Bakovic, Junko Ito, and John McCarthy, eds. Wondering at the Natural Fecundity of Things: Essays in Honor of Alan Prince. Paper 9. (http://repositories.cdlib.org/lrc/prince/9).
Kager, René. 1996. On affix allomorphy and syllable counting. In Ursula Kleinhenz, ed. Interfaces in Phonology. Berlin: Akademie Verlag. p. 155-171.
Kenstowicz, Michael. 1996. Base-identity and uniform exponence: alternatives to cyclicity. In Jacques Durand and Bernard Laks, eds. Current Trends in Phonology: Models and Methods. Paris-X and Salford: University of Salford Publications. p. 363-393.
Labov, William. 1972. The internal evolution of linguistic rules. In Robert P. Stockwell and Ronald K. S. Macaulay, eds. Linguistic Change and Generative Theory. Bloomington: Indiana University Press. p. 101-171.
Mascaró, Joan. 1996. External allomorphy as emergence of the unmarked. In Jacques Durand and Bernard Laks, eds. Current Trends in Phonology: Models and Methods. Salford, Manchester: European Studies Research Institute, University of Salford. p. 473-83.

References
Pater, Joe. 1994. Against the underlying specification of an ‘exceptional’ English stress pattern. Toronto Working Papers in Linguistics, 13: 95-121.
Pater, Joe. 2000. Non-uniformity in English secondary stress: the role of ranked and lexically specific constraints. Phonology, 17: 237-274.
Pater, Joe. 2006. The Locus of Exceptionality: Morpheme-Specific Phonology as Constraint Indexation. In L. Bateman, M. O'Keefe, E. Reilly, and A. Werle, eds. University of Massachusetts Occasional Papers in Linguistics 32: Papers in Optimality Theory III. Amherst: GLSA. p. 259-296.
Pater, Joe. to appear. Morpheme-specific phonology: constraint indexation and inconsistency resolution. In Steve Parker, ed. Phonological Argumentation. London: Equinox Publishers.
Phillips, Betty S. 1984. Word frequency and the actuation of sound change. Language, 60: 320-342.
Phillips, Betty S. 2001. Lexical diffusion, lexical frequency, and lexical analysis. In Joan Bybee and Paul Hopper, eds. Frequency and the Emergence of Linguistic Structure. Amsterdam: John Benjamins. p. 123-136.
Pierrehumbert, Janet. 2001. Exemplar dynamics: Word frequency, lenition, and contrast. In Joan Bybee and Paul Hopper, eds. Frequency and the Emergence of Linguistic Structure. Amsterdam: John Benjamins. p. 137-157.
Reynolds, Bill. 1994. Variation and Phonological Theory. Ph.D. dissertation, University of Pennsylvania.
Sankoff, David. 1988. Variable rules. In Ulrich Ammon, Norbert Dittmar and Klaus J. Mattheier, eds. Sociolinguistics: An International Handbook of the Science of Language and Society. Berlin & New York: Walter de Gruyter. p. 984-997.
Santa Ana, Otto. 1991. Phonetic Simplification Processes in the English of the Barrio: A Cross-Generational Sociolinguistic Study of the Chicanos of Los Angeles. Ph.D. dissertation, University of Pennsylvania.
Steriade, Donca. 1997. Phonetics in Phonology: The Case of Laryngeal Neutralization. Ms. UCLA.
Tesar, Bruce, & Paul Smolensky. 1998. Learnability in Optimality Theory. Linguistic Inquiry, 29: 229-268.
Tesar, Bruce, & Paul Smolensky. 2000. Learnability in Optimality Theory. Cambridge, MA: MIT Press.
Zuraw, Kie. 2000. Patterned Exceptions in Phonology. Ph.D. dissertation, UCLA.

The end