
Sounds and Prosodies in Communicative Phonetic Science
Klaus J. Kohler, University of Kiel, Germany
Paper presented at the Symposium in Honour of Hans Basbøll, Odense, 20 August 2013

1 From Sound to Phoneme
• For thousands of years, homo sapiens loquens has invented ways of capturing the fleeting sound of spoken words in timeless symbols on durable material.
• The aim of all the systematic writing systems that have resulted is to represent lexical items in graphic form
  – either ideographically, or with reference to sound units in syllabic or alphabetic scripts
  – An alphabetic writing system has been invented only once, in the Semitic language family.
  – All other alphabetic systems are derivatives from it.
• Why should that be so?

• 3-consonant roots for semantic fields of the lexicon
  k'atab      he wrote
  y'iktib     he writes, will write
  k'aatib     clerk
  k'ataba     clerks
  kit'aab     book
  k'utub      books
  makt'uub    written
  m'aktab     office, desk
  makt'aba    library
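The shared consonantal root in the forms above can be made explicit with a short sketch. It assumes that the apostrophe in the transliteration marks stress and that only the letters a, i, u are vowels; the function names are illustrative and not part of the presentation.

```python
# Transliterated forms copied from the slide; the apostrophe is assumed to mark stress.
FORMS = {
    "k'atab": "he wrote",
    "y'iktib": "he writes, will write",
    "k'aatib": "clerk",
    "k'ataba": "clerks",
    "kit'aab": "book",
    "k'utub": "books",
    "makt'uub": "written",
    "m'aktab": "office, desk",
    "makt'aba": "library",
}

VOWELS = set("aiu")  # assumption: the transliteration uses only these vowel letters


def consonant_skeleton(form: str) -> str:
    """Drop stress marks and vowel letters, keeping the consonants in order."""
    return "".join(ch for ch in form if ch.isalpha() and ch not in VOWELS)


def contains_root(form: str, root: str = "ktb") -> bool:
    """True if the root consonants occur, in order, within the form's skeleton."""
    remaining = iter(consonant_skeleton(form))
    return all(consonant in remaining for consonant in root)


if __name__ == "__main__":
    for form, gloss in FORMS.items():
        print(f"{form:10} {consonant_skeleton(form):5} {gloss}")
```

Every listed form keeps k, t, b in that order, while vowels and affixes such as y- or ma- vary around them.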

• This was the birth of the “phonemic” principle in tight association of lexical meaning and form.
• No other language had this, so no other language developed an indigenous alphabetic script.
• When the phoneticians of the newly founded IPA at the end of the 19th century devised a phonetic alphabet to indicate pronunciation in languages like English or French, whose Latin orthographies had become deficient in the representation of sounds, they reinvented the phonemic principle
  – broad and narrow transcription

• The linguists of the Prague Circle turned this into a phonological theory with the distinctive phoneme for the differentiation of the intellectual meaning of words, and allophonic variation in context.
  – They kept the function-form link
  – but dissociated it from graphic representation
  – and turned it into a principle of sound structures
  – every language having its own phonemic system
• The American Structuralists, in their behaviouristic philosophy, went one step further and removed the link to meaning, being unable to formalize it.

• Grouping of sounds into phonemes was now governed by
  – complementary distribution
  – phonetic similarity
• But Pike still recognised the original “phonemic principle”, because he gave his book Phonemics the subtitle “A technique for reducing languages to writing”.
• After that, “phonology” became a separate discipline with a metalinguistic purpose in itself, practised by desk phonologists.
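The complementary-distribution criterion named above can be sketched as a disjointness test on the environments in which two sounds occur. The observations below are hypothetical toy data, not taken from the presentation, and the second criterion (phonetic similarity) is not modelled.

```python
from collections import defaultdict

# Hypothetical toy observations as (phone, environment) pairs.
# "#_" = word-initial, "s_" = after [s], "_#" = word-final.
OBSERVATIONS = [
    ("pʰ", "#_"), ("pʰ", "#_"),
    ("p", "s_"), ("p", "_#"),
    ("b", "#_"), ("b", "_#"),
]


def environments(observations):
    """Map each phone to the set of environments it was observed in."""
    envs = defaultdict(set)
    for phone, env in observations:
        envs[phone].add(env)
    return envs


def complementary(envs, a, b):
    """Two phones are in complementary distribution if they share no environment."""
    return envs[a].isdisjoint(envs[b])


ENVS = environments(OBSERVATIONS)
print(complementary(ENVS, "pʰ", "p"))  # True: no shared environment in the toy data
print(complementary(ENVS, "pʰ", "b"))  # False: both occur word-initially
```

On this purely distributional test, the first pair would be candidates for one phoneme and the second pair would not; meaning plays no part in the decision, which is the point the slide makes about the structuralist procedure.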

• Generative Phonology, Optimality Theory, Markedness, Feature Hierarchy
• Phonological categories were moved again, from behaviouristic groupings to entities in the ideal speaker/listener’s mind.
• At this point, psycholinguists got hold of them and started taking them into the lab for experiments on “the phoneme as a perceptual unit”.
  – This has been the MPI Nijmegen paradigm for the past 20 years, e.g. in phoneme spotting.
  – But is this extrapolation justified?

2 From Phoneme to Fine Phonetic Detail
• Pronunciation of “white please” vs. “black please” when ordering coffee
  – [wɑ>əʔ pli:z] by a Londoner
  – mistaken for [bl³ɑ>k pli:z] by a Scottish listener
  – expecting [ʍʌiʔ pli:z].
• In this situational context, the listener’s task was to understand one of two possible meanings
  – wrong understanding triggered by “graveness” instead of “acuteness” of the sound
  – not by wrong phoneme perception.

• Listeners process speech signals with perceptual categories shaped by attention and memory, not by abstraction from sound to phoneme
  – they aim at understanding messages in all their facets of meaning, even from incomplete “segmental” signal information
  – stable multidimensional fine phonetic detail plays an important role
  – based on episodic memory, exemplar recognition and contextual information
• This is mandatory in the processing of reduced speech, especially of function word form variability.

• Here is an example from the Kiel Corpus of Spontaneous Speech: OLV g122a009
• I shall first play a stretch of speech that even native speakers of German will not be able to understand, which phoneticians find very difficult to represent as a string of segments, and German phoneticians as a sequence of phonemes.
• Then I shall add the next stretch, which will most likely trigger understanding of both stretches.
• A third stretch will complete understanding.

[Figure: segmentally labelled signal display of the reduced stretch ending in <kucken>]

[Figure: segmentally labelled signal display of the stretch <ob Mittwoch frei ist>]
nun wollen wir mal kucken, ob Mittwoch frei ist
/nuːn vɔl(ə)n vɪɐ maːl kʊk(ə)n ʔɔp mɪtvɔx frai ɪst/

[Figure: signal display with segmental labels n uː n v ɔ l ə n v ɪ ɐ m aː l kʰ ʊ k ŋ]

• [kʰʊŋ̰] is identified as the verb <kucken>.
• The sound stretch that immediately precedes must be the modal particle <mal>, which commonly occurs in verbal context as [ma].
• But then an inflected auxiliary verb must precede.
• The dark vocalic stretch ending in a labiodentalized nasal, which is in turn followed by [ɐ], can be associated with <wollen wir>, because it commonly reduces in the direction of [ʋɔnʋɐ]. <werden, sollen, müssen> do not fit.

• The initial stretch of [n] + dark vowel with strong nasalization across the long vocalic section can be associated with <nun>.
• The result is an understanding of what in English is “Now let’s see if Wednesday is free.”
• This theoretical account of how the highly reduced utterance may be recognised puts sound perception into an integrated framework of cognitive processing for the understanding of meaning.
  – Phonemes and canonical forms play no role in it.

  – Phonetic traces, which need not be segmental but may be spread over indefinite stretches (articulatory prosodies), trigger the recognition process.
  – Such articulatory prosodies are
    ° nasalization
    ° glottalization
    ° labialization, labiodentalization
    ° palatalization, velarization, pharyngealization

  – These phonetic traces work in conjunction with morphological, syntactic and situational constraints
  – memory of multiple phonetic forms of lexical items is essential
  – complete phonetic identification of acoustic sequences is not required

• The spontaneous speech sample provides further interesting data
  – signalling boundaries
    ° phrase boundaries: <mal kucken, ob>
    ° word boundaries: <frei ist>
    ° in both cases canonical phonology has [ʔ]
  – signalling articulatory breaks for stops in nasal environments: [kʰʊkŋ], [ɔpmɪtvɔx]

[Figure: signal display of the junction <kucken, ob Mittwoch>, labelled kʰ ʊ ŋ̰ ɔ̰ m ɪ]

• The junctions between the words <kuck(e)n>, <ob>, <Mittwoch> have an overlay of continuous glottalization, with nasalization through the stops [ŋ̰ ɔ̰ m̰]
  – glottalization in the nasal provides a phonatory break to signal stop + nasal
    ° do Danish listeners perceive stød?
  – and glottalization in the vowel is a phrase boundary break mark between <kucken> and <ob>

[Figure: signal display of <frei ist>, labelled f ʁ̥ a i ɪ s]

• The word boundary between <frei> and <ist> is marked neither by [ʔ] nor by glottalization
  – but by a dip in f0 and energy
  – heightened by vocalic duration
  – Do Danish listeners perceive stød?
• The word boundary may be
  – strengthened by introducing glottalization
  – or weakened by removing the f0/energy dip
  – only leaving vowel length to mark bisyllabicity

[Figure: two labelled versions of <kucken, ob Mittwoch frei ist>, one with glottalization at the <frei ist> boundary and one without]

• There is another German example of a word boundary that is signalled by prosody rather than [ʔ], which one can hear all the time around Easter
  – Frohe Ostern [fʁoːoːstɐn] “Happy Easter”
  – Does it coincide with [fʁoːstɐn]?
• Here are naturally produced
  – Frohe Ostern [fʁoːoːstɐn]
  – and the non-word *Frohstern [fʁoːstɐn]
  – they differ in vowel duration, f0 and energy timing

[Figure: Frohe Ostern vs. *Frohstern]

• Frohe Ost- is bisyllabic: the low f0 precursor is perceived as the prehead to the rise-fall.
• *Frohst- is monosyllabic: there is a unitary rise-fall.
• Shortening Frohe Ost- to the vowel duration of *Frohst- squeezes the pitch pattern into a monosyllabic slot with a late peak pattern.
• Lengthening *Frohst- creates an oscillating pattern of bisyllabic and excessively long monosyllabic.
• We can now lengthen original *Frohst- (× 1.4) and shorten original Frohe Ost- (× 0.7) to the same value in between the two original vowel durations.
• Then f0 and energy timing are manipulated.
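The two scaling factors can be checked with a little arithmetic. If × 1.4 and × 0.7 are to land on the same duration, the longer original must be about twice the shorter one (1.4 / 0.7 = 2). The sketch below assumes the factors are exact and uses hypothetical durations, since the presentation gives no measured values.

```python
import math

STRETCH = 1.4   # factor applied to *Frohst- on the slide
COMPRESS = 0.7  # factor applied to Frohe Ost- on the slide


def rescaled(short_ms: float, long_ms: float) -> tuple[float, float]:
    """Apply the slide's two factors to a short and a long vowel duration."""
    return short_ms * STRETCH, long_ms * COMPRESS


def implied_ratio() -> float:
    """Ratio long/short for which both rescaled durations coincide."""
    return STRETCH / COMPRESS


if __name__ == "__main__":
    short_ms, long_ms = 150.0, 300.0  # hypothetical values, not measurements
    a, b = rescaled(short_ms, long_ms)
    print(f"ratio needed: {implied_ratio():.2f}")
    print(f"rescaled: {a:.1f} ms and {b:.1f} ms, equal: {math.isclose(a, b)}")
```

With duration equalised in this way, any remaining perceptual difference between the two stimuli has to come from f0 and energy timing, which is what the subsequent manipulation addresses.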

[Figure panels: orig. Frohe Ostern | × 0.7 | smoothed f0 | smoothed energy | smoothed f0 + energy]

[Figure panels: orig. *Frohstern | × 1.4 | dipped f0 | dipped energy | dipped f0 + energy]

• The variability between presence of glottalization and dips in prosodic parameters for word boundary marking is reminiscent of what is found in the broad scale of stød realization in Danish.
[Figure: Danish læser (reads) vs. læser (reader)]

• The two stød realizations have in common an abrupt fall of f0 and energy in the vowel
  – comparable to the dip in the German word boundary marking
  – and different from the smooth f0/energy timing in the stødless word form.
• In both German and Danish we thus find the use of phonetic features to signal a break vs. smooth transition in articulation
  – but the function is, of course, different
  – non-tonal phrase prosody to mark boundaries vs. a non-tonal syllable prosody to mark lexical class.

• Another language area that can be added to the discussion of this break prosody comprises the Frankish dialects in the border districts between Germany, Holland and Belgium, where it is known as “Rheinische Schärfung”, e.g. “Nase” (nose) vs. “nass” (wet) in Cologne.
• Dealing with these data in terms of phonemes, tones and canonical forms misses insights into production and perception across languages.

3 From Auditory Observation to Signal Analysis
• The technological advance in speech signal analysis, initially the spectrograph, now computer programs,
  – inevitably led to taking the phoneme concept into the lab
  – in order to substantiate phonological entities and structures by objective measurement
  – thus to supplement auditory impressions by testable physical properties
  – finally to replace auditory observation altogether.
• This development has culminated in Laboratory Phonology.

4 From Sound to Sense
• The origin of speech technology after World War II had the communicative component incorporated
  – communications engineering, technological development to improve communication
  – Speech Communications Conference at MIT 1950
  – Menzerath and Meyer-Eppler invited → Institut für Phonetik u. Kommunikationsforschung
  – Research Laboratory of Electronics, Speech Communication Group, MIT

  – Speech Communication Seminar, Stockholm 1974
  – From Sound to Sense: 50+ years of discoveries in speech communication, MIT 2004
  – invited paper by Sarah Hawkins: Puzzles and patterns in 50 years of research on speech perception

“… new theories will aim to include the following attributes. They should be biologically plausible; include roles for attention, memory, and learning; focus on understanding meaning rather than identifying phonological form; allow for multiple potential ‘units of perception’, possibly with no obligatory units; and they should allow meaning and linguistic structure to be understood from incomplete information. …”

5 From Sense to Sound
• But we also need to include the complement
  – Jakobson, Fant, Halle, Preliminaries to Speech Analysis, 1952: “given the evident fact that we speak to be heard to be understood”
  – Speakers transmit meaning
  – by coding it in words and syntactic structures with fine phonetic detail of segments and prosodies
  – generating acoustic signals for listeners to decode

• There are two questions:
  – How is the phonetic form of words represented mentally to trigger physiological and articulatory processes for acoustic sound production?
  – What are the rules for producing reduced or elaborated phonetic forms?

• Answers to the first question must specify essential phonetic elements that define the whole formal set of a lexical item
  – this specification must include segmental units as well as articulatory prosodies
  – both are related to lexical, morphological and speech style categories
  – which allow for phonetic under-specification

• The answer to the second question goes well beyond descriptive accounts of large databases
• it needs to include the coupling of reduction/elaboration with lexical class, morphology, syntax and speaking style, closely linked to the answer to the first question

6 From Sense to Sound to Sense
• Finally, we have to combine the Speaker’s Sense-to-Sound with the Listener’s Sound-to-Sense in dialogue interaction.
• At this point, the Propositional, Expressive and Appeal functions of speech communication and their prosodic coding come to the fore.
• And for this we need a new methodology of data acquisition that is adaptable to the specific research questions asked by speech scientists
  – going beyond isolated words and sentences.

• If we take the steps I have outlined, we will be
  – progressively providing answers to the central question of Phonetics: How do humans communicate with speech in all types of speech interactions in the world’s languages?
  – developing an integrated framework of Sounds and Prosodies in Communicative Phonetic Science