What is prosody?
• These components are known phonetically as:
  • Fundamental frequency (F0, pitch), Hz
  • Intensity (loudness), dB
  • Duration (length), msec

Technical details
• Manipulation of
  1. segmental durations, including phrase breaks
  2. F0 contours
  3. intensity contours
• For 1 and 2: PSOLA (Pitch Synchronous OverLap and Add), developed by Moulines & Charpentier, 1990 [1], as implemented in Praat [2]
• For 3: intensity swap in Praat

Technical details
[Figure, after Moulines & Charpentier, 1990 [1]. Panels: original waveform; windowed waveform (windows numbered 1 to 19); shortened waveform; waveform with lower F0]
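
The windowing and re-placement idea behind this figure can be sketched in plain Python. This is a minimal illustration under strong assumptions, not Praat's implementation: the pitch marks are taken as known and evenly spaced (real speech has a varying period), and the names `hann`, `extract_grains` and `overlap_add` are hypothetical. The choice of which windows to keep is for illustration only.

```python
import math

def hann(n):
    """Symmetric Hann window with n samples (n >= 2)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def extract_grains(signal, marks, period):
    """Cut one Hann-windowed grain, two periods long, centred on each pitch mark."""
    window = hann(2 * period + 1)
    grains = []
    for mark in marks:
        grain = []
        for k in range(-period, period + 1):
            idx = mark + k
            sample = signal[idx] if 0 <= idx < len(signal) else 0.0
            grain.append(sample * window[k + period])
        grains.append(grain)
    return grains

def overlap_add(grains, new_marks, period, length):
    """Centre grains[i] on new_marks[i] and sum wherever grains overlap."""
    out = [0.0] * length
    for grain, mark in zip(grains, new_marks):
        for k, value in enumerate(grain):
            idx = mark - period + k
            if 0 <= idx < length:
                out[idx] += value
    return out

# Toy "voiced" signal: a sine with a 20-sample period and 19 pitch marks.
period = 20
signal = [math.sin(2 * math.pi * n / period) for n in range(400)]
marks = [period * i for i in range(1, 20)]
grains = extract_grains(signal, marks, period)

# Shorter duration, same F0: keep every third grain at the original spacing.
kept = grains[::3]
shortened = overlap_add(kept, [period * (i + 1) for i in range(len(kept))], period, 160)

# Same duration, lower F0: keep every second grain, spaced two periods apart.
kept = grains[::2]
lowered = overlap_add(kept, [period + 2 * period * i for i in range(len(kept))], period, 400)
```

Dropping grains while keeping their spacing changes duration only; changing the spacing between grains changes F0. Because adjacent Hann windows one period apart sum to one, re-placing all grains on their original marks reproduces the signal between the first and last mark.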

Technical details 1: Segmental durations
• Segment alignment & PSOLA processing of durations. Alignment can be manual or automatic (with the help of speech recognition).
[Figure: the segments k eI m i n of "…came in…"; each non-native segment is shrunk or stretched to match the corresponding native segment]
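
Once the segments are aligned, the amount of shrinking or stretching per segment is just a ratio of durations. A minimal sketch follows; the boundary times are invented for illustration and `duration_factors` is a hypothetical helper, not a Praat function.

```python
def duration_factors(nonnative_bounds, native_bounds):
    """Return one factor per segment: native duration / non-native duration."""
    if len(nonnative_bounds) != len(native_bounds):
        raise ValueError("both utterances need the same number of boundaries")
    factors = []
    for i in range(len(native_bounds) - 1):
        nn_dur = nonnative_bounds[i + 1] - nonnative_bounds[i]
        nat_dur = native_bounds[i + 1] - native_bounds[i]
        if nn_dur <= 0 or nat_dur <= 0:
            raise ValueError("boundaries must be strictly increasing")
        factors.append(nat_dur / nn_dur)
    return factors

# Invented boundary times in msec for the five segments of "came in".
segments = ["k", "eI", "m", "i", "n"]
nonnative = [0, 100, 300, 400, 450, 600]
native = [0, 50, 250, 300, 400, 500]

factors = duration_factors(nonnative, native)
for seg, factor in zip(segments, factors):
    action = "stretch" if factor > 1 else "shrink" if factor < 1 else "keep"
    print(f"{seg}: x{factor:.2f} ({action})")
```

A factor below 1 means the non-native segment is shrunk, above 1 that it is stretched.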

Technical details 1+2: Segmental durations + F0 contour
• PSOLA processing of F0 on the duration-treated utterance
[Figure: native and non-native F0 contours over the segments k eI m i n]

Technical details 1+2+3: Segmental durations + F0 contour + intensity contour
• Mathematically "neutralize" the non-native speaker's intensity contour and transfer the native speaker's intensity contour in Praat (Holger Mitterer, personal communication)
[Figure: native and non-native intensity contours over the segments k eI m i n]
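
One plausible reading of "neutralize and transfer" is to divide out the utterance's own intensity envelope and multiply in the other speaker's. The sketch below is an assumption about the idea, not the Praat procedure referred to on the slide: it uses frame-wise RMS as the envelope, assumes the two utterances are already duration-aligned, and the names `frame_rms` and `transfer_intensity` are hypothetical.

```python
import math

def frame_rms(signal, frame_len):
    """RMS amplitude of consecutive non-overlapping frames."""
    values = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        values.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return values

def transfer_intensity(target, source, frame_len, floor=1e-8):
    """Scale each frame of `target` so that its RMS matches `source`."""
    t_rms = frame_rms(target, frame_len)
    s_rms = frame_rms(source, frame_len)
    out = list(target)
    for i in range(min(len(t_rms), len(s_rms))):
        # Dividing by the target's own RMS "neutralizes" its contour;
        # multiplying by the source's RMS imposes the new one.
        gain = s_rms[i] / max(t_rms[i], floor)
        for n in range(i * frame_len, (i + 1) * frame_len):
            out[n] = target[n] * gain
    return out

# Toy signals: flat-intensity "non-native", rising-intensity "native".
n_samples = 400
nonnative = [0.8 * math.sin(2 * math.pi * i / 20) for i in range(n_samples)]
native = [(0.2 + 0.6 * i / n_samples) * math.sin(2 * math.pi * i / 25)
          for i in range(n_samples)]

result = transfer_intensity(nonnative, native, frame_len=100)
```

A constant gain per frame produces small steps at frame edges; a practical version would interpolate the gain smoothly over time.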

Technical details 1+3: Segmental durations + intensity contour
• Segment alignment & PSOLA processing of durations, followed by intensity contour transfer
[Figure: non-native segments k eI m i n shrunk or stretched to the native durations; native and non-native intensity contours]

Technical details 2+3: F0 contour + intensity contour
• "Reverse" segment alignment & PSOLA processing of F0, followed by intensity contour transfer
[Figure: native segments k eI m i n shrunk or stretched to the non-native durations; native and non-native F0 and intensity contours]
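
When durations are left untouched, the native contour has to be carried onto the non-native time axis. A minimal sketch of such a "reverse" mapping, assuming a piecewise-linear time warp between matching segment boundaries; the boundary times and F0 values are invented, and `warp_time` and `warp_contour` are hypothetical names.

```python
from bisect import bisect_right

def warp_time(t, src_bounds, dst_bounds):
    """Map time t from the source segmentation onto the destination one,
    interpolating linearly inside the segment that contains t."""
    if len(src_bounds) != len(dst_bounds) or len(src_bounds) < 2:
        raise ValueError("need two boundary lists of equal length (>= 2)")
    i = bisect_right(src_bounds, t) - 1
    i = max(0, min(i, len(src_bounds) - 2))  # clamp to a valid segment
    frac = (t - src_bounds[i]) / (src_bounds[i + 1] - src_bounds[i])
    return dst_bounds[i] + frac * (dst_bounds[i + 1] - dst_bounds[i])

def warp_contour(points, src_bounds, dst_bounds):
    """Re-time a list of (time, value) points; values are left unchanged."""
    return [(warp_time(t, src_bounds, dst_bounds), v) for t, v in points]

# Invented segment boundaries in msec for k, eI, m, i, n.
native_bounds = [0, 50, 250, 300, 400, 500]
nonnative_bounds = [0, 100, 300, 400, 450, 600]

# Invented native F0 points as (time in msec, F0 in Hz).
native_f0 = [(25, 120.0), (150, 140.0), (350, 110.0), (500, 90.0)]

warped_f0 = warp_contour(native_f0, native_bounds, nonnative_bounds)
```

The same time warp can be applied to the native intensity contour before it is transferred.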

Technical details
• Weaknesses
  1. Voiceless segments can be made "voiced" in the windowing process (a side effect of the pitch-synchronous technique)
  2. Excessive manipulation results in unnatural synthesis (one solution: pitch rescaling [3])
• For better results, segment alignment should be fine-tuned according to the voiced/voiceless status of the (sub-)segments

Technical details: Examples
• native utterance
• non-native utterance
• synthetic non-native (durations + F0 + intensity)
• synthetic non-native (durations + intensity)
• synthetic non-native (F0 + intensity)

Technical details
[Figure: comparison before synthesis. Duration, F0 & intensity (blue & yellow); native utterance vs. non-native utterance]

Technical details
[Figure: comparison after synthesis. Duration, F0 & intensity (blue & yellow); native utterance vs. synthetic non-native]

Technical details
[Figure: comparison after synthesis. Duration & intensity (blue & yellow); native utterance vs. synthetic non-native]

Technical details
[Figure: comparison after synthesis. F0 & intensity (blue & yellow); native utterance vs. synthetic non-native]

Applications
• The technique could be used
  (1) in second language education, to facilitate/motivate acquisition of the target language prosody and to emphasize the importance of prosody in achieving native speaker fluency
  (2) for patients with vocal disorders, to help achieve the prosody of a normal voice
• Auto-segmentation via ASR (Automatic Speech Recognition) or DTW (Dynamic Time Warping) [3] can be employed to automate the segment alignment.
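
As a rough illustration of how DTW could align two utterances, here is a minimal textbook version on one-dimensional sequences. It is a sketch, not the procedure of [3]: a real system would compare frames of spectral features rather than single numbers, and the function name `dtw` and the toy data are assumptions.

```python
def dtw(a, b):
    """Return (total_cost, path) for the classic DTW alignment of a and b.

    path is a list of (index_in_a, index_in_b) pairs from start to end.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dist = abs(a[i - 1] - b[j - 1])
            cost[i][j] = dist + min(cost[i - 1][j],      # a advances
                                    cost[i][j - 1],      # b advances
                                    cost[i - 1][j - 1])  # both advance
    # Backtrack from the end, always following the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(cost[i - 1][j - 1], i - 1, j - 1),
                 (cost[i - 1][j], i - 1, j),
                 (cost[i][j - 1], i, j - 1)]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path

# Toy feature tracks: the second is a time-stretched copy of the first.
native_track = [0, 1, 2, 1, 0]
nonnative_track = [0, 0, 1, 2, 2, 1, 0]

total_cost, path = dtw(native_track, nonnative_track)
```

Pairs in `path` that repeat a native index mark stretches of the non-native utterance that would have to be shrunk, and vice versa.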

References
[1] E. Moulines and F. Charpentier (1990) "Pitch synchronous waveform processing techniques for text-to-speech synthesis using diphones", Speech Communication 9, 453-467.
[2] P. Boersma (2005) "Praat, a system for doing phonetics by computer", Glot International, Vol. 5(9/10), pp. 341-345.
[3] S. Yi (2007) "Perception of English prosody by Americans and Koreans and its pedagogical implications", Ph.D. Dissertation, Busan: Pusan National University.
[4] K. Yoon (2006) "Imposing native speakers' prosody on non-native speakers' utterances", Proceedings of the 9th Western Pacific Acoustics Conference (WESPAC 9), Seoul, South Korea.