Moving forward with multilingual transcription
Naomi Nagy, University of Toronto
Paulina Łyskawa, University of Maryland

Goals
• Provide a training forum in which to develop protocols for sharable data that conform to the spirit of NSF policy (for sharable archived data)
• Describe and problematize how we indicate the use of multiple languages within one conversation, and our efforts to maintain consistency across protocols from different languages/communities, with a view to making these transcripts useful for inquiries developed after transcription
• Specific coding conventions for language choice
• Appropriate metadata for language choice
Nagy & Łyskawa / LSA 2016

[Image: Greater Toronto's language quilt, Toronto Star, 30 Dec. 2007 (Farley & Lister 2007): Italian, Chinese, Cantonese, Punjabi, Portuguese, Spanish, Tagalog, Urdu, Tamil, Polish]

What is the HLVC Project?
• A large-scale project investigating Variation and Change in Toronto's Heritage Languages
• Project goals (Nagy 2011):
  – To document and describe heritage languages spoken by immigrants and two generations of their descendants
  – To create a corpus available for research on a variety of topics
  – To push variationist research beyond its monolingually oriented core (and its majority-language focus) (cf. Nagy & Meyerhoff 2008)
  – Descriptive and theoretical goals:
     – Develop generalizations about the types of variable features, structures or rules that are borrowed earlier and more often
     – Use consistent methods across languages and variables

[Photos: Lviv, Ukraine, 1913; Western Poland, 1911; Faeto, Italy, 1950; Budapest, Hungary, 1885]

Contrasting demographics (Toronto, 2011 Census)

Language    MT speakers   Ethnic origin   Est. in TO   Speakers from
Cantonese   170,000+      594,735         1951         Hong Kong
Italian     166,000       475,090         1908         Calabria
Russian     78,000        118,090         1916         St. Petersburg, Moscow
Ukrainian   26,000        130,355         1913         Lviv
Polish      75,275        214,460                      Eastern Poland
Korean      51,000        64,755          1967         Seoul
Faetar      <300?         800             1950         Faeto & Celle (Apulia)

MT = mother tongue. Sources: www40.statcan.ca/l01/cst01/demo12c-eng.htm; www12.statcan.gc.ca/

Long-range questions: linguistic
• Are cross-linguistic generalizations possible about the types of features, structures, rules or constraints that are borrowed earlier and more often?
• If so, what are they?
(Nagy 2009, 2011)

Long-range questions: sociolinguistic
• How are social factors relevant?
• Do the same (types of) speakers lead changes in both/all of their languages?
• Or do speakers choose to use one language or the other for this social "work"?
(Nagy 2009, 2011)

Data collection methods for naturalistic speech
1. Sociolinguistic interview
2. Ethnic Orientation Questionnaire (EOQ)
3. Picture description task
All conducted and recorded by native speakers, in the heritage language

Different languages; different protocols
• Representation and annotation of English in transcripts of conversations in heritage languages (Cantonese, Faetar, Italian/Calabrese, Korean, Polish, Russian, and Ukrainian)
• Review methods that differ by language team
• Discuss the best option(s) for standardizing

Ukrainian: the most straightforward
• Transcribe English words in capital letters
• "If a word exists in both languages, then I will listen closely to the phonology and transcribe it accordingly.
• If they pronounce an English word with a Ukrainian accent, then I will transcribe it in Ukrainian, but I will make a note in the notes tier." [MH]

[Screenshot: UKR example in ELAN (translation added)]

UKR examples
• UH YEAH til'ky tak SUBCONSCIOUSLY vin je duzhe spravedlyvyj UM v kozhnomu sensi teper
  Uh yeah, just subconsciously he is very fair, um, in every sense now. [U3M41A_IV.eaf, 31:30]
• vin nazyvajet'sja ATTILIO ja joho nazyvaju ARISTOTLE
  He is called Attilio, I call him Aristotle. [U2F60A_IV.eaf, 48:38]
• chasamy my jidemo do Fljorydy na MARCH BREAK i todi my USUALLY idemo do des' na lito SO
  Sometimes we go to Florida for March Break and then we usually go somewhere for the summer, so [U3F13A_IV.eaf, 9:37]
• Regular expression searchable: [ABC] or Notes tier
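As a sketch of what the capitals convention makes searchable, the Python below pulls the all-capitals tokens out of one UKR line. It assumes only the convention stated for Ukrainian (English in capitals, Ukrainian transliterated in lower case, apart from ordinary initial capitals such as Fljorydy); the function name english_tokens_ukr is hypothetical, not part of any HLVC tool.

```python
import re

# Assumption: English words are written entirely in capitals (UKR convention),
# while transliterated Ukrainian is lower case apart from initial capitals.
ALL_CAPS = re.compile(r"\b[A-Z][A-Z'-]*\b")

def english_tokens_ukr(line: str) -> list[str]:
    """Return the all-capitals tokens in one UKR transcript line."""
    return ALL_CAPS.findall(line)

if __name__ == "__main__":
    line = ("chasamy my jidemo do Fljorydy na MARCH BREAK "
            "i todi my USUALLY idemo do des' na lito SO")
    # A single initial capital (Fljorydy) is not matched: the token must be
    # capitals through to the word boundary.
    print(english_tokens_ukr(line))
```

Capitals alone also pick up proper names (ATTILIO, ARISTOTLE) and fillers (UH, UM), so those would still have to be filtered or tagged separately.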

RUS examples
• Aga, ja prepodavala francuzskij v [ENG: UofT], jeto bylo vsjo [ENG: part-time].
  Yes, I taught French at UofT, it was all part-time. [R1F55B_IV_PR.eaf, 2:38]
• Tam oni ochen' mnogo tam [ENG: fundraising] i tam raznyx vesjolyx veshhej.
  There they do a lot of fundraising and various fun things. [R2F12A_IV_PR.eaf, 0:24]
• Regular expression searchable: "[ENG: "

RUS examples
• A 3-letter language tag, "ENG" (or another language's tag), introduces any non-Russian word or phrase, which is bracketed
• English words with Russian morphemes are transcribed as Russian
• "Whether we use English spelling or transliterate the utterance depends largely on how the speaker says it: whether they use English-like or Russian-like pronunciation.
• Context: are they deliberately using an English word? Struggling to find a Russian equivalent?
• Abbreviations like "UofT" will be written in English because they are abbreviations of English words. It's possible to come up with abbreviations of Russian words for the same concepts." [NL]
• NB: the transcriptions can be transliterated with Comrie & Corbett's (2002) system, at http://www.translit.ru/
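Because every switch carries an explicit tag, a single pattern recovers both the language and the bracketed text. This is a minimal Python sketch. It assumes the "[TAG: text]" shape shown in the RUS examples, with TAG being any 3-letter upper-case code. The helper name tagged_spans is hypothetical.

```python
import re

# Assumption: a foreign span is written "[TAG: text]", where TAG is a
# 3-letter upper-case language code such as ENG (RUS convention).
TAGGED_SPAN = re.compile(r"\[([A-Z]{3}):\s*([^\]]+)\]")

def tagged_spans(line: str) -> list[tuple[str, str]]:
    """Return (language tag, bracketed text) pairs found in one line."""
    return [(tag, text.strip()) for tag, text in TAGGED_SPAN.findall(line)]

if __name__ == "__main__":
    line = ("Aga, ja prepodavala francuzskij v [ENG: UofT], "
            "jeto bylo vsjo [ENG: part-time].")
    print(tagged_spans(line))
```

The same pattern counts switches into any tagged language, not only English. Hybrid forms that are transcribed as Russian, such as poslajsat', are by design invisible to it.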

English words with Russian morphemes
• da, vam poslajsat' kolbasku ili pisikom, da.
  Yes, would you like your kielbasa sliced or in one piece, yes. [R1M56A_IV_PR.eaf, 0:29:23]
  – poslajsat' = po+slice+at' "to slice"
  – pisikom = piece+ik+om "in one piece"
• Not regular expression searchable

Cantonese
• Current transcription system: Jyutping (jyut6 ping3) romanization
  – Every Cantonese syllable ends in a number indicating its tone
  – Some English words also carry tone markings
  – Mandarin borrowings aren't distinguished
• Now adding: transcription in characters
  – Cantonese and English will be more distinct
  – Mandarin borrowings will be less obvious [SL]

Examples from Cantonese
• seng4 jat6, ngo5 dei6 seng4 jat6 heoi3 pet store go2 zan4 si4 le3 keoi5 hai6
  When we go to the pet store she always goes -- [C2F16A_IV.eaf, 29:29]
• "Usually" like mou2 gam3 je6 la1
  usually, like, not very late [C2F16A_IV.eaf, 6:55]
• zing3 fu5 le1 zau6 jau5 jat1 di1 giu3 zou6 housing scheme bei5 ni1 di1
  The gov't has something known as housing scheme to provide-- [C1M61A_IV.eaf, 4:20]
• English words without tone are regular expression searchable: [a-zA-Z]\s
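The tone-digit convention makes a token-level check possible. Lower-case letters followed directly by a digit 1-6 are treated as a Jyutping syllable. Any other alphabetic token is treated as a candidate English word. This is a sketch under exactly that assumption. The helper split_cantonese_line is hypothetical.

```python
import re

# Assumption: each Jyutping syllable is lower-case letters followed directly
# by a tone digit 1-6 (e.g. "seng4"); untoned alphabetic tokens are
# candidates for English.
JYUTPING = re.compile(r"[a-z]+[1-6]")
ALPHA = re.compile(r"[A-Za-z'-]+")

def split_cantonese_line(line: str) -> tuple[list[str], list[str]]:
    """Split a line's tokens into (toned Jyutping, untoned alphabetic) lists."""
    toned: list[str] = []
    untoned: list[str] = []
    for raw in line.split():
        token = raw.strip(',.?!"')  # drop surrounding punctuation only
        if JYUTPING.fullmatch(token):
            toned.append(token)
        elif ALPHA.fullmatch(token):
            untoned.append(token)
    return toned, untoned

if __name__ == "__main__":
    line = ("seng4 jat6, ngo5 dei6 seng4 jat6 heoi3 pet store "
            "go2 zan4 si4 le3 keoi5 hai6")
    print(split_cantonese_line(line))
```

English words that were given tone numbers, which the transcription notes say do occur, would be misclassified as Cantonese by this check.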

KOR examples
• 그래서 꼭 기회가 되면, 한국에 나의 누이가 사니까 my sister lives still 그러니까 애들을, 좀 여름 방학 때는 좀 한달이라도 좀 보내고 좀 크면 […]
  So if there was an opportunity, my sister lives in Korea, my sister lives [there] still, so I would like to send my kids there for a month during the summer when they get older… [K1M45A_IV.eaf, 16:16]
• 그레도 저도 이제 you know five-six, five years 결혼 했으니깐 재 맛도 조금 틀여졌어요 my wife 으로 Because she does most of the cooking.
  Even now, I've been married for you know five-six, five years, my taste has changed a bit (my wife), because she does most of the cooking. [K2M34A_IV.eaf, 3:05]
• Regular expression searchable: [a-zA-Z]
• Code-switching (an a-theoretical cover term for all kinds of lexical mixing) is marked in transcriptions by the use of Roman rather than Hangul characters
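Because KOR marks mixing by script, a rough per-line mixing rate reduces to counting tokens by character range. A minimal sketch follows. It assumes English appears in Roman letters and Korean in Hangul syllables (U+AC00 to U+D7A3). A token mixing both scripts is counted as Roman. The function name roman_token_share is hypothetical.

```python
import re

# Assumption: English is written in Roman letters and Korean in Hangul
# syllables (U+AC00-U+D7A3), per the KOR convention. Tokens containing any
# Roman letter count as Roman; punctuation-only tokens are ignored.
ROMAN = re.compile(r"[A-Za-z]")
HANGUL = re.compile("[\uac00-\ud7a3]")

def roman_token_share(line: str) -> float:
    """Share of script-bearing tokens in a line that contain Roman letters."""
    roman = hangul = 0
    for token in line.split():
        if ROMAN.search(token):
            roman += 1
        elif HANGUL.search(token):
            hangul += 1
    total = roman + hangul
    return roman / total if total else 0.0

if __name__ == "__main__":
    print(roman_token_share("아빠는 movies 좋아헤요"))
```

A borrowing written in Hangul, such as 북 "book", is not counted. This follows the convention that Roman script is what marks mixing.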

Chung, S. 2010. Code-switching as a means of cultural identity among Koreans in Toronto. TULCON '10 conference, U of Toronto. 10/26/15

Code-switching vs. Borrowing
Type of integration into the source language (Poplack 1980: 584)
Type 1)
저는 북 [buk] 들 많이읽거요
I-TOPIC book-PLUR a lot read-POL
"I read a lot of books." [K2F22A]
• book has Korean, not English, phonology ([buk], not [bʊk]).
• The Korean plural morpheme 들 is attached to book.
• Korean syntax (SOV) is used.
• So book is a borrowing, not code-switching.
(Sheila Chung 2010, LIN 497 paper)

Code-switching vs. Borrowing
Type of integration into the source language (Poplack 1980: 584)
Type 2)
아빠는 movies 좋아헤요
Dad-TOP movies like-POL
"[My] dad likes movies." [K2M25A]
• movies has English phonology [muviz].
• The plural -s is English morphology.
• Korean syntax: SOV.
• Thus phonology and morphology are not integrated into Korean: code-switching.
(Sheila Chung 2010, LIN 497 paper)

Code-switching vs. Borrowing
Type of integration into the source language (Poplack 1980: 584)
Type 3)
Seventy-three 니까 thirty-six years
Seventy-three so thirty-six years
"[The year] '73, so 36 years" [K1M70A]
• "thirty-six years" has Korean phonology: [tɜrti].
• Code-switching: yes
(Sheila Chung 2010, LIN 497 paper)

Code-switching vs. Borrowing
Type of integration into the source language (Poplack 1980: 584)
Type 4)
저한테는 I'm hoping they'll learn it
Me for-TOPIC I'm hoping they'll learn it
"For me, I'm hoping they'll learn it" [K2M24A]
• No integration into Korean: code-switching
(Sheila Chung 2010, LIN 497 paper)

Where in the typology do we mark speech as "English"?
Type of integration into the source language (Poplack 1980: 584)

Type   Phonological   Morphological   Syntactic   Code-switching?
1      +              +               +           No; borrowing
2      -              -               +           Yes
3      +              -               -           Yes
4      -              -               -           Yes

• Decisions about where to place the threshold determine how much, and where, a speaker uses each language.
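The typology can be written as a lookup from an integration pattern to a type. This is a sketch, not Poplack's or Chung's procedure. It assumes each item has already been judged +/- for phonological, morphological and syntactic integration. The Type 3 pattern (phonology integrated only) is inferred from its example rather than stated explicitly on the slides.

```python
from typing import Optional

# (phonological, morphological, syntactic) integration -> type.
# Assumption: Type 3 = phonology integrated only, inferred from its example.
INTEGRATION_TYPES = {
    (True, True, True): 1,     # fully integrated: borrowing
    (False, False, True): 2,   # only syntax integrated
    (True, False, False): 3,   # only phonology integrated
    (False, False, False): 4,  # not integrated
}

def classify(phon: bool, morph: bool, syn: bool) -> Optional[int]:
    """Return the integration type, or None for a pattern not in the table."""
    return INTEGRATION_TYPES.get((phon, morph, syn))

def is_code_switch(phon: bool, morph: bool, syn: bool) -> Optional[bool]:
    """Type 1 is a borrowing; Types 2-4 count as code-switching."""
    t = classify(phon, morph, syn)
    return None if t is None else t != 1

if __name__ == "__main__":
    print(classify(True, True, True))     # book with Korean phonology
    print(classify(False, False, True))   # movies with English plural -s
    print(is_code_switch(False, False, False))
```

Returning None for unlisted patterns makes gaps in the typology visible instead of silently forcing them into a type.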

Threshold (of integration) position determines how much a speaker uses each language
[Chart: one bar group per KOR speaker, y-axis 0-120, comparing three thresholds: Fully (Type 1); Fully + Partly (Types 1-3); Fully + Partly + Non (Types 1-4)]
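The effect of the threshold on one speaker's figure can be sketched by counting the tokens whose type falls in each of the three cumulative sets. The sketch assumes every token is already labelled with its integration type, with None for heritage-language material. The toy speaker is invented and is not HLVC data.

```python
from typing import Optional, Sequence

# The three cumulative thresholds named in the chart: which types are counted.
THRESHOLDS = {
    "Fully (Type 1)": {1},
    "Fully + Partly (Types 1-3)": {1, 2, 3},
    "Fully + Partly + Non (Types 1-4)": {1, 2, 3, 4},
}

def percent_counted(labels: Sequence[Optional[int]], included: set[int]) -> float:
    """Percentage of all tokens whose integration type is in `included`."""
    if not labels:
        return 0.0
    hits = sum(1 for t in labels if t in included)
    return 100 * hits / len(labels)

if __name__ == "__main__":
    # Hypothetical speaker: 16 heritage-language tokens, one token of each type.
    labels = [None] * 16 + [1, 2, 3, 4]
    for name, included in THRESHOLDS.items():
        print(f"{name}: {percent_counted(labels, included):.0f}%")
```

For the toy speaker the figure moves from 5% to 15% to 20% purely as a result of where the threshold is drawn.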

Mean % of CS to English by Generation and Cultural Orientation
[Chart: mean % of code-switching to English, by generation and cultural orientation. Source: Sheila Chung 2010 (LIN 497 paper); slide from LIN 251, Week 7, Nagy, 10/26/15]

Faetar
• Probably not regular expression searchable: non-IPA font?
• "English" is noted in a separate tier, or "(ITALIAN)" follows the transcription

FAE examples
• UNMARKED BORROWING: andʌj vənʌntə frut: ɛ vɛɡɡɛtabl
  where they sell fruits and vegetables [F1F70A_IV.eaf, 2:01.925]
• MARKED BORROWING: ɪn toskan i kiamuntə lə i lamponi
  in Tuscan they call the "the raspberries" (ITALIAN) [F1M75A&family_IV_part1.eaf, 29:26]

Italian
• mio padre mi aveva comprato, come si dice, una "top" you know, uno di quelli tops che si girano
  My father bought me, how do you say, a "top," you know, one of those tops that spin [I2F53A_IV.eaf, 02:34]
• di solito, I ignore them (laughs).
  Usually, I ignore them. [I2F53A_IV.eaf, 2:59]
• well George and I abitamo, a abitamo proprio a Avenue Rd e Eglinton
  Well, George and I live, live right at Avenue Road and Eglinton. [I2F32A_IV_PT2.eaf, 2:20-2:25]
• English is not usually marked, so it is not searchable

Recommendations (for the HLVC proofreading phase)
• Hyphenate proper nouns reliably.
• Could anything English-like go on a separate tier rather than being bracketed and flagged? Which is better?
• If on a separate tier, it must be time-aligned (slower to produce; faster to analyze).
• For KOR (and sometimes CAN), English is already in a different orthography.
• Establish rules for English borrowings, in CAN in particular.
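One argument for a separate time-aligned tier is that summary measures become simple arithmetic over intervals. A minimal sketch follows. It assumes hypothetical main and English tiers that have already been exported as (start, end) pairs in milliseconds. It also assumes the English stretches are nested inside the main utterances and that no intervals overlap.

```python
# (start_ms, end_ms) pairs, the form time-aligned ELAN annotations take.
Interval = tuple[int, int]

def total_ms(intervals: list[Interval]) -> int:
    """Summed duration in milliseconds (intervals assumed not to overlap)."""
    return sum(end - start for start, end in intervals)

def english_time_share(main_tier: list[Interval],
                       english_tier: list[Interval]) -> float:
    """Share of transcribed speaking time that falls on the English tier."""
    speech = total_ms(main_tier)
    return total_ms(english_tier) / speech if speech else 0.0

if __name__ == "__main__":
    main = [(0, 4000), (5000, 9000)]        # hypothetical utterances
    english = [(1000, 1600), (6000, 6400)]  # hypothetical English stretches
    print(english_time_share(main, english))
```

With bracketed inline tags, the same figure would need word-level timings that the transcripts do not provide. That is the "slower to produce; faster to analyze" trade-off.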

"Quick" measures of "proficiency"
• Speech rate: exclude English switches? (Brook & Nagy excluded them)
• Vocabulary size: how many words are English?
• C-S rate: does more C-S mean better command of the grammar of both languages? Or does it mean gaps in HL vocabulary that need to be filled by English? [Is it inversely correlated with speech rate?]
• Note: we don't necessarily want these measures to "work," i.e., to correlate with sociolinguistic variation or with the EOQ, but we are caught between variationist sociolinguistics (of monolingual, fluent speakers) and SLA methods.
• Anything automated, e.g., forced alignment (a dictionary of English must be added to the HL dictionary before PLab will work) or speech-rate calculations, needs to treat English separately or exclude it.
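The three quick measures can be made concrete with a short sketch. It assumes each word already carries an English/heritage-language flag and each utterance has a duration. The Utterance structure is hypothetical. Dropping English words from the word count while keeping their time is only one way to "exclude" switches, and it is not necessarily what Brook & Nagy did.

```python
from dataclasses import dataclass

@dataclass
class Utterance:
    """One transcribed stretch of speech (hypothetical structure)."""
    tokens: list[str]
    is_english: list[bool]  # parallel to tokens; True marks an English word
    duration_s: float

def quick_measures(utts: list[Utterance]) -> dict[str, float]:
    """Speech rate with and without English words, plus the C-S rate."""
    total_time = sum(u.duration_s for u in utts)
    n_all = sum(len(u.tokens) for u in utts)
    n_eng = sum(sum(u.is_english) for u in utts)
    if total_time == 0 or n_all == 0:
        return {"rate_all": 0.0, "rate_hl_only": 0.0, "cs_rate": 0.0}
    return {
        # words per second, counting every word
        "rate_all": n_all / total_time,
        # English words dropped from the count (but not from the time)
        "rate_hl_only": (n_all - n_eng) / total_time,
        # proportion of words that are English
        "cs_rate": n_eng / n_all,
    }

if __name__ == "__main__":
    # Tokens from the Italian example; the 2-second duration is made up.
    utt = Utterance(["di", "solito", "I", "ignore", "them"],
                    [False, False, True, True, True], 2.0)
    print(quick_measures([utt]))
```

Reporting rate_all and rate_hl_only side by side shows how much the choice to exclude English moves a speaker's speech rate.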

Metadata 1
• The HLVC interview catalog (sometimes) contains notes on switches to English and the rough frequency of code-switching
• Examples:
  – RUS & UKR: nothing noted in the catalog (but easy to count in the interview .eaf files)
  – KOR, as the result of a year-long study (Chung 2010), has a "code-switch" column: yes/some/no
  – POL, as the result of two year-long studies (Łyskawa 2015, Łyskawa et al. fc.), has notes on code-switching: lots/Ø
  – ITA has a few notes; for over 40 speakers, we see only these two:
     – "Very chatty, lots of code switching!" (I2F53A)
     – "Does not speak much Italian at all, words are mostly cued in by interviewer, partial transcription as a consequence" (I3M15A)
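For RUS, "easy to count in the .eaf files" can be as simple as the sketch below. It reads ELAN's XML with Python's standard library, walking the TIER elements (identified by TIER_ID) and the text of each ANNOTATION_VALUE, and counts "[ENG:" tags per tier. The tier name and sample annotation are made up. UKR files would need the capitals test instead of the tag.

```python
import re
import xml.etree.ElementTree as ET

ENG_TAG = re.compile(r"\[ENG:")

def count_eng_tags(root: ET.Element) -> dict[str, int]:
    """Count '[ENG:' tags per tier in a parsed ELAN .eaf document."""
    counts: dict[str, int] = {}
    for tier in root.iter("TIER"):
        n = 0
        for value in tier.iter("ANNOTATION_VALUE"):
            n += len(ENG_TAG.findall(value.text or ""))
        counts[tier.get("TIER_ID", "")] = n
    return counts

# A cut-down .eaf fragment with a hypothetical tier name.
SAMPLE = """<ANNOTATION_DOCUMENT>
  <TIER TIER_ID="R1F55B">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>Aga, ja prepodavala francuzskij v [ENG: UofT], jeto bylo vsjo [ENG: part-time].</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""

if __name__ == "__main__":
    print(count_eng_tags(ET.fromstring(SAMPLE)))
    # For a real (hypothetical) file: count_eng_tags(ET.parse("interview.eaf").getroot())
```

Per-tier counts like these could fill a catalog column automatically, which would make the RUS and UKR metadata comparable to KOR's yes/some/no.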

Metadata 2
• CAN:
  – "Clear, Lots of English Phrases" (C2M21B)
  – "good sound, lots of English…One-word answers" (C3F12A)
  – "Speaks lots of English" (C3F18A and C2M14A)

CAN from Sam 1
• C2F22A, with 3 English transcriptions marked. Time stamps:
  – 00:57 (English words with tones)
  – 00:02:30 (English words transcribed in Jyutping with tones ("numbaa 15"))
  – 00:02:10 (English words with NO tones)
• There seems to be a mix of all 3 types of transcription in all files. Sam recommends marking tone on English words, but that would probably be subjective.
• One step toward making ENG transcriptions consistent with HL transcriptions would be to actually set up guidelines for transcribing ENG words.

[Screenshot: CAN 2]

[Screenshot: CAN 3]

[Screenshot: FAE from Michael (re-do if using)]

[Screenshot: KOR from Deepam 1]

[Screenshot: KOR from Deepam 2]

Thank you: 감사합니다 (Korean), дякую (Ukrainian), Grazie molto (Italian), Спасибо (Russian), 多謝 (Cantonese), gratsiə namuor: ə (Faetar)
HLVC RAs: Cameron Abma Vanessa Bertone Ulyana Bila Rosanna Calla Minji Cha Abigail Chan Ariel Chan Karen Chan Joanna Chociej Vivien Chow Sheila Chung Tiffany Chung Courtney Clinton Radu Craioveanu Marco Covi Naomi Cui Zahid Daujee Derek Denis Tonia Djogovic Joyce Fok Paolo Frascà Matt Gardner Julia Grasso Rick Grimm Dongkeun Han Natalia Harhaj Taisa Hewka Melania Hrycyna Michael Iannozzi Diana Kim Janyce Kim Iryna Kulyk Mariana Kuzela Ann Kwon Alex La Gamba Carmela La Rosa Natalia Lapinskaya Vina Law Kris Lee Nikki Lee Olga Levitski Samuel Lo Arash Lotfi Paulina Lyskawa Rosa Mastri Timea Molnár Valeriya Mordvinova Francesco Muoio Jamie Oh Maria Parascandolo Deepam Patel Rita Pang Andrew Peters Alessia Plastina Tiina Rebane Hoyeon Rim Will Sawkiw Maksym Shkvorets Vera Richetti Smith Anna Shalaginova Konstantin Shapoval Yi Qing Sim Mario So Gao Vlodymyr Sukhodolskiy Awet Tekeste Letizia Tesi Josephine Tong Sarah Truong Dylan Uscher Qian Ling Wang Ka-man Wong Junrui Wu Olivia Yu Minyi Zhu
http://projects.chass.utoronto.ca/ng

References
Chung, Sheila. 2010. Code-switching as a means of cultural identity among Koreans in Toronto. TULCON '10 & Cornell Undergraduate Linguistics Colloquium 2010.
Comrie, Bernard & Greville Corbett. 2002. The Slavonic Languages. London & New York: Routledge. 827, 832-833.
Farley, C. & D. Lister. 2007. Greater Toronto's language quilt. Toronto Star, Dec. 30, 2007.
Keefe, S. & A. Padilla. 1987. Chicano Ethnicity. Albuquerque: UNM Press.
Lyskawa, P. 2015. Variation in case marking in Heritage Polish. MA thesis, Linguistics Department, University of Toronto.
Lyskawa, P., R. Maddeaux, E. Melara & N. Nagy. Submitted. Heritage speakers follow all the rules: Language contact and convergence in Polish devoicing. Heritage Language Journal.
Nagy, N. 2009. Heritage Language Variation and Change. http://individual.utoronto.ca/ngn/research/heritage_lgs.htm.
Nagy, N. 2011. A multilingual corpus to explore geographic variation. Rassegna Italiana di Linguistica Applicata 43.1-2: 65-84.

Nagy, N. & M. Meyerhoff. 2008. The social life of sociolinguistics. In M. Meyerhoff & N. Nagy (eds.), Social Lives in Language: Sociolinguistics and Multilingual Speech Communities. Amsterdam: John Benjamins.
Poplack, S. 1980. Sometimes I'll start a sentence in Spanish Y TERMINO EN ESPAÑOL: toward a typology of code-switching. Linguistics 18: 581-618.
Statistics Canada. http://www12.statcan.gc.ca.