Mapping endangered records of endangered cultures, or We have harvesters but not enough fruit
Nick Thieberger
School of Languages and Linguistics, University of Melbourne
Charting Vanishing Voices: A Collaborative Workshop to Map Endangered Oral Cultures (WOLP 2012 Workshop)

Pacific and Regional Archive for Digital Sources in Endangered Cultures (PARADISEC)
Metrics (June 2012)
• 274 collections, of which 181 are publicly available
• 8,268 items, of which 7,637 are publicly available
• 59,987 files
• Size: 6.04 TB
• Time: 3,390 hours
• 716 languages represented in the collection, from 65 countries

Type   Files   Size
.cha   39      173.39 KB
.dv    32      145.07 GB
.eaf   125     7.95 MB
.jpg   21,956  39.88 GB
.lbl   30      734.92 KB
.mov   66      493.62 GB
.mp3   9,490   181.59 GB
.mp4   81      19.13 GB
.mpg   106     34.42 GB
.mxf   42      356.05 GB
.pdf   5,035   3.10 GB
.rtf   4,681   87.30 MB
.tab   40      819.65 KB
.tif   1,626   52.46 GB
.trs   189     1.05 MB
.txt   363     23.70 MB
.wav   9,492   4.73 TB
.xml   6,546   142.92 MB

Pacific and Regional Archive for Digital Sources in Endangered Cultures (PARADISEC)
• Collaborative archiving project begun in 2002
• Team made up of linguists and musicologists
• Three universities in a consortium (Sydney, Melbourne, ANU)

Endangered records
• Too little is recorded in most of the world’s languages
• Much of what is recorded is not being looked after properly
• We can’t even find what has been recorded
How can we change that?

Too little is recorded in most of the world’s languages
How much fieldwork is going on?
• Newman (1992 and 2004) reports 34 US departments running field methods courses
• LLL conference 2009 – 180 abstracts
• 2nd International Conference on Language Documentation and Conservation 2011 – 230 abstracts

Too little is recorded in most of the world’s languages
How much fieldwork is going on?
• Assume at least 100 current fieldwork-based linguistic projects
• Since 1960, assuming 50 per year, there should be reasonable records of 2500 languages (roughly 50 years × 50 projects per year)
• Recordings, texts, dictionaries – paper and digital (from the late 1980s onwards)

Too little is recorded in most of the world’s languages
• Not even all funded projects are producing well-formed records
  – Well-formed means described, archived and accessible
  – e.g., ELDP has funded 264 projects [1], but ELAR has somewhere around 110 deposits [2]
[1] http://www.hrelp.org/grants/projects/index.php?year=all
[2] http://www.paradisec.org.au/blog/2012/04/elar-update

Too little is recorded in most of the world’s languages
• More recording by non-linguists is necessary
• New methods (e.g., Basic Oral Language Documentation, BOLD) that could include more recording by speakers
• Social media as a source of recordings, texts, etc.
• How can we ensure this kind of recording has longevity?

There should be reasonable records of 2500 languages
• Where are they?
• How do we find them?

What is recorded is not being looked after properly
Digital recordings are more fragile than analog ones, but most are not being archived

We can’t even find what has been recorded
Harvesting tools:
• WorldCat: http://www.oclc.org/worldcat
• LLMap (Linguist List, USA): http://www.llmap.org
• Multitree: http://multitree.org
• UNESCO Atlas: http://www.unesco.org/culture/languages-atlas
• ELCat / Endangered Language Catalog: http://www.endangeredlanguages.com

Aggregated information: http://oralliterature.org/database, since mid-2010

We can’t even find what has been recorded
Language codes as a basis for searching
• ISO 639-3: three-letter codes
• Not used by most repositories (small regional libraries, State libraries, Film and Sound archives)
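A minimal sketch of why a shared code helps, assuming a handful of invented catalogue records: the repository names, titles and field names below are hypothetical, and only the code "erk" (ISO 639-3 for South Efate) is real. A search on one spelling of the language name finds a single record, while a lookup on the code finds every record that carries it.

```python
from collections import defaultdict

# Hypothetical catalogue records, invented for illustration.
# "erk" is the ISO 639-3 code for South Efate; the name variants differ by repository.
RECORDS = [
    {"repository": "Archive A", "title": "Stories, tape 1", "language_name": "South Efate", "iso639_3": "erk"},
    {"repository": "Library B", "title": "Word list", "language_name": "Efate, South", "iso639_3": "erk"},
    {"repository": "Archive C", "title": "Hymn book", "language_name": "Nafsan", "iso639_3": "erk"},
    {"repository": "Library B", "title": "Field notes", "language_name": "Unknown", "iso639_3": ""},
]


def search_by_name(records, name):
    """Return records whose free-text language name matches exactly (case-insensitive)."""
    return [r for r in records if r["language_name"].lower() == name.lower()]


def index_by_code(records):
    """Group records under their three-letter code, skipping records without one."""
    index = defaultdict(list)
    for record in records:
        code = record["iso639_3"].strip().lower()
        if len(code) == 3 and code.isalpha():
            index[code].append(record)
    return index


by_name = search_by_name(RECORDS, "South Efate")
by_code = index_by_code(RECORDS)["erk"]
print(len(by_name), len(by_code))
```

The record with no code is the case described above: it cannot be found by any code-based search until someone adds the code to its description.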

We can’t even find what has been recorded British Library

We can’t even find what has been recorded National Library of Australia

We can’t even find what has been recorded Vienna Phonogrammarchiv

Aikwe (Naro) Abron Abuluti Abzachisch (Adygeisch Dialekt) Acholi Adygeisch Dialekt (Adygeisch) Afrikaans Agau Aghul Dialekt (Aghul) Darra-i Nur (Pashai) Pashtu Dialekt (Pashtu) Pelende Permjakisch Persisch Standardsprache (Persisch) Phakey Pidgin- und Kreolsprachen, englisch-basiert Pokomo Polnisch Standardsprache (Polnisch) Polynesisch Pomo Pondo (Pana) Portugiesisch Pulaar Fulfulde Punjabi Rajasthani Raji Rakhshani (Baluči Dialekt) Rathwi-Bhilali (Bhilali) Rätoromanisch Dialekt (Rätoromanisch) Raute Rendille Romagnolisch (Italienisch Dialekt) Romanes nonvlax Balkan (Romanes) Romanes nonvlax Gopti (Romanes) Romanes nonvlax Nord Ost (Romanes) Romanes nonvlax Nord West (Romanes) Romanes nonvlax Zentral Nord (Romanes) Romanes nonvlax Zentral Süd (Romanes) Romanes vlax (Romanes) Romanisch Dialekt aus Italien (Romanisch) Roncalés (Baskisch Dialekt) Ronga Rugciriku Rumänisch Dialekt (Rumänisch) Rumänisch Standardsprache (Rumänisch) Russisch Standardsprache (Russisch) Ruthenisch Rutulisch Sadani Safen Saho Šahrī Sala Samaritanisch Samba Daka Samburu Sambyu (Kwangari) Sami Samo Sanaga Sango Sanskrit Sanye Sara Sardisch Dialekt (Sardisch) Scherpa Schopski (Bulgarisch Dialekt) Schottisch-Gälisch Dialekt (Schottisch-Gälisch) Schottisch-Gälisch Standardsprache (Schottisch-Gälisch) Schottisches Englisch (Englisch) Schottisches Englisch Standardsprache (Schottisches Englisch) Schottisches)

Online searching for language material
e.g., ‘Lewo’ as a language name?
• Google – ‘Lewo’ – 3,080,000 hits
• Google – ‘Lewo grammar’ – 2,200 hits
• Open Language Archives Community (OLAC) – ‘Lewo’ – 13 hits

OLAC search result

What else is out there?
• Items held in personal collections can’t be located
  – speakers who recorded their families
  – missionaries
  – patrol officers
• These could be listed in catalogs, even if online access is restricted

Existing resources = low-hanging fruit
e.g., http://anglicanhistory.org/oceania/

Existing resources = low-hanging fruit
• Problems of longevity of website-based data sources
• Use the Internet Archive for a persistent identifier
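One way to cite a web page that may later disappear is to point at a dated Wayback Machine address of the form https://web.archive.org/web/<timestamp>/<url>. The sketch below only builds such an address. It does not check that a capture exists for that date (the Wayback Machine normally redirects to the nearest capture it holds), and the function name and the date used are illustrative assumptions.

```python
from datetime import datetime

WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(original_url, when):
    """Build a dated Wayback Machine address (timestamp format: YYYYMMDDhhmmss)."""
    timestamp = when.strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}/{timestamp}/{original_url}"


# Illustrative date only; no claim is made that a capture exists at this moment.
citation = wayback_url("http://anglicanhistory.org/oceania/", datetime(2012, 6, 19))
print(citation)
```

Citing the dated address next to the live one means the reference still resolves if the original site is reorganised or taken down.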

Endangered recordings
• Linguists need a shared infrastructure in which to locate their recordings
  – to make them discoverable
  – to provide standard descriptions which can be located by standard search mechanisms
  – to enter metadata before it is forgotten

From the laptop to the archive: ExSite9
Metadata creation without (too many) tears
• File browser: assigning attributes to files created in fieldwork
• Application writes an XML file capturing relationships expressed by ‘drag and drop’ in the browser
• XML file is submitted to an archive’s catalog
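The idea of writing file-to-attribute relationships out as XML can be sketched as follows. This is not the ExSite9 format: the element names, attribute names, file names and collection identifier are all hypothetical, chosen only to show the kind of record a fieldworker's tool could hand to an archive.

```python
import xml.etree.ElementTree as ET

# Hypothetical fieldwork files and the attributes assigned to them.
ASSIGNMENTS = {
    "20120619-story1.wav": {"language": "erk", "speaker": "Speaker A", "type": "audio"},
    "20120619-story1.eaf": {"language": "erk", "speaker": "Speaker A", "type": "transcript"},
}


def build_metadata(collection_id, assignments):
    """Return an XML element describing each file and its assigned attributes.

    The <collection>/<file>/<attribute> layout is an invented schema.
    """
    root = ET.Element("collection", id=collection_id)
    for filename, attributes in sorted(assignments.items()):
        item = ET.SubElement(root, "file", name=filename)
        for key, value in sorted(attributes.items()):
            ET.SubElement(item, "attribute", name=key).text = value
    return root


xml_text = ET.tostring(build_metadata("DEMO1", ASSIGNMENTS), encoding="unicode")
print(xml_text)
```

Because the description is created at the same time as the files, the metadata is captured before it is forgotten, and the archive receives it in a form it can load directly.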

ExSite9
• In development in mid-2012
• Cross-platform tool
• Expected release later in 2012

EOPAS – Delivery of text and media
• Encourage deposit of text and media
  – Provide presentation formats for recorded texts
  – Based on a linguist’s normal workflows
• Record > Transcribe (ELAN) > Interlinearise (Toolbox) > XML output > EOPAS
http://linguistics.unimelb.edu.au/research/projects/eopas/
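The transcription step of this workflow produces time-aligned text, which is what lets a text be played back chunk by chunk. As a minimal sketch, the code below reads time-aligned annotations from a heavily trimmed ELAN (.eaf) document. The sample content and tier name are invented, every time slot is assumed to carry a TIME_VALUE (in milliseconds), and this is not the EOPAS import code.

```python
import xml.etree.ElementTree as ET

# Invented, heavily trimmed .eaf content; real files carry a header,
# linguistic types and usually several tiers.
SAMPLE_EAF = """\
<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="2500"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="5000"/>
  </TIME_ORDER>
  <TIER TIER_ID="transcription" LINGUISTIC_TYPE_REF="default-lt">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>first utterance</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2" TIME_SLOT_REF1="ts2" TIME_SLOT_REF2="ts3">
        <ANNOTATION_VALUE>second utterance</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>
"""


def timed_chunks(eaf_text, tier_id):
    """Return (start_seconds, end_seconds, text) for each aligned annotation on a tier."""
    root = ET.fromstring(eaf_text)
    # Map each time slot id to its offset in milliseconds.
    slots = {
        slot.get("TIME_SLOT_ID"): int(slot.get("TIME_VALUE"))
        for slot in root.iterfind("TIME_ORDER/TIME_SLOT")
    }
    chunks = []
    for tier in root.iterfind("TIER"):
        if tier.get("TIER_ID") != tier_id:
            continue
        for annotation in tier.iterfind("ANNOTATION/ALIGNABLE_ANNOTATION"):
            start = slots[annotation.get("TIME_SLOT_REF1")] / 1000
            end = slots[annotation.get("TIME_SLOT_REF2")] / 1000
            text = annotation.findtext("ANNOTATION_VALUE", default="")
            chunks.append((start, end, text))
    return chunks


chunks = timed_chunks(SAMPLE_EAF, "transcription")
for start, end, text in chunks:
    print(f"{start:.1f}-{end:.1f}s: {text}")
```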

Playable media and metadata: http://www.eopas.org/transcripts/55

Selected text: Keyword in Context / Concordance in all texts of that language
http://www.eopas.org/transcripts/55
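A keyword-in-context view lists every occurrence of a word with a few words either side. The sketch below shows the general technique on two invented English sentences. It is not the EOPAS implementation, and the function name and window size are assumptions.

```python
# Invented sample texts: (text identifier, sentence).
TEXTS = [
    ("text-1", "the old man told a story about the reef"),
    ("text-2", "this story is short"),
]


def kwic(texts, keyword, width=2):
    """Return (text_id, left context, keyword, right context) for each match."""
    hits = []
    for text_id, sentence in texts:
        words = sentence.split()
        for position, word in enumerate(words):
            if word.lower() == keyword.lower():
                left = " ".join(words[max(0, position - width):position])
                right = " ".join(words[position + 1:position + 1 + width])
                hits.append((text_id, left, word, right))
    return hits


concordance = kwic(TEXTS, "story")
for text_id, left, word, right in concordance:
    print(f"{text_id}: {left} [{word}] {right}")
```

Running the same lookup across all texts of one language is what turns a set of separate transcripts into a small searchable corpus.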

Ability to turn off morphemic view: http://www.eopas.org/transcripts/55

Reference to morpheme level: http://www.eopas.org/transcripts/55

Reference to timed chunk: http://www.eopas.org/transcripts/55

Stories
• Recorded by researchers
• Strong source community interest in hearing recordings and reading texts
• Stored in digital archives
• Digitised from analog sources

Central harvesting by language code (ISO 639-3)
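Harvesting of this kind is commonly done over OAI-PMH, the protocol OLAC builds on: a harvester requests records from each archive and reads the language codes out of the returned metadata. The sketch below builds a standard ListRecords request and extracts codes from a sample response. The endpoint, identifiers and response are invented and heavily trimmed, and no network request is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical endpoint, for illustration only.
BASE_URL = "https://archive.example.org/oai"

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "olac": "http://www.language-archives.org/OLAC/1.1/",
}

# Invented, trimmed response; a real one also carries responseDate, request, datestamps.
SAMPLE_RESPONSE = """\
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:archive.example.org:item-1</identifier></header>
      <metadata>
        <olac:olac xmlns:olac="http://www.language-archives.org/OLAC/1.1/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Stories, tape 1</dc:title>
          <dc:language olac:code="erk"/>
        </olac:olac>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:archive.example.org:item-2</identifier></header>
      <metadata>
        <olac:olac xmlns:olac="http://www.language-archives.org/OLAC/1.1/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Field notes</dc:title>
        </olac:olac>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def list_records_url(base_url, metadata_prefix="olac"):
    """Build an OAI-PMH ListRecords request URL for the given metadata format."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def codes_by_identifier(xml_text):
    """Map each record identifier to the language codes found in its metadata."""
    root = ET.fromstring(xml_text)
    code_attribute = "{%s}code" % NS["olac"]
    result = {}
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        codes = [el.get(code_attribute) for el in record.iterfind(".//dc:language", NS)]
        result[identifier] = [code for code in codes if code]
    return result


request_url = list_records_url(BASE_URL)
harvested = codes_by_identifier(SAMPLE_RESPONSE)
print(request_url)
print(harvested)
```

The second record has no language code, so a harvester that groups by code never surfaces it, which is the practical cost of catalogues that describe languages by name only.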

Stories in many of the world’s 7,000 languages

Harvesting tools need something to harvest!
• Persuade linguists to create research data properly and to deposit their materials in archives
  – create incentives in academia to create collections
• Locate existing digital material and incorporate it into principled online catalogs
• Locate analog collections, digitise them and incorporate them into principled online catalogs
• Build example texts/media for as many languages as possible

http://paradisec.org.au
thien@unimelb.edu.au

http://www.nflrc.hawaii.edu/ldc/