The School of Best Practices in Digital Language Documentation
Helen Aristar-Dry, Eastern Michigan University
July 15-18, 2004
E-MELD 2004: Linguistic Databases & Best Practice

What are Best Practices?
Practices designed to ensure that digital language resources:
• endure through time;
• can be reused by others, both now and in the future;
• are as independent as possible of computer environments, scholarly communities, and domains of application.
(Bird & Simons 2003)

Why Best Practices?
The impending “Digital Dark Age”

An impending “Digital Dark Age” (fr. Gary Simons, LSA 2004)
• Future historians may see our present age as another Dark Age, since so much of the information documenting our current civilization is recorded digitally and will have vanished.
• If linguists fail to act in time, our digital data records are in danger of dying out before the endangered languages we are seeking to document.

A paradox of writing history (fr. Gary Simons, LSA 2004)
• The more advanced the writing technology, the less durable the written product.
• From most durable to least durable:
  - Clay tablets and stone
  - Vellum
  - Papyrus
  - Paper
  - Digital word processing

Storage media are ephemeral (fr. Gary Simons, LSA 2004)
• Life expectancy of digital storage media:
  - Magnetic tape: 10 to 20 years
  - CD-R (write once)
    - Manufacturers say: 100 to 200 years
    - Independent lab says: 30 years
  - CD-RW (write many times)
    - Manufacturers say: 25 years

Hardware devices are ephemeral (fr. Gary Simons, LSA 2004)
• Removable media on personal computers over the past 25 years:
  - 8-inch floppies
  - 5.25-inch floppies
  - 3.5-inch floppies
  - Zip drives
  - CD-Rs
  - DVD-Rs
  - Memory sticks?

Software formats are ephemeral (fr. Gary Simons, LSA 2004)
• Software vendors change file formats and functionality with each version.
• When we use a proprietary, single-vendor format, we lose access to the data once the software is obsolete.
• For instance:
  - Microsoft Word files from the 1980s cannot be read by current versions of Word.

What’s a linguist to do? (fr. Gary Simons, LSA 2004)
• Do two things to ensure that digital data endure long into the future:
  1. Put the materials into an enduring file format.
  2. Deposit the materials with an archive that will make a practice of periodically migrating them to new storage media as needed.

Some Best Practices
• Distinguish between:
  - archival form: the form in which information is stored for access long into the future;
  - working form: the form in which information is stored as it is created and edited;
  - presentation form: the form in which information is presented to the public.
• Best-practice (BP) recommendations primarily concern the archival form.

An aside on today’s workshop
Databases and Best Practices: An affecting dialogue with 2 Anonymous Participants

Planning E-MELD 2004: Databases and Best Practices
GS: “Helen, databases are NOT best practice.”
HAD (chastened): “Um, er, yes… but everybody uses them for lexical information.”
GS: “Harumph. Then they are tools to manipulate a WORKING FORM. Best practice is to export the data regularly and often into an ARCHIVAL FORM.”
HAD: “Yes sir… sorry, sir… now may I have more porridge?”
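
GS’s advice (keep the database as a working form, but export it often into an archival form) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not an E-MELD tool: it assumes a hypothetical SQLite lexicon with a table `lexicon(headword, pos, gloss)` and writes every row to a plain, uncompressed UTF-8 XML file whose element names are likewise invented for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET


def export_lexicon(db_path: str, xml_path: str) -> int:
    """Dump the hypothetical `lexicon` table to a plain UTF-8 XML file.

    Table, column, and element names are illustrative assumptions,
    not a standard lexicon schema. Returns the number of entries written.
    """
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT headword, pos, gloss FROM lexicon ORDER BY headword"
        ).fetchall()
    finally:
        conn.close()

    root = ET.Element("lexicon")
    for headword, pos, gloss in rows:
        entry = ET.SubElement(root, "entry")
        # One child element per field, so the export is readable without the database.
        ET.SubElement(entry, "headword").text = headword
        ET.SubElement(entry, "pos").text = pos
        ET.SubElement(entry, "gloss").text = gloss

    # Uncompressed UTF-8 output with an XML declaration.
    ET.ElementTree(root).write(xml_path, encoding="utf-8", xml_declaration=True)
    return len(rows)
```

Running such an export on a schedule (for example, after every editing session) keeps an archive-ready copy current while the database remains the convenient working tool.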

Some Best Practices (cont.)
• Archival formats should employ open file formats and open standards.
• Prefer open-source software, or software supported by multiple vendors.
• Archive files should be uncompressed and unedited.
• Character encoding should be Unicode.
• Metadata in a standard format should be included (two formats for language information: OLAC and IMDI).
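
For the Unicode recommendation, a minimal conversion sketch follows. It assumes the legacy file's encoding is already known and is one that Python ships a codec for (the caller must name it; guessing wrong silently corrupts the text). The output is UTF-8 normalized to NFC, so that a precomposed letter such as é and the sequence e + combining acute end up stored identically. Files that use custom “font-hack” encodings need a character mapping rather than a codec change.

```python
import unicodedata


def to_archival_utf8(src_path: str, dst_path: str, legacy_encoding: str) -> str:
    """Re-encode a text file as NFC-normalized UTF-8 and return the text.

    `legacy_encoding` (e.g. "latin-1" or "cp1258") is an assumption supplied
    by the caller; it must match how the source file was actually written.
    """
    # newline="" on both read and write leaves line endings exactly as they were.
    with open(src_path, encoding=legacy_encoding, newline="") as src:
        text = src.read()
    # NFC composes base letters and combining marks where a precomposed form exists.
    text = unicodedata.normalize("NFC", text)
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
    return text
```

Making the encoding a required argument, rather than defaulting to one, forces the person converting the file to state what the original encoding was.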

Goal: School of Best Practices
• To encourage linguists to think of themselves as
  - creating archive-ready documentation
  - for the benefit of future generations
• To facilitate this undertaking by
  - providing information and tools

Organization of the School
• Entrance Hall: orientation
• Case Studies: navigation
• Exhibits: examples of BP
• Classroom: lessons & tutorials
• Reading Room: bibliography
• Work Room: online work
• Tool Room: links to tools
• Help (incl. Ask an Expert)

E-MELD 2004 Working Groups: What we’re hoping for
• Updates, corrections, etc.
• But also (and most importantly):
  - New ideas: what can we add?

E-MELD 2004 Working Groups: What we’re hoping for
• IGT (interlinear glossed text) and video: what should we try to “teach”? How?
• What tools could facilitate BP?
  - Online facilities to be created for the Work Room
  - Small tools (scripts), esp. for conversion: what projects have created these? We would like to collect and document them.

New Ideas (cont.)
• Additional challenges/situations not addressed in the School:
  - What types of conversion have you or your students needed to do (e.g., file conversion or character conversion)?
  - What do students and colleagues ask you for advice on? Is the topic treated in the School?
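
Character conversion is a common case: older fieldwork files often used ASCII stand-ins or custom fonts for IPA symbols. A minimal sketch of a table-driven converter follows. The mapping is entirely hypothetical and must be replaced with the conventions of the actual source files. Longer keys are tried first, so that a two-character sequence such as `ng` is converted as a unit before any shorter key can match part of it.

```python
# Hypothetical legacy conventions: each ASCII sequence stands for one IPA symbol.
LEGACY_TO_UNICODE = {
    "ng": "\u014B",  # LATIN SMALL LETTER ENG
    "@": "\u0259",   # LATIN SMALL LETTER SCHWA
    "E": "\u025B",   # LATIN SMALL LETTER OPEN E
    "?": "\u0294",   # LATIN LETTER GLOTTAL STOP
}


def convert_legacy(text: str, mapping: dict[str, str] = LEGACY_TO_UNICODE) -> str:
    """Replace legacy sequences with Unicode characters, longest match first."""
    keys = sorted(mapping, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(mapping[key])
                i += len(key)
                break
        else:
            # No mapping applies at this position: copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)
```

Scanning the input once, instead of calling `str.replace` for each key in turn, ensures that the output of one replacement is never re-matched by another rule.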

New Ideas (cont.)
• What other stories can we tell?
  - Documentation of conversion processes?
  - Personal stories: “Worst practices (and how I saw the light)”
• Additional language documentation:
  - Would you be willing to add some of yours?
  - Would your students be willing to?
  - Advertising in the fall for the 2nd E-MELD post-doc

Administrative matters:
• Updated list of WG assignments: Naomi?
• Group 4 (audio, video, images) members?
• Interim reports (3 minutes each):
  - Beginning of session II (Groups 1, 2, 3)
  - Beginning of session III (Groups 4, 5, 6)

Administrative matters (cont.):
• Suggestions/corrections:
  - Please list the URL of the relevant page.
• Please help us add to:
  - the bibliography for the Reading Room
  - software links in the Tool Room
• Give these lists to your WG liaison before you leave.

THANK YOU