Report from MPI Team
Roman Skiba, Peter Wittenburg
DOBES Workshop, Frankfurt, April 2003
Data types
• Tapes
  • Audio, Video (DV-PAL, DV-NTSC, VHS, DAT, MD)
  • other material: 8 mm movies, reel-to-reel audio, slides, photos
• DMFs (MPEG-1, MPEG-2, WAV)
• Metadata (IMDI sessions, IMDI corpus structures)
• Session media
  • MPEG-1, WAV - for further processing
  • MPEG-2 - for archiving
  • HTML - as a container for text, pictures and photos (JPEG)
  • PDF - as a container for text, pictures and photos (JPEG)
• Info files (PDF, TXT, HTML)
• Annotations (EAF, Shoebox)
Statistics
Raw data: tapes, DMFs and other media.
Statistics II
Corpus units: metadata, media files, annotations.
Digitizing problems
• Recording problems
  • due to non-continuous time code
  • due to long-play mode
  • due to stills between moving pictures (!)
• Communication problems
  • Maarten handles all communication with great care
• Money problems (due to budget cuts we have to be more careful with expenses: less copying, etc.)
Audio/Video Archiving
• many discussions with archivists, in particular about audio (Austrian/German audio/phonogram archive, E-MELD)
• point at LREC meeting: MP3 and ATRAC (MiniDisc) are not ideal, but are acceptable for listening to and normal analysis of speech (discussed type of reduction and effects)
• attitude now:
  • any MD/MP3 file is reformatted to PCM in the archive
  • strong recommendation to researchers to use 16 bit linear PCM HF
  • get the best quality you can - new devices such as DENON
  • what are slightly higher equipment costs in relation to the total budget?
  • miniaturization can be a problem
• DENON Recorder
  • 192 MB flash cards (or even more)
  • linear PCM 768 kbps: stereo = 16 min / mono = 32 min
  • MP3 (MPEG 2 layer 2) 64 kbps: factor 12 => mono ~ 6 h
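The recording times quoted for the DENON recorder can be checked with a few lines of arithmetic. The sketch below assumes that 768 kbps is the rate per channel (16 bit x 48 kHz), that the card holds 192 decimal megabytes, and that file-system overhead is ignored; the function name is made up for illustration.

```python
def recording_minutes(card_mb: float, kbps_per_channel: float, channels: int = 1) -> float:
    """Minutes of audio that fit on a card at a constant bitrate.

    Assumes 1 MB = 1000 kB and ignores file-system overhead.
    """
    card_kbit = card_mb * 8 * 1000  # card capacity in kilobits
    seconds = card_kbit / (kbps_per_channel * channels)
    return seconds / 60


CARD_MB = 192  # flash card size named on the slide

pcm_mono = recording_minutes(CARD_MB, 768, channels=1)    # about 33 min
pcm_stereo = recording_minutes(CARD_MB, 768, channels=2)  # about 17 min
mp3_mono = recording_minutes(CARD_MB, 64, channels=1)     # 12 times the PCM mono time

print(f"PCM mono:   {pcm_mono:.0f} min")
print(f"PCM stereo: {pcm_stereo:.0f} min")
print(f"MP3 mono:   {mp3_mono / 60:.1f} h")
```

Under these assumptions the results land slightly above the slide's rounded figures (16 min, 32 min, ~6 h), and the factor 12 is simply 768 / 64.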
Video Digitization in the Field
• audio no problem
• video digitization at MPI was and is a success story
• but slow cycle time - therefore digitization in the field
Workflow diagram (labels):
• good old mail
• DV camera, i.LINK
• tests with MPEG camera not ok
• DV encoding: 3.4 MB/sec, 1 h = 20 GB, proprietary, limited software support
• conversion (Tsunami): MPEG-2 copy (~6 Mbps), MPEG-1 copy (~1 Mbps), MPEG-4 copy (0.5 - …), etc.
• MPEG-1 encoding: 1.5 Mbps, 1 h = 1 GB, to work with
Notes:
• MPEG-2 widely accepted archive standard, various frontend codecs
• still compressed - new standard will come in future
• need your tapes (copies) and the MD file to create MPEG-2 versions
• use camera in continuous mode!!!! then batch segmentation
• adapted workflows necessary
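The slide quotes 1 h = 20 GB for DV and 1 h = 1 GB for MPEG-1. A rough check from the stated bitrates alone (decimal units, no container or audio overhead; both are assumptions here) gives smaller values, so the slide figures are probably best read as rounded planning numbers. The helper name below is illustrative only.

```python
def gb_per_hour(mbit_per_s: float) -> float:
    """Storage for one hour of a constant-bitrate stream, in decimal GB."""
    megabytes = mbit_per_s * 3600 / 8  # Mbit/s -> MB per hour
    return megabytes / 1000


# Bitrates as given on the slide; DV is quoted in MB/s, so convert to Mbit/s.
estimates = {
    "DV (3.4 MB/s)": gb_per_hour(3.4 * 8),
    "MPEG-2 copy (~6 Mbps)": gb_per_hour(6.0),
    "MPEG-1 encoding (1.5 Mbps)": gb_per_hour(1.5),
    "MPEG-1 copy (~1 Mbps)": gb_per_hour(1.0),
}

for name, gb in estimates.items():
    print(f"{name}: {gb:.2f} GB per hour")
```

Whatever the exact figures, the ratio is the point of the slide: an hour of DV needs well over ten times the space of a working MPEG-1 copy.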
Access to Archive: short-term
Access to the DoBeS archive I
Current state
• Digital data transport via
  • Mail (DMF, session media)
  • FTP (all data) with password and user ID
  • Email (metadata, annotations, infos)
  • IMDI Browser (metadata, infos)
Access to the DoBeS archive II
Testing new ways
• Digital data transport via
  • IMDI Browser (all integrated data types), password and user ID
  • HTML corpus (all data types), password and user ID
• Remote access
Access to the DoBeS archive III
Future scenario
• Short-term solution
  • open all data types of a team for the IMDI Browser (media, annotations, etc.)
• Long-term solution
  • file access (user IDs and passwords) administered by the teams
Access to Archive: long-term
Archive Access: Single Person
the single-person solution - the (almost) ideal world
all in one single personal box
Archive Access: Single Institute
the single-institute solution - the (almost) ideal world
all in one single big box for an institute
a little more tricky - not all may access everything
but one controlling instance
fast networks available
Archive Access: SI+Web
the single-institute solution with Internet access - the (almost) ideal world
all in one single big box for all
much more tricky - not all may access everything
still one controlling instance, but can be faked
slow networks for video
control delegation necessary
Archive Access: DOBES Goal
Diagram nodes: SOAS, ??, AILLA, DOBES
even more tricky - not all may access everything and everywhere?
several controlling instances - need trust mechanisms
control delegation even more necessary
stability of paths???
DOBES Archive Access
Diagram labels:
• client management: clients, users & groups
• URID, PID
• URL + URID-Path mapping
• check whether user is allowed to access resource: URID - ACL mapping
• check on valid ticket
• resource domain: streaming servers, http servers
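The checks named in this diagram (valid ticket, URID - ACL mapping, URID - path mapping) could be chained roughly as follows. This is only a sketch of one possible reading: the table contents, the `Ticket` class, the identifiers and the `resolve` function are all hypothetical and do not describe the actual DOBES software.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Ticket:
    user: str
    expires_at: float  # Unix timestamp; hypothetical ticket format


# Hypothetical mapping tables; a real archive would keep these in a database.
URID_TO_PATH = {"urid:0001": "/archive/teamA/session1/video.mpg"}
URID_TO_ACL = {"urid:0001": {"teamA"}}  # groups allowed to read the resource
USER_GROUPS = {"anna": {"teamA"}, "bob": {"public"}}


def resolve(urid: str, ticket: Ticket, now: Optional[float] = None) -> str:
    """Return the storage path for a resource if the ticket holder may read it."""
    now = time.time() if now is None else now
    # 1. check on valid ticket
    if ticket.expires_at <= now:
        raise PermissionError("ticket expired")
    # 2. URID - ACL mapping: is the user in a group that may access the resource?
    allowed_groups = URID_TO_ACL.get(urid, set())
    if not (USER_GROUPS.get(ticket.user, set()) & allowed_groups):
        raise PermissionError("user not allowed to access resource")
    # 3. URID - path mapping: hand the location to the http or streaming server
    return URID_TO_PATH[urid]


print(resolve("urid:0001", Ticket("anna", expires_at=2000.0), now=1000.0))
```

Keeping the URID separate from the path means files can move inside the resource domain without breaking references, which relates to the "stability of paths" concern raised for the multi-archive scenario.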
DOBES Archive Access essentials
• online archive managers have write (delete) access (consistency, otherwise complex check-in & versioning system)
• question: who has read access rights?
  • researchers/archivist define access policy - incl. management???
  • access per usage request (temporary) or person/group?
  • do we need person groups (team members, researchers, community members, …)?
  • access patterns per info type (MD, video, audio, annotations, others)
• as was stated - everyone has to accept CoC and copyright statement!
• what about logo and watermarking?
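These points are open questions, not decisions. Purely as an illustration of how the options could fit together, the sketch below combines person groups, access patterns per info type, temporary per-request grants and the CoC requirement. Every group assignment, user name and date in it is invented.

```python
from datetime import date

# Hypothetical read policy: which info types each person group may read.
POLICY = {
    "team member": {"MD", "video", "audio", "annotations", "others"},
    "researcher": {"MD", "annotations"},
    "community member": {"MD", "audio", "video"},
}

# Hypothetical temporary grants per usage request: (user, info type) -> last valid day.
TEMP_GRANTS = {("carla", "video"): date(2003, 6, 30)}


def may_read(user: str, groups: set, info_type: str, today: date, accepted_coc: bool) -> bool:
    """Decide read access; accepting the CoC and copyright statement is mandatory."""
    if not accepted_coc:
        return False
    # Group-based access pattern per info type
    if any(info_type in POLICY.get(group, set()) for group in groups):
        return True
    # Otherwise fall back to a temporary grant for this user and info type
    expiry = TEMP_GRANTS.get((user, info_type))
    return expiry is not None and today <= expiry


print(may_read("carla", {"researcher"}, "video", date(2003, 6, 1), accepted_coc=True))
```

In this arrangement the group policy covers the routine cases and temporary grants cover individual usage requests, so the two options on the slide need not be alternatives.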
Collaborations of DOBES Archivist
Collaborations I
• DELAN (Digital Endangered Languages Archive Network): AILLA, DOBES, ELAR-SOAS, PARADISEC, …
• link to and support from UNESCO?
• joint web portal with links, general information, eNEWS archive
  • Electronic Newsletter
  • Electronic Preprint Server
  • Advice + FAQ
  • Training & Revitalization, etc.
  • E & L, CoC
  • Archive Access
  • Long-term Storage
  • archives named alongside these topics: AILLA?, DOBES, LL?, AILLA?, SOAS, PARADISEC, ?, DOBES
• pressure group
• joint fund-raising activities
• Adopt a Language activity ??
Collaborations II
• E-MELD
  • joint developers workshop
  • joint CV editor by MPI
  • perhaps joint lexicon tool - interest on both sides (start after Easter with real person power at MPI)
  • close exchange with Arizona group about Ontology (Terry & Scott)
  • joint international workshop on lexicon schemas and registries
• INTERA (Integrated European Language Resource Area)
  • integration of all metadata about all LR
  • automatic search for useful tools
• ECHO (European Cultural Heritage Online)
  • additional language resources from archives into MD pool
  • interoperability issues with domains such as Ethnology, …
• TYPOWEB (proposal to EU)
  • project to define an open distributed typology framework
  • inclusion of DOBES and SOAS teams as testers (if they like)
  • a number of excellent typologists, field linguists and 2 technology p
• LanguageWeb (proposal to EU): knowledge basis for lang tech
• CHaSE (proposal to EU): open tech framework for cultural heritage
• data-GRID initiatives (to come): network for fast data exchange
DOBES Training Course
Training Courses
• date: 2-6 June
• everyone is invited - in particular new teams
• all new teams showed interest - they want much practical stuff
• content is being planned now - any comment is welcome
• will distribute the new schedule soon
• "old" teams are invited to present topics / experience reports / …
• open to SOAS teams
• will carry out training courses in Germany together with GBS (Nikolaus Himmelmann)