IRs towards preservation services Steve Hitchcock Preserv Project

IRs: towards preservation services
Steve Hitchcock, Preserv Project
Intelligence, Agents, Multimedia Group, School of Electronics and Computer Science (ECS), Southampton University
JISC Repositories & Preservation Programme, New Projects Briefing, London, 24-25 October 2006

Preserv: a rapid sketch
1. Preserv is investigating preservation strategies for institutional repositories
2. Repositories (or repository software) do not do preservation
3. IR preservation must begin with IR policy
4. Digital preservation is difficult and best managed by specialists: preservation service providers
5. We don’t know what a preservation service provider looks like

IR software
• EPrints (Southampton)
• DSpace (MIT)
• Fedora (Cornell, Virginia)
These are among the most widely used applications for building IRs. It can be argued that these provide different degrees of support for preservation (in reverse order!).

IR software as preservation system
“(N)o existing software application could serve on its own as a trustworthy preservation system. … Without the appropriate people, infrastructure, policies, and procedures, even the best preservation application cannot ensure preservation.”
Fedora and the Preservation of University Records Project: Reports and Findings, Tufts University and Yale University, Final Narrative Report to NHPRC, September 27, 2006
http://dca.tufts.edu/features/nhprc/reports/index.html

Preservation actions
• Storage media
• Media refreshing
• Reformatting
• Backups and disaster recovery
• Environment
• Audit
• Security
• Preservation strategy
• Migration
• Emulation
• Technology preservation
• Records management, etc.
From the JISC standards guidelines

Preservation as policy
• IR preservation must start with policy
• IR policy is concerned with all aspects of repository management: content strategy (mandates!), collection policy, rights, etc.; see OpenDOAR Policies Tool http://opendoar.org/tools/en/policies.php
• Preservation strategy will emerge from this analysis
• BUT IRs can only know the requirements and scale of the preservation task with a fully formed policy
• To fulfil institutional requirements, the policy needs institutional backing (a true IR)

Survey of repository policies
Selected repository administrators invited to participate, based on availability and size of ROAR profile
Original test sites for profiling and survey included Oxford University, e-Prints Soton, ECS EPrints (Soton)
Series of questions, based on analysis of preservation metadata for Preserv model
                EPrints  DSpace  Both
Accepted/sent   22       11      2
Returned        4        2       13

Does the repository have any existing policy on preservation?
Yes 1   No 18
Example policy: http://www.rub.ruc.dk/rub/selvbetjening/projektbiblioteket_eng.shtml

Does the repository implement any preservation measures, internally or with external agents/services?
Measure             Y    N    No reply
Byte preservation   8    9    2
Transformation      3    14   2
Rendering           1    13   5
Emulation           0    14   5
Other: backup, mirroring, geographic cluster backup
Partnerships: Sherpa-DP, MetaArchive NDIIPP, dissertation copies at German National Library

Does the repository have a policy on submission file formats?
Y 11   N 4   No reply 4
• prefer PDF / DOC / PPT / HTML
• recommend using PDF or HTML
• PDF (Sherpa policy)
• accept all formats, text documents should at least be PDF
• preferred PDF/A
• Use DSpace supported, known, and unknown formats (x3)
• Rendering software must be free, i.e. Acrobat, text, PostScript, HTML

Preserv preservation service provider schematic

Format profiling using PRONOM and ROAR
http://archives.eprints.org/
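
As a rough illustration of what a format profile is, the sketch below counts the files in a repository by format. It is not the Preserv or ROAR implementation: the extension table and the function name `profile_formats` are hypothetical, and it guesses formats from file extensions only, whereas a PRONOM-based profile would identify formats from the file content.

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical extension-to-format table, for illustration only.
# A PRONOM-based tool would identify formats from file content instead.
EXTENSION_FORMATS = {
    ".pdf": "PDF",
    ".doc": "Microsoft Word",
    ".ppt": "Microsoft PowerPoint",
    ".html": "HTML",
    ".htm": "HTML",
    ".ps": "PostScript",
    ".txt": "Plain text",
}


def profile_formats(filenames):
    """Return (format, count) pairs for the given file names, most common first."""
    counts = Counter()
    for name in filenames:
        suffix = PurePosixPath(name).suffix.lower()
        # Anything not in the table is reported as "Unknown".
        counts[EXTENSION_FORMATS.get(suffix, "Unknown")] += 1
    return counts.most_common()


if __name__ == "__main__":
    sample = ["thesis.pdf", "paper.PDF", "slides.ppt", "notes.txt", "data.xyz"]
    for fmt, count in profile_formats(sample):
        print(f"{fmt}: {count}")
```

A profile of this kind shows a repository which formats dominate its holdings and how many files fall outside the known set, which is the information a preservation plan would start from.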

Preservation services: passive to active
1. Passive preservation (aka bitstream preservation)
2. Active preservation
   1. Characterisation
   2. Preservation planning
   3. Preservation action
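
Passive (bitstream) preservation is usually taken to mean keeping the stored bytes intact. One common way to check this is a checksum audit, sketched below under stated assumptions: the function names, the in-memory `store` dictionary and the choice of SHA-256 are illustrative, not part of the Preserv design.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 checksum of a byte string as hex."""
    return hashlib.sha256(data).hexdigest()


def record_fixity(objects):
    """Build a manifest {identifier: checksum} from {identifier: bytes}."""
    return {ident: sha256_of(data) for ident, data in objects.items()}


def audit(objects, manifest):
    """Return the sorted identifiers that are missing or no longer match the manifest."""
    failed = []
    for ident, expected in manifest.items():
        data = objects.get(ident)
        if data is None or sha256_of(data) != expected:
            failed.append(ident)
    return sorted(failed)


if __name__ == "__main__":
    # Hypothetical repository content, held in memory for the example.
    store = {"eprint-1": b"original bytes", "eprint-2": b"more bytes"}
    manifest = record_fixity(store)

    store["eprint-1"] = b"corrupted bytes"  # simulate silent corruption
    print(audit(store, manifest))
```

The audit only detects a changed or missing bitstream. Repairing it from a backup or mirror, and everything under active preservation, needs further steps.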

Moving to a new concept: distributed preservation services?
• We need to look beyond the idea of a ‘black box’ preservation service
• Services might be based on lightweight, interacting distributed Web services
• Who will provide these services?
• What coordination is required between services? Is that where client-facing service providers will emerge?
• What services can the market sustain?
See DPC Featured project interview: Preserv, 25 July 2006 http://www.dpconline.org/graphics/join/preserv.html

Summary: policy before preservation
1. Repositories are looking for guidance on preservation
2. Repositories embrace different institutional, cultural and social constraints that will shape policy, including preservation, when they get round to defining it!
3. Proposed a hierarchical (in terms of cost) series of preservation service models so repositories can choose which one suits
4. This approach may be superseded, or supplemented, by distributed preservation services