MODELLING THE DIGITAL PRESERVATION COSTS
Paul Wheatley
Digital Preservation Manager, British Library

Summary
• Overview of the model:
  • Aims
  • Development process
  • Model
  • Results
  • Evaluation
• Conclusions

Scope
• Acquisition
• Ingest
• Metadata
• Storage
• Access
• Preservation

Background aims
• Previous work (see Final Report):
  • National Archief, Digital Bewaring: full costing/audit approach
  • Oltmans, Kol: lifecycle and strategies
• Key aims:
  • Make the first major step in defining and estimating the lifecycle cost of digital preservation activities.
  • Propose a model for comment by the wider preservation community.
  • Enable the LIFE Case Studies to be compared and contrasted by providing some cost estimates for “P” in the Lifecycle Model.
  • Attempt to identify the scale of preservation costs. Are they dramatically high, as suggested previously by many in the preservation community, or are they more achievable, as suggested recently (see Rusbridge, C, “Excuse Me... Some Digital Preservation Fallacies?”)?

Development process
• Key cost factors, experimentation, iterative development and refinement
• Based on evidence or indications of trends where possible
• Editable inputs where key estimations or assumptions were made
• Cost component review
• Application of draft model, refinement of inputs
• Team review, refinement of model weaknesses

The Generic LIFE Preservation Model

Preservation = t * TEW + (t / ULE + PON) * (CRS + UME + PPA + QAA)

Expansion of calculated components:
• ULE – Unaided Life Expectancy of a Format = BLE + 0.1 * t
• CRS – Cost of new rendering solution = (1 - PTA) * TDC * FCX + PTA * COA
• PPA – Performing preservation action = PON * (SCM + n * HVM)
• QAA – Quality Assurance = n * BCT * FCX
• PTA – Proportion of Tool Availability = STA * (1 - t/20) + ETA * (t/20)

Expansion of scaling components:
• PON – Proportion of normalisation = 0.4
• FCX – Format complexity (e.g. JPEG = 0.2, WMF = 0.4, PDF = 0.6, Word = 0.8)

Expansion of cost component inputs:
• HVM – High volume migration cost per object = £0.05
• BCT – Base cost of testing a preservation action per object = £0.17
• UME – Update Metadata = 2 metadata officer weeks @ £30k annual salary = £1250
• TDC – Tool development cost = 24 programmer months @ £30k annual salary = £60000
• COA – Cost of available tool = £1500
• TEW – Technology Watch = 1 metadata officer week @ £30k annual salary = £625
• BLE – Base life expectancy = 8 (years)
• STA – Starting tool availability = 0.5
• ETA – Ending tool availability = 0.9
• SCM – Setup cost of migration = £340
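The formula and its inputs can be sketched in code as follows. This is a minimal illustration, not the LIFE project's own implementation: the names `Inputs` and `preservation_cost` and the returned dictionary keys are hypothetical, while the default values are the ones listed on the slide. The sample call uses 20000 GIF objects (complexity 0.2) over 10 years.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Inputs:
    """Editable inputs, defaulting to the values listed on the slide."""
    TEW: float = 625.0    # technology watch per year (GBP)
    UME: float = 1250.0   # update metadata per preservation action (GBP)
    TDC: float = 60000.0  # tool development cost (GBP)
    COA: float = 1500.0   # cost of an available tool (GBP)
    HVM: float = 0.05     # high volume migration cost per object (GBP)
    BCT: float = 0.17     # base cost of testing per object (GBP)
    SCM: float = 340.0    # setup cost of migration (GBP)
    BLE: float = 8.0      # base life expectancy (years)
    STA: float = 0.5      # starting tool availability
    ETA: float = 0.9      # ending tool availability
    PON: float = 0.4      # proportion of normalisation


def preservation_cost(n: int, t: float, fcx: float,
                      p: Optional[Inputs] = None) -> dict:
    """Cost of preserving n objects of one format for the period 0 to t years."""
    p = p or Inputs()
    ule = p.BLE + 0.1 * t                          # unaided life expectancy
    pta = p.STA * (1 - t / 20) + p.ETA * (t / 20)  # proportion of tool availability
    crs = (1 - pta) * p.TDC * fcx + pta * p.COA    # cost of new rendering solution
    ppa = p.PON * (p.SCM + n * p.HVM)              # performing preservation action
    qaa = n * p.BCT * fcx                          # quality assurance
    frequency = t / ule + p.PON                    # frequency of action
    parts = {
        "technology_watch": t * p.TEW,
        "tool": frequency * crs,
        "metadata": frequency * p.UME,
        "action": frequency * ppa,
        "quality_assurance": frequency * qaa,
    }
    parts["total"] = sum(parts.values())
    return parts


costs = preservation_cost(n=20000, t=10, fcx=0.2)
for name, value in costs.items():
    print(f"{name:18s} £{value:,.2f}")
```

Keeping the inputs in one editable structure mirrors the "editable inputs" idea from the development process: any estimate can be overridden, for example `preservation_cost(20000, 10, 0.2, Inputs(COA=3000.0))`.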

The Generic LIFE Preservation Model: key elements explained

Preservation cost of n objects of a particular format for the period 0 to t. E.g. 20000 objects of the GIF format for a period of 10 years.

Preservation = t * TEW + (t / ULE + PON) * (CRS + UME + PPA + QAA)
Preservation = Tech Watch + Frequency of action * Preservation action

• Tech Watch: monitoring formats and software for obsolescence
• Frequency of action: the number of preservation actions within the time period calculated
• Preservation action:
  • Cost of preservation tool
  • Update metadata: updating and managing metadata (Representation Information)
  • Perform preservation action
  • Q/A

The occurrence of costs (1st detailed sample of the model)

Preservation = Tech Watch + Frequency of action * Preservation action

Example: FCLA Action Plans, http://www.fcla.edu/digitalArchive/
• Series of small technology watch events and spikes of preservation activity at increasing intervals
• Base life expectancy = 8 years
• Increases by a year every decade
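To see how the frequency of action behaves over time, the short sketch below evaluates ULE = BLE + 0.1 * t and the frequency term (t / ULE + PON) for a few time periods. The function names are hypothetical; the constants BLE = 8 and PON = 0.4 are the values given in the generic model.

```python
BLE = 8.0  # base life expectancy in years
PON = 0.4  # proportion of normalisation


def unaided_life_expectancy(t: float) -> float:
    """Life expectancy of a format; grows by one year per decade."""
    return BLE + 0.1 * t


def frequency_of_action(t: float) -> float:
    """Expected number of preservation actions in the period 0 to t."""
    return t / unaided_life_expectancy(t) + PON


for years in (1, 5, 10, 20):
    print(f"t={years:2d}  ULE={unaided_life_expectancy(years):4.1f}  "
          f"frequency={frequency_of_action(years):.2f}")
```

Because ULE grows with t, each extra year adds slightly less to the frequency than the one before, which corresponds to the "increasing intervals" between spikes of preservation activity.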

Complexity of file formats (2nd detailed sample of the model)

Preservation = Tech Watch + Frequency of action * Preservation action
Preservation action: Cost of preservation tool, Update metadata, Perform preservation action, Q/A

Format complexity factors:
• Size
• Complexity
• Proprietary
• Open
• Standardised

Category     Complexity   Examples
Simple       0.1          ASCII, Unicode
Bitmap       0.2          JPEG, GIF
Mark-up      0.3          HTML, XML
Vector       0.4          EMF, Draw
Multimedia   0.6          MPEG 3, WAV
Document     0.8          Word, PDF
Complex      1            Oracle database dump

Preservation tool cost (3rd detailed sample of the model)

Preservation = t * TEW + (t / ULE + PON) * (CRS + UME + PPA + QAA)
Preservation = Tech Watch + Frequency of action * Preservation action
Preservation action: Cost of preservation tool, Update metadata, Perform preservation action, Q/A

Cost of Preservation Tool (CRS) = (1 - PTA) * Cost of developing a new tool + PTA * Cost of acquiring an existing tool

• Proportion of Tool Availability (PTA) = (1 - t/20) * STA + (t/20) * ETA, the average proportion across the time period
• STA = 0.5
• ETA = 0.9
• Tool Development Cost (TDC) * Format Complexity: TDC estimated as 24 programmer months @ £30k annual salary (£60000)
• Cost of Available tool (COA): estimated as £1500
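The tool cost calculation can be illustrated with a small sketch that blends the cost of developing a tool with the cost of buying one, weighted by tool availability. The function names are hypothetical and the constants are the slide's estimates; the slide gives no guidance on periods beyond 20 years, so the sketch does not attempt to handle them.

```python
STA = 0.5      # starting tool availability
ETA = 0.9      # ending tool availability
TDC = 60000.0  # tool development cost (GBP)
COA = 1500.0   # cost of an available tool (GBP)


def tool_availability(t: float) -> float:
    """PTA: linear blend from STA at t=0 to ETA at t=20 years."""
    return STA * (1 - t / 20) + ETA * (t / 20)


def tool_cost(t: float, fcx: float) -> float:
    """CRS: development cost scaled by format complexity when no tool
    exists, otherwise the cost of acquiring an existing tool."""
    pta = tool_availability(t)
    return (1 - pta) * TDC * fcx + pta * COA


for fmt, fcx in (("GIF", 0.2), ("Word", 0.8)):
    print(f"{fmt}: PTA={tool_availability(10):.2f}, "
          f"CRS=£{tool_cost(10, fcx):,.0f} over 10 years")
```

Under these inputs the development term dominates for complex formats, so the estimate is sensitive to both the complexity value and the assumed tool availability.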

Estimated costs using the model

Estimated preservation costs for GIF files in the Web Archiving Case Study:

File format   Complexity   Number of objects   Frequency of pres action
GIF           0.2          225079              1.51

File format   Technology watch   Preservation tool cost   Metadata   Preservation action   Quality assurance   Total cost (over 10 years)
GIF           £6,250             £7,027                   £1,889     £7,008                £11,564             £33,738

Comparison of average object preservation costs across the Case Studies:

Case study name   Sub category   Year 1   Year 10   Percentage of total lifecycle cost
VDEP              e-monographs   £0.89    £1.45     4%
VDEP              e-serials      £10      £27       2%
Web archiving                    £425     £8509     62%
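The GIF figures can be checked against the generic model. The sketch below assumes 225079 objects, a complexity of 0.2 and a 10 year period, together with the input values from the generic model slide; the function name `gif_breakdown` is hypothetical.

```python
def gif_breakdown(n: int = 225079, t: float = 10, fcx: float = 0.2) -> dict:
    """Per-component costs in GBP, using the generic model's stated inputs."""
    frequency = t / (8 + 0.1 * t) + 0.4            # t / ULE + PON
    pta = 0.5 * (1 - t / 20) + 0.9 * (t / 20)      # tool availability
    crs = (1 - pta) * 60000 * fcx + pta * 1500     # tool cost
    ppa = 0.4 * (340 + n * 0.05)                   # preservation action
    qaa = n * 0.17 * fcx                           # quality assurance
    rows = {
        "Technology watch": t * 625,
        "Preservation tool cost": frequency * crs,
        "Metadata": frequency * 1250,
        "Preservation action": frequency * ppa,
        "Quality assurance": frequency * qaa,
    }
    rows["Total cost"] = sum(rows.values())
    return rows


for label, cost in gif_breakdown().items():
    print(f"{label:24s} £{cost:,.0f}")
```

Rounded to whole pounds, each component agrees with the corresponding figure in the GIF table, and the frequency term evaluates to about 1.51.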

Model outputs: WA Case Study, percentage breakdown

[Chart: Breakdown of complete preservation costs over time in the WA Case Study. Series: Quality assurance, Preservation action, Metadata, Tool cost, Technology watch. Horizontal axis: Time period (years) at 1, 5, 10 and 20.]

Self evaluation of the model

Evaluation against key aims:
• Make the first major step in defining and estimating the lifecycle cost of digital preservation activities.
• Propose a model for comment by the wider preservation community.
• Enable the LIFE Case Studies to be compared and contrasted by providing some cost estimates for “P” in the Lifecycle Model.
• Attempt to identify the scale of preservation costs. Are they dramatically high, as suggested previously by many in the preservation community, or are they more achievable, as suggested recently (see Rusbridge, C, “Excuse Me... Some Digital Preservation Fallacies?”)?

Further work and refinement
• Refinement based on real cost data, removal of assumptions
• Level of detail
• Format complexity
• Re-ingest
• More detailed discussion in the Final Report…

Summary and conclusions
• Estimating the cost is not easy but appears to be possible!
• Provides a useful perspective on performing preservation
• Focuses on achieving cost effective preservation

Finally…

Two appeals to the audience:
• Please cost, record and publish your preservation work
• Provide comment on the preservation model

Questions, comments, evaluation: paul.wheatley@bl.uk