The DCC Curation Lifecycle Model
a centre of expertise in data curation and preservation

Sarah Higgins
Ross Harvey
with graphics advice from Chris Blackall

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 UK: Scotland License. To view a copy of this license, (a) visit http://creativecommons.org/licenses/by-nc-sa/2.5/scotland/; or (b) send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.

The Curation Lifecycle

The DCC Curation Lifecycle Model provides a graphical, high-level overview of the stages required for successful curation and preservation of data from initial conceptualisation or receipt. The model can be used to plan activities within an organisation or consortium to ensure that all necessary stages are undertaken, each in the correct sequence.

www.dcc.ac.uk

Using the DCC Curation Lifecycle Model

The model enables:
• mapping of granular functionality
• definition of roles and responsibilities
• building frameworks of standards and technologies to implement
• identification of additional steps required
• identification of actions which are not required
• ensuring adequate documentation of processes and policies
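As a rough illustration of using the model to spot missing steps, the Python sketch below lists the sequential actions named in this presentation and compares a planned workflow against them. The helper name `review_plan` and the sample plan are hypothetical and not part of the model itself.

```python
# Sequential actions of the model, in the order this presentation gives them.
SEQUENTIAL_ACTIONS = [
    "Conceptualise",
    "Create or Receive",
    "Appraise and Select",
    "Ingest",
    "Preservation Action",
    "Store",
    "Access, Use and Reuse",
    "Transform",
]


def review_plan(planned):
    """Return (missing_actions, out_of_order) for a planned list of action names."""
    missing = [action for action in SEQUENTIAL_ACTIONS if action not in planned]
    positions = [
        SEQUENTIAL_ACTIONS.index(action)
        for action in planned
        if action in SEQUENTIAL_ACTIONS
    ]
    # The plan is out of order if its known actions do not follow the model's order.
    out_of_order = positions != sorted(positions)
    return missing, out_of_order


# Hypothetical plan drawn up by a project team.
plan = ["Create or Receive", "Ingest", "Store", "Appraise and Select"]
missing, out_of_order = review_plan(plan)
print("Missing:", missing)
print("Out of order:", out_of_order)
```

A real planning exercise would also cover the full lifecycle and occasional actions; this sketch checks only the sequential ones.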

Data (Digital Objects or Databases)

Data, any information in binary digital form, is at the centre of the Curation Lifecycle. This includes:
• simple digital objects
• complex digital objects
• databases

Data (Digital Objects or Databases)

• simple digital objects
  • discrete digital items, such as textual files, images or sound files, along with their related identifiers and metadata
• complex digital objects
  • discrete digital objects, made by combining a number of other digital objects, such as websites
• databases
  • structured collections of records or data stored in a computer system

Full Lifecycle Actions: Description and Representation Information

Assign administrative, descriptive, technical, structural and preservation metadata, using appropriate standards, to ensure adequate description and control over the long term. Collect and assign the representation information required to understand and render both the digital material and the associated metadata.

Full Lifecycle Actions: Description Information (Metadata)

• persistently identifies data and maintains reliable links to them
• clearly describes what they are
• clearly identifies technical information needed to use data
• identifies who is responsible for their management and preservation
• describes what can be done to them
• describes what is needed to represent them at the required level of fidelity
• records their history and documents their authenticity
• allows users to understand their context and relationship to other objects
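The roles listed above could be captured in a record structure along the following lines. This is only a sketch in Python: the class name, field names and sample values are invented for illustration and do not follow any particular metadata standard.

```python
from dataclasses import dataclass, field


@dataclass
class DescriptionRecord:
    """Hypothetical record with one field per role of description information."""

    identifier: str          # persistently identifies the data
    description: str         # clearly describes what the data are
    technical_info: str      # technical information needed to use the data
    responsible_party: str   # who manages and preserves the data
    permitted_actions: list = field(default_factory=list)  # what can be done to them
    rendering_requirements: str = ""                        # needed level of fidelity
    history: list = field(default_factory=list)             # history and authenticity
    related_objects: list = field(default_factory=list)     # context and relationships

    def missing_fields(self):
        """Return the names of the core text fields that were left empty."""
        core = ("identifier", "description", "technical_info", "responsible_party")
        return [name for name in core if not getattr(self, name).strip()]


# Sample values are placeholders, not real identifiers.
record = DescriptionRecord(
    identifier="example-id-001",
    description="Survey responses, 2007 wave",
    technical_info="",
    responsible_party="Data archive team",
)
print(record.missing_fields())
```

Checking for empty core fields at the point of creation or receipt is one simple way to notice gaps in description before they become hard to fill.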

Full Lifecycle Actions: Representation Information

• Structure Information: describes the format and data structure concepts to be applied to the bit stream, which result in more meaningful values like characters or number of pixels.
• Semantic Information: this is needed on top of the structure information. If the digital object is interpreted by the structure information as a sequence of text characters, the semantic information should include details of which language is being expressed.
• Other Representation Information: includes information about relevant software, hardware and storage media, encryption or compression algorithms, and printed documentation.
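A small Python sketch can show why structure information matters: the same four bytes yield different values depending on the structure applied to them. The byte values and the `representation_info` dictionary are made up for illustration.

```python
import struct

# A four-byte bit stream with no inherent meaning of its own.
bitstream = b"\x48\x69\x21\x0a"

# Structure information A: "UTF-8 encoded text" yields characters.
as_text = bitstream.decode("utf-8")

# Structure information B: "one big-endian unsigned 32-bit integer" yields a number.
(as_number,) = struct.unpack(">I", bitstream)

# Semantic information sits on top of the structure, e.g. the language of the text.
representation_info = {
    "structure": "UTF-8 text",
    "semantics": {"language": "en"},
}

print(repr(as_text))
print(as_number)
```

Without a record of which structure applies, a future user cannot tell which of the two readings was intended.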

Full Lifecycle Actions: Preservation Planning

Plan for preservation throughout the curation lifecycle of digital material. This would include plans for management and administration of all curation lifecycle actions.

Full Lifecycle Actions: Preservation Planning – ensure future data access

Digital preservation:
• is a set of managed activities
• aims at ensuring the bit stream is maintained
• aims at ensuring that data are accessible
• is concerned with maintaining bit streams and ensuring accessibility for a definable period of time

Full Lifecycle Actions: Preservation Planning – ensure longevity, integrity, accessibility

• longevity
  • as long as required – longer than the original access system
• integrity
  • copy data to a reliable digital storage system
  • ongoing management – data security, backups, error checking
  • refresh data and maintain multiple copies of the bit stream
  • ensure you have preservation action rights
• accessibility
  • assign persistent identifiers
  • add sufficient metadata and representation information
  • choose a limited range of open file formats
  • monitor technical developments
  • retain and manage the original bit stream
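Error checking across multiple copies of a bit stream might look like the Python sketch below, which assumes a SHA-256 checksum was recorded when the data were first stored. The function names, copy names and sample data are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 checksum of a bit stream as a hex string."""
    return hashlib.sha256(data).hexdigest()


def damaged_copies(copies: dict, recorded_checksum: str) -> list:
    """Return the names of copies whose checksum no longer matches the recorded one."""
    return sorted(
        name for name, data in copies.items()
        if sha256_hex(data) != recorded_checksum
    )


# Checksum recorded when the data were first stored.
original = b"observation,value\n1,3.7\n"
recorded = sha256_hex(original)

# Three copies held in different places; the tape copy has been altered.
copies = {
    "site_a": original,
    "site_b": original,
    "tape": b"observation,value\n1,3.9\n",
}
print(damaged_copies(copies, recorded))
```

A damaged copy found this way can be replaced from one that still matches, which is why keeping more than one copy matters.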

Full Lifecycle Actions: Community Watch and Participation

Maintain a watch on appropriate community activities, and participate in the development of shared standards, tools and suitable software.

Full Lifecycle Actions: Community Watch and Participation – benefits of collaboration

• access to a wider range of expertise
• access to tools and systems that might otherwise be unavailable
• encouragement for other stakeholders to take preservation seriously
• shared influence on R&D of standards and practices
• attraction of resources and other support for well-coordinated programmes at a regional, national or sectoral level
• shared influence on agreements with producers
• increased coverage of preserved materials
• better planning to reduce wasted effort
• shared development costs
• shared learning opportunities

UNESCO, Guidelines for the Preservation of Digital Heritage, 2003

Full Lifecycle Actions: Curate and Preserve

Be aware of, and undertake, management and administrative actions planned to promote curation and preservation throughout the curation lifecycle.

Full Lifecycle Actions: Curate and Preserve – the need for digital curation and preservation

• immense quantities of data are being generated
• the quantities are increasing
• the scientific, scholarly and research communities increasingly rely on networked computing
• data are at risk from:
  • technological obsolescence
  • fragility
  • lack of understanding / application of good practice
  • insufficient resources
  • inappropriate organisational infrastructure

Full Lifecycle Actions: Curate and Preserve – plan for digital curation and preservation

• Digital curation techniques address the problems outlined, including:
  • maintenance of data
  • adding value to data for current and future use
• Be aware of management and administrative actions needed to promote curation and preservation throughout the lifecycle
• Undertake management and administrative actions needed to promote curation and preservation throughout the lifecycle

Sequential Actions: Conceptualise

Conceive and plan the creation of data, including capture method and storage options.

Sequential Actions: Conceptualise – plan with digital curation in mind

• develop robust workflows, processes and documentation
• choose appropriate, existing open standards – interoperability
• capture and store data in curation-friendly file formats (open source)
• record sufficient information during data capture to assist with ongoing use
• scrupulously identify files
• store data on appropriate media
• identify a safe place for storage (e.g. a trusted archive) and make sure that archive will take your data
• identify access methods
• identify legal framework

Sequential Actions: Create or Receive

Create data including administrative, descriptive, structural and technical metadata. Preservation metadata may also be added at the time of creation.

Receive data, in accordance with documented collecting policies, from data creators, other archives, repositories or data centres, and if required assign appropriate metadata.

Sequential Actions: Create or Receive – ensure data are curation ready

• of high quality
• well structured
• adequately documented
• interoperable
• authentic (it is what it claims to be)
• accurate (it hasn’t been tampered with)
• renderable (it can be used in the ways for which it was intended, or viewed as originally intended)
• in a form that best ensures its longevity

Sequential Actions: Appraise and Select

Evaluate data and select for long-term curation and preservation. Adhere to documented guidance, policies or legal requirements.

Sequential Actions: Appraise and Select – develop robust policies

How long do we want to keep the data?
• in terms of changes of technology
• in terms of an organisation’s business requirements
• in terms of user requirements (e.g. as evidence to verify conclusions derived from research)

How long do we need to keep the data?
• assess benefits and risks of keeping/not keeping data
• what are the consequences of not keeping the data?
• how much would it cost to recreate it in the future?
• is it even possible to recreate it in the future?

Occasional Actions: Dispose

Dispose of data which has not been selected for long-term curation and preservation, in accordance with documented policies, guidance or legal requirements. Typically data may be transferred to another archive, repository, data centre or other custodian. In some instances data is destroyed. The data’s nature may, for legal reasons, necessitate secure destruction.

Occasional Actions: Dispose – transfer or destruction?

• transfer
  • if no longer relevant for business function but useful to someone else
  • for safe keeping – institutional archive
  • for greater accessibility – more widely accessible data archive
• secure destruction – prevent re-use or reconstruction
  • sensitive data no longer relevant for business function

Sequential Actions: Ingest

Transfer data to an archive, repository, data centre or other custodian. Adhere to documented guidance, policies or legal requirements.

Sequential Actions: Ingest – prepare data for access and long-term storage

• assign a persistent identifier
• check the data does not contain malicious spyware or malware
• create fixity values (e.g. digital signature, hash value, checksum) for integrity checking
• confirm technical details (e.g. file format, MIME type)
• associate with description and representation information
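Several of these ingest steps can be sketched with the Python standard library. The example below rests on assumptions: a random UUID stands in for a real persistent identifier scheme, the MIME type is guessed from the file extension alone, and malware scanning is left out. The function names and record layout are hypothetical.

```python
import hashlib
import mimetypes
import uuid


def build_ingest_record(filename: str, data: bytes) -> dict:
    """Assemble a minimal ingest record for one file."""
    # Guess the MIME type from the extension; a format identification tool
    # would inspect the content instead.
    mime_type, _ = mimetypes.guess_type(filename)
    return {
        # Stand-in identifier; a repository would mint one in its own scheme.
        "identifier": f"urn:uuid:{uuid.uuid4()}",
        "filename": filename,
        "size_bytes": len(data),
        "mime_type": mime_type or "application/octet-stream",
        "fixity": {
            "algorithm": "sha256",
            "value": hashlib.sha256(data).hexdigest(),
        },
    }


def fixity_ok(record: dict, data: bytes) -> bool:
    """Recompute the checksum and compare it with the value stored at ingest."""
    return hashlib.sha256(data).hexdigest() == record["fixity"]["value"]


data = b"Interview notes, session 1\n"
record = build_ingest_record("notes.txt", data)
print(record["mime_type"], fixity_ok(record, data))
```

The stored fixity value is what later integrity checks compare against, so it should be created before the data are moved or copied.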

Sequential Actions: Preservation Action

Undertake actions to ensure long-term preservation and retention of the authoritative nature of data. Preservation actions should ensure that data remains authentic, reliable and usable while maintaining its integrity. Actions include data cleaning, validation, assigning preservation metadata, assigning representation information and ensuring acceptable data structures or file formats.

Sequential Actions: Preservation Action – specific necessary actions

• keep the original data bit stream as well as any ‘preservation version’ for future proofing
• clean and validate data, to ensure they can be managed and re-used over time
• add or extract high quality preservation metadata and representation information to increase potential for discovery, re-use and preservation
• ensure acceptable data structures or file formats (e.g. non-proprietary, well-documented) to increase the chance of future recoverability
• apply good data management practices
• implement secure storage and institutional or organisational continuity

Based on Lord, P and Macdonald, A, e-Science Curation Report, 2003

Sequential Actions: Preservation Action – implement preservation methods

• Migration – transform formats as technologies change
• Emulation – keep original data and application software and create programs to emulate their behaviour on contemporary architectures
• Formal descriptions – encode behaviours of the original application, at creation, in a format understood by a Universal Virtual Computer (a platform-independent layer between hardware and software) to allow reconstitution in original form
• Digital archaeology – future recovery on an as-needed or exploratory basis
• Computer museums – archive whole systems: hardware and software

Based on Lord, P and Macdonald, A, e-Science Curation Report, 2003

Sequential Actions: Preservation Action – automate with tools

• identifying data (where it is located, what formats it is in)
  • format validation, format registries, obsolescence tools
• describing data (automated metadata creation)
  • technical metadata extraction, conversion to XML schema
• manipulating data (data management, data storage, repositories)
  • normalising and encapsulation tools
• preserving data (migration)
  • web archiving tools, emulation tools, preservation metadata extraction tools
• data registration (ingest)
• documentation of commonly used terms and concepts
  • thesauri, word lists, ontologies
• rights management and access control

Occasional Actions: Reappraise

Return data which fails validation procedures for further appraisal and reselection.

Occasional Actions: Migrate

Migrate data to a different format. This may be done to accord with the storage environment or to ensure the data’s immunity from hardware or software obsolescence.

Occasional Actions: Migrate – for preservation storage

• File formats for long-term preservation should be: non-proprietary, open source and well documented
• This facilitates: curation, future access, reuse and future migrations

Examples
• JPEG – digital image thumbnails
• TIFF – high quality digital images
• PDF/A-1 – documents – with look and feel (ISO 19005-1, Document management – electronic document file formats for long-term preservation)
• HTML – web pages
• XML – data or text
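A migration of tabular data into XML could be sketched as follows in Python. The source format (CSV), the element names `dataset`, `record` and `field`, and the sample values are all assumptions chosen for illustration; a real migration would follow a documented schema and keep the original bit stream alongside the migrated version.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(csv_text: str) -> str:
    """Migrate CSV text to a simple XML document (hypothetical element names)."""
    root = ET.Element("dataset")
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = ET.SubElement(root, "record")
        for column, value in row.items():
            # Column names go into an attribute so any header text stays valid XML.
            ET.SubElement(record, "field", name=column).text = value
    return ET.tostring(root, encoding="unicode")


original = "site,temperature\nA,12.5\nB,13.1\n"
migrated = csv_to_xml(original)
print(migrated)
```

Parsing the migrated document and comparing record counts and values with the source is a cheap check that nothing was lost in the conversion.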

Sequential Actions: Store

Store the data in a secure manner adhering to relevant standards.

Sequential Actions: Store – ensure access and continuity

• storage facilities should:
  • ensure secure and reliable storage over time
  • meet the requirements of relevant standards
  • allow access for use and reuse
• storage administration should:
  • be committed to continued maintenance of digital objects
  • ensure adequate and appropriate finance and staffing
  • negotiate the requisite contractual and legal rights
  • fulfil legal responsibilities
  • develop an effective and efficient policy framework
  • develop a strategic program for preservation planning and action

Sequential Actions: Access, Use and Reuse

Ensure that data is accessible to both designated users and reusers, on a day-to-day basis. This may be in the form of publicly available published information. Robust access controls and authentication procedures may be applicable.

Sequential Actions: Access, Use and Reuse – ensure access and continuity

• ensure data can be discovered by applying standards
  • metadata standards
  • allow interoperability
• ensure legal permissions allow data to be used and reused
• ensure legal restrictions on the use and reuse of data are adhered to
• provide tools for collaboration
• ensure access controls and authentication procedures restrict access to authorised users

Sequential Actions: Transform

Create new data from the original, for example:
• by migration into a different format
• by creating a subset, by selection or query, to create newly derived results, perhaps for publication
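Creating a subset by query can be illustrated with an in-memory SQLite database in Python. The table name, columns and values below are invented for the example; the point is only that the query result is new, derived data that would itself need description and curation.

```python
import sqlite3

# Curated data held in a small in-memory database (sample values only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (site TEXT, year INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [("A", 2006, 1.5), ("A", 2007, 2.5), ("B", 2007, 4.0)],
)

# Selection by query: a derived subset, here the 2007 readings only.
subset = conn.execute(
    "SELECT site, value FROM readings WHERE year = ? ORDER BY site",
    (2007,),
).fetchall()

# A newly derived result computed from the subset.
mean_2007 = sum(value for _, value in subset) / len(subset)
print(subset, mean_2007)
conn.close()
```

Recording the query that produced the subset gives the derived data a traceable link back to the original.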

Sequential Actions: Transform – new uses for curated data

• verification of post-analysis results
• the basis of further experiments
• cumulative analysis
• foundation for new research, science, knowledge and discovery
• reliable extension of research

The curation lifecycle begins again: new data created by the Transform action is input into the Create or Receive action.