Digital Creation & Preservation 101 THE DIGITAL CREATION & PRESERVATION WORKING GROUP UMASS AMHERST LIBRARIES NOVEMBER 12, 2009

Digital Creation & Preservation Working Group Members
  • Meghan Banach, Chair
  • MJ Canavan
  • Yuan Li
  • Aaron Rubinstein
  • Brian Shelburne
  • Kelcy Shepherd

Digital Creation & Preservation Working Group Goals:
  • Make recommendations on strategies, policies, and best practices for creating and preserving digital collections
  • Raise awareness of digital preservation issues
  • Ensure best practices through consultations and training

Preservation & Creation
  • Preservation strategies start at the beginning
  • Planning is everything
  • What to digitize?
  • How to give your users access?

What to digitize? Things to consider:
  • Copyright status
  • Significance
  • Current and potential users
  • Organization and descriptive metadata
  • Relationship to other collections
  • Formats/Technologies

How to digitize? Things to consider when planning a digitization project:
  • Standards
  • Best practices
  • Budgets

How to digitize? – Standards
  • What are standards?
  • Why use them?
  • Open standards vs. proprietary
“I’m told there are better programs, but I’m also told there are better alphabets.” – Christopher F. Buckley on WordStar

How to digitize – Standards cont…

How to digitize? – Standards… The fundamental goal of standards: Interoperability and data portability

Preservation & Creation
  • Good digital creation = good preservation potential
  • GIGO = Garbage In, Garbage Out
  • In preservation: Garbage Now, Garbage Later
  • Best practices ensure the best odds for preservation

Best Practices
Why use Best Practices?
  • Establish consistency
  • Get things right the first time

Best Practices
Why use Best Practices?
  • Learn from others’ past mistakes
  • Allow data to work well with other data

Best Practices - Standards Who establishes Best Practices? Digital Preservation Group working to establish similar document for our own materials

Best Practices – Standards samples

Digital Creation – Scan uses
Desired use of the scan:
  • Access to the content only
  • Screen viewing
  • Print output
  • OCR
  • Some or all of the above

Digital Creation – Common formats
  • TIFF – Tagged Image File Format: controlled by Adobe Systems; basically unchanged since 1992
  • PNG – Portable Network Graphics: non-proprietary; gaining in popularity
  • PDF – Portable Document Format: popular format; controlled by Adobe, but now an open standard
  • JPEG – Joint Photographic Experts Group: lossy
  • Raw formats: proprietary; constantly changed

Digital Creation – Original formats
Format of the material to be digitized:
  • Print materials (books, maps, music, etc.)
  • Manuscripts (handwritten or typewritten)

Digital Creation- Original formats Photographs Film

Digital Creation – Original formats
  • 3-D objects
  • Graphics

Digital Creation – Format standards
Text:
  • Master: TIFF; 1-bit bitonal, 8 to 16-bit grayscale, or 48-bit color. Spatial resolution: adjust scan resolution to produce a minimum pixel measurement across the long dimension of 6,000 lines for 1-bit files and 4,000 lines for 8 to 16-bit files. Spatial dimensions: 4,000 to 6,000 pixels across the long dimension.
  • Access: JPEG; 8-bit grayscale or 24-bit color; 150–200 PPI; 600 pixels across the long dimension.
  • Thumbnail: JPEG; 8-bit grayscale or 24-bit color; 144 PPI; 150 to 200 pixels across the long dimension.
Film:
  • Master: TIFF; 16-bit grayscale or 48-bit color. Spatial resolution: to be calculated from actual image format and/or dimensions, approx. 2,800 PPI for 35 mm originals, ranging to approx. 600 PPI for 8 x 10 originals. Spatial dimensions: 4,000 to 6,000 pixels across the long dimension of image area, depending on size of original and excluding mounts and borders.
  • Access: JPEG; 8-bit grayscale or 24-bit color; 150–200 PPI; 600 pixels across the long dimension.
  • Thumbnail: JPEG; 8-bit grayscale or 24-bit color; 144 PPI; 150 to 200 pixels across the long dimension.
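
The film guidance above says spatial resolution is "to be calculated" from the original's dimensions. A minimal sketch of that arithmetic follows: scan resolution is the target pixel count divided by the long dimension in inches. The function name `required_ppi` is hypothetical, and the 36 mm long side for a 35 mm frame is an assumption about the image area, not a figure from the slides.

```python
MM_PER_INCH = 25.4


def required_ppi(target_pixels: int, long_dimension_inches: float) -> int:
    """Scan resolution (PPI) needed to get target_pixels across the long dimension."""
    if long_dimension_inches <= 0:
        raise ValueError("long dimension must be positive")
    return round(target_pixels / long_dimension_inches)


# 35 mm film frame: assume the long side of the image area is 36 mm.
ppi_35mm = required_ppi(4000, 36 / MM_PER_INCH)

# 8 x 10 inch original: the long side is 10 inches.
ppi_8x10 = required_ppi(6000, 10)

print(ppi_35mm, ppi_8x10)
```

Under these assumptions the results land close to the "approx. 2,800 PPI" and "approx. 600 PPI" figures quoted for 35 mm and 8 x 10 originals.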

Digital Creation – Image types
  • Master file (original scan)
  • Create multiple files for various uses:
  • Screen-size file (corrected, size reduced)
  • Working file (corrected, full-sized scan)
  • Thumbnail image (corrected, very small)
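
As a rough illustration of deriving smaller files from a master, the sketch below computes pixel dimensions for a screen-size file and a thumbnail while keeping the aspect ratio. It only does the arithmetic and does not resize any image. The function name `fit_long_edge` and the 6000 x 4000 master are hypothetical; the 600 and 200 pixel targets are taken from the format standards slide.

```python
def fit_long_edge(width: int, height: int, long_edge: int) -> tuple[int, int]:
    """Scale (width, height) so the longer side equals long_edge, keeping aspect ratio."""
    scale = long_edge / max(width, height)
    return max(1, round(width * scale)), max(1, round(height * scale))


# Hypothetical master scan dimensions in pixels.
master = (6000, 4000)

derivatives = {
    "screen-size": fit_long_edge(*master, 600),
    "thumbnail": fit_long_edge(*master, 200),
}
print(derivatives)
```

Derivatives are always computed from the master rather than from each other, so a mistake in one derivative never propagates to the next.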

Digital Creation– Image types

Embedded Metadata
  • Data can be carried within images
  • Most photos carry basic data
  • Right-click on a photo and choose “Properties”

Embedded Metadata – EXIF
  • More complex data can be added using different metadata standards
  • EXIF – Exchangeable Image File Format

Embedded Metadata - IPTC – International Press Telecommunications Council

Embedded Metadata - XMP – Extensible Metadata Platform

Access How will users access your stuff?

Access – Metadata’s role:
  • Tells us what’s inside
  • Glues together digital objects
  • Makes data portable

Access – Choosing systems
Things to consider when choosing access systems:
  • Resources
  • Control
  • “Roll your own” vs. “Out of the box”

Access – Proprietary vs. Open Source
  • What is proprietary software?
  • What is open source software?

Access – Open Source vs. Proprietary
Pros and cons:
  • Open source pros: free (as in beer and as in freedom); community based; customizable; no hidden secrets; use open standards
  • Open source cons: support; can require advanced technical knowledge
  • Proprietary pros: comes with support; often “out of the box” and easy to set up; ILS vendors sell products that can integrate with OPACs and other ILS modules
  • Proprietary cons: $$$$$$; slow to implement functionality; often use proprietary standards; “If you break this seal…”

Access – Example software

In Summary
Digital preservation is impossible without good creation practices:
  • Selection
  • Software
  • Standards
  • Access
  • Best practices

Definition of Digital Preservation:
Refers to the series of managed activities necessary to ensure continued access to digital materials for as long as necessary
  • Long-term preservation
  • Medium-term preservation
  • Short-term preservation
Definition from: The Preservation Management of Digital Material Handbook, Digital Preservation Coalition

The technology of writing throughout history:

Digital Preservation Issues:
  • Technological obsolescence: hardware, drivers, software, file formats, media
  • Physical threats
  • Digital rights management/copyright
  • Loss of context

Hardware obsolescence:

Driver obsolescence:

Software obsolescence:

File format obsolescence:

Media obsolescence:

Real life problem:

Physical threats:
  • Improper storage environment (temperature, humidity, light, dust)
  • Natural disaster (fire, flood, earthquake)
  • Infrastructure failure (plumbing, electrical, climate control)
  • Sabotage (theft, vandalism, malicious modification/erasure, viruses, terrorist attack, etc.)
  • Human error (including improper handling)
  • Overuse

Copyright/Digital Rights Management:

Copyright/Digital Rights Management:
Does the Library own the copyright for the material it is trying to preserve? Copyright law grants certain rights to the copyright holder, including:
  • The exclusive right to prepare derivative works based upon the copyrighted work
  • The exclusive right to distribute copies of the copyrighted work to the public
  • The exclusive right to perform some copyrighted works publicly
  • The exclusive right to display some copyrighted works publicly
  • The exclusive right, in the case of sound recordings, to perform the copyrighted work publicly by means of a digital audio transmission in the United States
  • The exclusive right to control access to a work protected by the use of a technological measure

Loss of context:

Digital Preservation Strategies
  • Preservation Metadata
  • Replication
  • Refreshing
  • Migration
  • Emulation
  • Digital Archaeology

Preservation Metadata
Information needed to access, use, and understand digital resources over time. Examples:
  • Provenance or ownership history
  • Documentation of any changes made to the resource during digitization or preservation
  • Technical information needed to preserve the resource
  • Documentation of actions taken to preserve the resource
  • Rights information

Replication
Bitstream copying
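
Bitstream copying only helps if the copy is known to be bit-identical to the original. The sketch below, using only the Python standard library, copies a file and compares SHA-256 digests of source and copy. The function names `sha256_of` and `replicate` are hypothetical, and SHA-256 is an assumed choice of checksum; the slides do not name an algorithm.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large masters never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate(source: Path, destination: Path) -> str:
    """Copy source to destination and confirm the copy is bit-identical."""
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, destination)
    original, copy = sha256_of(source), sha256_of(destination)
    if original != copy:
        raise IOError(f"copy of {source} does not match the original")
    return copy  # keep this checksum for later fixity checks
```

The returned checksum can be recorded as part of the resource's preservation metadata and reused whenever the copy is audited.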

Refreshing
Copying digital information from one long-term storage medium to another of the same type

Migration
To copy or convert data from one technology to another

Emulation
Reproducing the performance of one computer system on another computer system

Digital Archaeology
Rescuing digital content from damaged media or from obsolete or damaged hardware and software environments

Disaster Recovery – Business Continuity
  • Security – physical and virtual
  • Backups – data, OS, applications; incremental, full
  • Offsite (secure) storage – tape

Physical Security

Servers and AC Unit

More Power

Virtual Security Virus Protection Patching & updating

Virtual Security Firewalls Encryption

Virtual security Monitor Log Files CPU Usage

Back-up Strategies
Making copies of data so that we can restore the original after a data loss event
  • Disaster recovery – big problem
  • Restore a file due to corruption or loss – smallish problem
  • Storage – where does the data go?
  • Media – disk, tape, redundant server(s), cloud
  • Optimized backup procedure
  • What’s the objective? What do you need?
  • Roll back, recovery time, access & security
  • Performance impacts – downtime, slowness
  • Costs
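
The difference between a full and an incremental backup comes down to which files get selected. The sketch below is one simple way to express that, assuming modification time is a good enough signal of change; real backup software typically tracks more than that. The function name `files_to_back_up` is hypothetical.

```python
from pathlib import Path
from typing import Optional


def files_to_back_up(root: Path, last_backup_time: Optional[float] = None) -> list[Path]:
    """Select files for a backup run.

    With no last_backup_time this is a full backup (every file under root).
    Otherwise it is incremental: only files modified after that timestamp.
    """
    selected = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        if last_backup_time is None or path.stat().st_mtime > last_backup_time:
            selected.append(path)
    return selected
```

A weekly run would call `files_to_back_up(root)`, while nightly runs would pass the timestamp of the previous run, which keeps the nightly job small at the cost of a longer restore (full backup plus each incremental).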

RAID
  • Redundant array of independent disks
  • Computer data storage schemes
  • RAID 0, 1, 5: different architectures (we use RAID 5)
  • RAID array, multiple disks
  • Increase data reliability, speed
  • Mirroring – RAID 1
  • Parity
  • Data across disks
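
To make the parity idea concrete, here is a small sketch of how XOR parity lets an array rebuild the contents of one failed disk. It models a single stripe of three data blocks plus one parity block; an actual RAID 5 array also rotates the parity block across disks, which is not shown. The function name `xor_blocks` and the sample block contents are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    if len({len(block) for block in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three data blocks, each stored on a different disk.
data_blocks = [b"UMAS", b"S_AM", b"HERS"]

# The parity block is the XOR of all data blocks in the stripe.
parity = xor_blocks(data_blocks)

# Simulate losing the second disk, then rebuild its block from the survivors.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == data_blocks[1])
```

Because XOR is its own inverse, combining the surviving blocks with the parity block yields exactly the missing block. The same property means parity can cover only one lost disk per stripe.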

Data v. Storage
  • 1024 megabytes in a gigabyte
  • 1000 terabytes in a petabyte
  • Text – King James Bible: 5 MB
  • 1 hour of SD TV uses 1 GB
  • 7 minutes of HD TV uses 1 GB
  • 114 minutes of CD audio uses 1 GB
  • A single petabyte stored on CD-ROMs would create a stack of discs more than a mile high
  • AT&T: 16 petabytes of data transferred through their networks each day
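
The CD-ROM stack claim can be checked with back-of-the-envelope arithmetic. The figures below are assumptions, not values from the slides: a petabyte is taken as 10^15 bytes, a disc as holding 700 MB (700 x 10^6 bytes), and a bare disc as 1.2 mm thick with no case.

```python
import math

PETABYTE_BYTES = 10**15            # assumed decimal petabyte
CD_CAPACITY_BYTES = 700 * 10**6    # assumed capacity of one CD-ROM
CD_THICKNESS_MM = 1.2              # assumed thickness of a bare disc
METERS_PER_MILE = 1609.344

# Number of discs needed, rounding up to a whole disc.
discs = math.ceil(PETABYTE_BYTES / CD_CAPACITY_BYTES)

# Stack height: millimeters to meters to miles.
stack_miles = discs * CD_THICKNESS_MM / 1000 / METERS_PER_MILE

print(f"{discs:,} discs, about {stack_miles:.2f} miles")
```

Under these assumptions the stack comes out a little over one mile, consistent with the slide's "more than a mile high".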

Aleph v. Images
  • Aleph Production Server (1029 GB): full backup weekly, first to disk, then to tape; incremental backup each night; tapes sent offsite
  • Images – SCUA, ICL (2000 GB): images backed up weekly, sent to another server, then processed to tape; incremental files twice more during the week; tapes sent offsite

Preservation Repository
  • Security – data integrity, data access
  • Law of large numbers
  • POR – Proof of Retrievability
  • Checksums
  • Replication model
  • Multiple storage providers
  • Multiple geographic areas
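
Checksums and replication work together: a stored checksum shows which replicas are still intact, and an intact replica can then repair a damaged one. The sketch below audits a set of replica paths against an expected SHA-256 value. The function name `audit_replicas` and the status labels are hypothetical, and the whole file is read into memory for brevity, which would not suit very large files.

```python
import hashlib
from pathlib import Path


def audit_replicas(expected_checksum: str, replicas: list[Path]) -> dict[Path, str]:
    """Return a status per replica: 'ok', 'corrupt', or 'missing'."""
    report = {}
    for replica in replicas:
        if not replica.is_file():
            report[replica] = "missing"
        elif hashlib.sha256(replica.read_bytes()).hexdigest() == expected_checksum:
            report[replica] = "ok"
        else:
            report[replica] = "corrupt"
    return report
```

In a multi-provider setup, each entry in `replicas` would correspond to a copy held by a different storage provider or location. Any replica reported as corrupt or missing can be restored from one reported as ok.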

What about the Cloud?
  • Cloud computing is a general term for anything that involves delivering hosted services over the internet.
  • “A style of computing where massively scalable IT-related capabilities are provided as a service using internet technologies to multiple external customers.” – Gartner, 6/08

Cloud Formations
  • Dropbox
  • Amazon S3, EC2
  • OCLC Digital Archive
  • Sun Cloud
  • IBM Computing on Demand

Clouds – Both Sides Now
On one side:
  • Fewer capital costs
  • Utility model, pay for what you use
  • Scalable on demand
On the other side:
  • Bandwidth issues
  • Control issues
  • SLA
  • Data security and privacy
  • Long-term trustworthiness
  • Access and reliability

Fedora Commons and DSpace
DuraCloud – “Trust and durability in the cloud”
  • Aimed at supporting libraries, universities, and other cultural heritage organizations that wish to provide perpetual access to their digital content.
  • The service replicates and distributes content across multiple cloud providers and enables the deployment of services to support access, preservation, and re-use.

Why should we care about digital preservation?
  • Preservation has always been one of the roles of the library
  • The formats are changing, but our mission is still the same
  • We are expending a lot of time, effort, and money to create and collect digital content
  • To preserve the historical and cultural record

What might we need for a true digital preservation strategy?
  • Selection criteria
  • Policy framework
  • Digital preservation plan
  • Standards and best practices for the digital objects we create
  • Preservation metadata
  • Digital preservation repository

Reference Model for an Open Archival Information System (OAIS)

Trusted Digital Repositories (TDR)
The seven attributes identified in TDR include:
  0. OAIS compliance
  1. Administrative responsibility
  2. Organizational viability
  3. Financial sustainability
  4. Technological and procedural suitability
  5. System security
  6. Procedural accountability

Trusted Digital Repository Model (TDR)

Importance of planning:
It is important to think about preservation before you even begin a digital project.
"Delays in taking preservation decisions can (and most often will) result in preservation requirements that are more complex, labour intensive and therefore costly." – Cedars Guide to Digital Collection Management