Planning to Maximize Longevity of Digital Information
Howard Besser
UCLA School of Education & Information
http://www.gseis.ucla.edu/~howard
Besser--Digital Longevity 9/2/00 (12/12/99)

Planning to Maximize Longevity of Digital Info
- The Ecology Metaphor
- Why are you Managing this Information?
- Major Issues Facing Digital Projects
- The Short Life of Digital Info
- Important Planning Considerations
- Key Considerations for Imaging Projects

The Ecology Metaphor

Why are you Managing this Information?
- Organizational mission & type
- Users
- Uses

Major Issues Facing Digital Projects
- Dangerous Changes in Intellectual Property Law
- Intellectual Access
- Storage
- Delivery
- Integration with other tools
- Interoperability

Serious Longevity Problems
- What we know from prior widespread digital file formats
- Images separating from their metadata
- Inaccessibility of software needed to view a work
- Inability to even decode the file format of a work

The Short Life of Digital Info: Digital Longevity Problems
- Disappearing Information
- The Viewing Problem
- The Scrambling Problem
- The Inter-relation Problem
- The Custodial Problem
- The Translation Problem

The Viewing Problem
- Digital Info requires a whole infrastructure to view it
- Each piece of that infrastructure is changing at an incredibly rapid rate
- How can we ever hope to deal with all the permutations and combinations?

The Scrambling Problem
Dangers from:
- Compression to ease storage & delivery
- Container Architecture to enhance digital commerce

The Inter-relation Problem
- Info is increasingly inter-related to other info
- How do we make our own Info persist when it points to and integrates with Info owned by others?
- What is the boundary of a set of information (or even of a digital object)?

The Custodial Problem
- In the past, much of survival was due to redundancy
- How do we decide what to save?
- Who should save it?
  - Mellon-funded E-Journal Archives
- How should they save it?

The Custodial Problem: How to save information?
- Methods for later access
  - Refreshing
  - Migration
  - Emulation
- Issues of authenticity and evidence

The Translation Problem
- Content translated into new delivery devices changes meaning
  - A photo vs. a painting
  - If Info is produced originally in digital form in one encoded format, will it be the same when translated into another format?
  - Behaviors

Pieces of the Solution (1/2)
- We need to insist upon clearly readable standardized ways for digital objects to self-identify their formats
- We should discourage scrambling
- We need to better understand how information inter-relates to other Info, and what constitutes “boundaries” of Info objects
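
One way to picture a digital object self-identifying its format is the leading byte signature that many existing formats already carry. The sketch below is an illustration only: the `SIGNATURES` table and the `identify_format` helper are hypothetical names, and the table covers just a handful of well-known formats.

```python
from typing import Optional

# A few well-known leading byte signatures ("magic numbers").
# This table is illustrative and far from complete.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"\xff\xd8\xff": "JPEG",
    b"GIF87a": "GIF",
    b"GIF89a": "GIF",
    b"II*\x00": "TIFF (little-endian)",
    b"MM\x00*": "TIFF (big-endian)",
    b"%PDF-": "PDF",
}


def identify_format(data: bytes) -> Optional[str]:
    """Return a format name if the data starts with a known signature."""
    for signature, name in SIGNATURES.items():
        if data.startswith(signature):
            return name
    # Unknown signature: the object does not identify itself to this table.
    return None


if __name__ == "__main__":
    print(identify_format(b"GIF89a" + b"\x00" * 10))
    print(identify_format(b"no recognizable header"))
```

A file that returns `None` here is the situation the slide warns about: without a readable self-identification, the format has to be guessed from outside information.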

Pieces of the Solution (2/2)
- People and organizations wishing to make information persist need guidelines on how to go about doing it
- We need to better understand how translating from one storage or display format to another affects the meaning of a work
- We need to save the “behaviors” of a digital object, not just its “contents”

Conceptual Approaches to Digital Preservation
- Refreshing always necessary due to volatility of physical strata
  - Impact on evidential value
- Migration -- advantages & disadvantages
- Emulation -- advantages & disadvantages
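
Refreshing can be pictured as copying the bit stream to new media and confirming that nothing changed along the way, which also bears on evidential value. This is a minimal sketch, assuming files on a local file system and SHA-256 as the fixity check. The names `sha256_of` and `refresh` are illustrative and not part of any standard preservation tool.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh(source: Path, destination: Path) -> str:
    """Copy source to destination and confirm the bits are unchanged."""
    before = sha256_of(source)
    shutil.copyfile(source, destination)
    after = sha256_of(destination)
    if before != after:
        raise IOError(f"refresh of {source} produced a different bit stream")
    # The digest can be recorded as evidence that the copy is faithful.
    return after


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        old_copy = Path(workdir) / "old_medium.bin"
        new_copy = Path(workdir) / "new_medium.bin"
        old_copy.write_bytes(b"example digital object")
        print(refresh(old_copy, new_copy))
```

Refreshing in this sense leaves the format untouched. Migration changes the bit stream on purpose, so a checksum comparison like this one would not apply across a migration.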

To deal with Immediately
- Persistent IDs
- Metadata

Persistent IDs--the Problem
- Need to separate work ID from work location
- URNs probably won’t be ready until 2003
- Becomes a business process issue when one organization maintains the resource and another organization references it (i.e., licensed from vendors or managed by separate administrative structures)

More Persistent IDs--the Approach for today
- PURLs
- Handles
- HTTP redirects
- And worry about costs now and conversion costs when URNs become feasible
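
PURLs, Handles, and HTTP redirects share one idea: the reference names the work, and a lookup supplies the current location. The sketch below shows only that idea. The `Resolver` class and the example.org identifiers are hypothetical and do not reflect the actual PURL or Handle software.

```python
from typing import Dict


class Resolver:
    """Maps stable identifiers to whatever location currently holds the work."""

    def __init__(self) -> None:
        self._locations: Dict[str, str] = {}

    def register(self, work_id: str, location: str) -> None:
        """Record the first known location for an identifier."""
        self._locations[work_id] = location

    def relocate(self, work_id: str, new_location: str) -> None:
        """Point an existing identifier at a new location."""
        if work_id not in self._locations:
            raise KeyError(f"unknown identifier: {work_id}")
        self._locations[work_id] = new_location

    def resolve(self, work_id: str) -> str:
        """Return the location a redirect for this identifier would target."""
        return self._locations[work_id]


if __name__ == "__main__":
    resolver = Resolver()
    # Hypothetical identifier and locations, for illustration only.
    resolver.register("collection/photo-001", "http://old-server.example.org/photo1.tif")
    resolver.relocate("collection/photo-001", "http://new-server.example.org/images/photo1.tif")
    print(resolver.resolve("collection/photo-001"))
```

Documents that cite `collection/photo-001` keep working after the move because only the resolver's table changed. The cost named on the slide is that someone has to keep that table current.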

Data Set Management: More issues with referencing IDs
- References for mirror sites
- References for back-up sites when main site is down or bottle-necked
- References for off-site copies and archival copies
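
One identifier standing for several copies can be sketched as an ordered list of locations that is tried in turn. This is an illustration under stated assumptions: `first_available` is a hypothetical helper, the availability check is passed in as a function so the sketch makes no network calls, and the example.org addresses are placeholders.

```python
from typing import Callable, Iterable, Optional


def first_available(
    locations: Iterable[str],
    is_available: Callable[[str], bool],
) -> Optional[str]:
    """Return the first location that the supplied check reports as reachable."""
    for location in locations:
        if is_available(location):
            return location
    # No copy could be reached.
    return None


if __name__ == "__main__":
    # Ordered by preference: main site, mirror, back-up (all hypothetical).
    copies = [
        "http://main.example.org/work",
        "http://mirror.example.org/work",
        "http://backup.example.org/work",
    ]
    # Simulate the main site being down.
    down = {"http://main.example.org/work"}
    print(first_available(copies, lambda url: url not in down))
```

Off-site and archival copies could sit at the end of the same list, though they may need a different access path than a simple fetch.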

Metadata can be the first line of defense
Can tell you:
- where the file is (if you can’t find the file)
- where more info about the file is (if you have the file but most other metadata has become separated)
- what the file format is
- what the compression scheme is
- what application program and version is needed for the file
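
The five questions above map naturally onto a small record stored alongside each file. The sketch below is one possible shape, not a metadata standard: the `PreservationRecord` class, its field names, and the sample values are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class PreservationRecord:
    """One field per question the slide says metadata can answer."""

    file_location: str        # where the file is
    more_info_location: str   # where more info about the file is
    file_format: str          # what the file format is
    compression: str          # what the compression scheme is
    application: str          # what application program is needed
    application_version: str  # which version of that program


def to_sidecar(record: PreservationRecord) -> str:
    """Serialize the record as JSON text suitable for a sidecar file."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


def from_sidecar(text: str) -> PreservationRecord:
    """Rebuild a record from its JSON text."""
    return PreservationRecord(**json.loads(text))


if __name__ == "__main__":
    # Sample values are invented for illustration.
    record = PreservationRecord(
        file_location="masters/photo-001.tif",
        more_info_location="catalog/photo-001.xml",
        file_format="TIFF 6.0",
        compression="none",
        application="ExampleViewer",
        application_version="1.0",
    )
    print(to_sidecar(record))
```

Keeping a copy of such a record inside or next to the file is one way to reduce the risk of images separating from their metadata.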

Structural Metadata Issues
http://sunsite.berkeley.edu/moa2

Architecture: Separating Longevity and Delivery Servers
[Diagram; labels: User, Berkeley Longevity Server, Berkeley Delivery Server, Other Delivery Server]

Groups Working on the Big Problem
http://sunsite.Berkeley.EDU/Longevity/
- CPA Task Force
- Getty “Time & Bits” Conference & Follow-ups
- Emulation experiments in US and Europe
  - NEDLIB, CURL, Michigan
- Mellon-funded E-Journal Archive experiments
- Internet Archive
- Long Now

Time & Bits

Time & Bits Participants
- Stewart Brand
- Howard Besser
- Brian Eno
- Danny Hillis
- Peter Lyman
- Brewster Kahle
- Kevin Kelly
- Jaron Lanier
- Doug Carlston
- John Heilemann
- Ben Davis
- Margaret MacLean
- Bruce Sterling
- Paul Saffo

Groups Working on Pieces of the Big Problem
http://sunsite.berkeley.edu/Longevity/
- Internet Archive
- Long Now
- Emulation experiments in US and Europe
  - NEDLIB, CURL, Michigan

Journal Archiving
- License, don’t own; may not even be able to obtain right to make archival copy
- Increasingly no paper back-up at all
- Usually we don’t have the important redundancy factor
- Stanford’s LOCKSS Project (Lots of Copies Keep Stuff Safe) and its problems (http://lockss.stanford.edu)

Complexity of Rich Media
- Works often have artistic nature (including video games)
- Enormous number of elements can, at times, be very important to preserve (pacing, original artifact, elements used to construct the artifact)
- Too complex to save every one of these aspects for every type of material
- Importance of saving documentation

Important Planning Considerations
- File Formats
- Choosing Interoperable Systems
  - Adhere to standards
  - Vendors with large installed base
- Refreshing and/or Migration

Key Considerations for Imaging Projects
- Users' Needs
- Image Quality
- Intellectual Property
- Standards
- Topology
- Tools & Processes

Key Considerations for Imaging Projects (1 of 3)
- Users' Needs
  - Quality of Digital Surrogate
  - Interoperable desktop applications
- Image Quality
  - Archival
  - Current online delivery

Key Considerations for Imaging Projects (2 of 3)
- Intellectual Property
- Standards
  - Modular and Layered Architecture
  - Terminology
  - Technical imaging information
- Topology

Key Considerations for Imaging Projects (3 of 3)
- Tools & Processes
  - Scanners
  - Compression techniques
  - Linking files
  - Workflow
  - Interoperable desktop applications

Some nuts-and-bolts Planning Considerations
- Think about users (and potential users), uses, and type of material/collection
- Scan at the highest quality that does not exceed the likely potential users/uses/material
- Do not let today’s delivery limitations influence your scanning file sizes; understand the difference between digital masters and derivative files used for delivery
- Many documents which appear to be bitonal actually are better represented with greyscale scans
- Include color bar and ruler in the scan
- Use objective measurements to determine scanner settings (do NOT attempt to make the image good on your particular monitor or use image processing to color correct)
- Don’t use lossy compression
- Store in a common (standardized) file format
- Capture as much metadata as is reasonably possible (including metadata about the scanning process itself)
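
The trade-off between bitonal, greyscale, and color masters can be made concrete with a rough size estimate. The sketch below is arithmetic only. The page size, the 300 dpi resolution, and the `uncompressed_bytes` helper are assumptions chosen for illustration, and the estimate ignores file headers and any row padding a real format adds.

```python
def uncompressed_bytes(
    width_in: float,
    height_in: float,
    dpi: int,
    bits_per_pixel: int,
) -> int:
    """Estimate raw image size: pixels across x pixels down x bits per pixel."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    # Round up to whole bytes.
    return (total_bits + 7) // 8


if __name__ == "__main__":
    # An 8.5 x 11 inch page at an assumed 300 dpi is 2550 x 3300 pixels.
    for label, depth in [("bitonal", 1), ("greyscale", 8), ("color", 24)]:
        print(label, uncompressed_bytes(8.5, 11, 300, depth))
    # bitonal 1051875
    # greyscale 8415000
    # color 25245000
```

Under these assumptions a greyscale master is eight times the size of a bitonal one. That cost applies to the archived master, while smaller derivative files can still be produced for delivery.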

One Final Question: Who will collect the digital works of today that should become the Special Collections of tomorrow?
- web sites
- zines
- electronic journals
- listserv and email discussions
- drafts of works that later become famous

Planning to Maximize Longevity of Digital Information
Howard Besser
UCLA School of Education & Information
- http://sunsite.berkeley.edu/Longevity/
- http://www.gseis.ucla.edu/~howard
- http://sunsite.berkeley.edu/moa2
- http://lockss.stanford.edu
- http://www.longnow.com/10klibrary/TimeBitsDisc/
- http://www.archive.org/