HATHI TRUST
A Shared Digital Repository
Digital Repositories for Preservation and Access
Digital Directions 2013
Jeremy York
July 22, 2013
Unless otherwise noted, these slides and their contents are licensed under a Creative Commons Attribution Unported License.

Digital repositories
• Primary mission to preserve content
• Perform actions to this end

Reasons to preserve content
• For access
• Guard against threats to content
  – Digitization is an accepted method of preservation reformatting
  – Digital content deteriorates, is fragile

Reasons to provide access
• Meet needs of designated community
• Check on integrity of content
• Content that is accessible is more likely to be valued and preserved in the future

Reasons access might not be offered
• Copyright
• Privacy
• Licensing
• Needs of user community
  – Content available elsewhere
• Technical limitations
  – Networking and storage requirements

A number of models
• Full user access to preserved digital objects
• No end-user access to digital objects
• Delayed or triggered user access to digital objects
• Partial access to digital objects

Requirements to preserve content
• OAIS
  – “An OAIS is an Archive, consisting of an organization... of people and systems that has accepted the responsibility to preserve information and make it available for a Designated Community.” [does not imply unrestricted access]

OAIS
• Support information model
  – Define target of preservation (content data and representation information)
  – Define metadata needed to preserve, identify, contextualize information (PDI)
• Fulfill responsibilities
  – Accept information from Producers
  – Obtain control sufficient to preserve
  – Ensure understandable to designated community
  – Ensure preservation
  – Make available to designated community with information supporting authenticity

Ensure preservation
• Some strategies:
  – Transformation
  – Validation
  – Checks on integrity
  – Replication
  – Choice of formats
  – Migration
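Two of the strategies listed, checks on integrity and replication, can be illustrated together. The following is a minimal sketch, not a description of any particular repository's tooling: it assumes a SHA-256 digest was recorded at ingest and that copies of a file live at several paths. The function names `sha256_of` and `check_replicas` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_replicas(expected: str, replicas: list[Path]) -> dict[str, bool]:
    """Map each replica path to True if it exists and matches the recorded digest.

    `expected` is assumed to be the digest recorded when the object was ingested.
    """
    return {str(p): p.exists() and sha256_of(p) == expected for p in replicas}
```

A replica that reports False is either missing or no longer matches the recorded digest, and could be restored from a copy that still reports True.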

TRAC
• Starts with “a mission to provide reliable, long-term access to managed digital resources to its designated community, now and into the future”
• Encompasses
  – Organizational Infrastructure
  – Digital Object Management
  – Technical Infrastructure

TRAC (2)
• Borrows vocabulary from OAIS
• Adapts ideas for applying criteria from nestor and Digital Curation Centre
  – Documentation (evidence)
  – Transparency
  – Adequacy
  – Measurability

[Diagram: Preserve Content] Mission; OAIS; TRAC; Provenance; Reference; Context; Fixity; Access Rights; Content Data; Representation Information; Preservation Actions; Integrity; Authenticity; Transparency; Documentation; Organizational Infrastructure; Reliability; Adequacy; Digital Object Management; Measurability; Technical Infrastructure; Designated Community

Where does access come in
• Some level of access is necessary
  – Management, integrity
• What is preserved may not be what is most useful to the end user
• Implications across the repository

Content formats
• Can the content you are preserving be delivered over the Web?
  – Will you be storing derivative files?
  – Is some kind of transformation needed?
  – Do the files offer consistent functionality?
• Implications for scale of repository, access systems, changes to services
• In HathiTrust:
  – Limited to 3 formats, largely uniform in technical characteristics
    • ITU G4 TIFF
    • JPEG 2000
    • Unicode (with and without coordinates)

Storage of information about content
• Is information about object adequately available for both preservation and access?
  – Structural information
  – Preservation information with implications for interface
• HathiTrust uses METS as a wrapper
  – Available for preservation and access
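To make concrete how one METS wrapper can serve both preservation and access, here is a hedged sketch that reads the file section of a METS document and returns each file's role, checksum, and location. The element and attribute names (`fileSec`, `fileGrp`, `file`, `FLocat`, `CHECKSUM`, `xlink:href`) come from the METS schema. The sample document, its file names, and its checksum values are invented for illustration and do not reproduce an actual HathiTrust METS file.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Invented sample; real METS files carry many more sections and attributes.
SAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
                     xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp USE="image">
      <mets:file ID="IMG00000001" MIMETYPE="image/jp2"
                 CHECKSUM="0123abcd" CHECKSUMTYPE="MD5">
        <mets:FLocat LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM" xlink:href="00000001.jp2"/>
      </mets:file>
    </mets:fileGrp>
    <mets:fileGrp USE="ocr">
      <mets:file ID="OCR00000001" MIMETYPE="text/plain"
                 CHECKSUM="89efcdab" CHECKSUMTYPE="MD5">
        <mets:FLocat LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM" xlink:href="00000001.txt"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>"""


def list_files(mets_xml: str) -> list[dict[str, str]]:
    """Return one record per <mets:file>, tagged with its group's USE value."""
    root = ET.fromstring(mets_xml)
    records = []
    for group in root.findall("./mets:fileSec/mets:fileGrp", NS):
        for entry in group.findall("mets:file", NS):
            locator = entry.find("mets:FLocat", NS)
            records.append({
                "use": group.get("USE", ""),
                "id": entry.get("ID", ""),
                "mimetype": entry.get("MIMETYPE", ""),
                "checksum": entry.get("CHECKSUM", ""),
                "href": locator.get(XLINK_HREF, "") if locator is not None else "",
            })
    return records
```

The same records could feed a fixity audit (the checksum field) and a page-turning interface (the use and href fields), which is the dual role the slide describes.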

Content Package
• Zip: images, text, Source METS
• HT METS

Architecture
../uc1/pairtree_root/b3/54/34/86/b34543486.zip (images, text, Source METS)
b34543486.mets.xml (HT METS)
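The directory layout on this slide follows the Pairtree convention, in which an identifier is split into two-character segments that become nested directories. The sketch below shows only that splitting step. It assumes a plain ASCII alphanumeric identifier and omits the character-cleaning rules that Pairtree defines for other characters. The function name, the final directory named after the identifier, and the example identifier `b3543486` (chosen so its pairs match the b3/54/34/86 segments shown) are illustrative assumptions.

```python
def pairtree_object_dir(namespace: str, local_id: str) -> str:
    """Build a Pairtree-style directory path for an identifier.

    Assumes `local_id` is ASCII alphanumeric; Pairtree's character
    cleaning for other characters is intentionally not implemented here.
    """
    if not (local_id.isascii() and local_id.isalnum()):
        raise ValueError("this sketch only handles ASCII alphanumeric identifiers")
    # Split into two-character segments; the last one may be a single character.
    pairs = [local_id[i:i + 2] for i in range(0, len(local_id), 2)]
    # Assumption: the object lives in a final directory named after the identifier.
    return "/".join([namespace, "pairtree_root", *pairs, local_id])


print(pairtree_object_dir("uc1", "b3543486"))
```

Spreading objects across short nested directories keeps any single directory from holding an unmanageable number of entries, which matters at the scale of millions of volumes.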

Storage
• Does the storage system support needs for ingest and access?
• In HathiTrust:
  – Need to have fast access to repository systems to support services

Security
• Data Integrity
  – Checksum validation, digital object provenance
• Physical security
  – Biometric door systems, locked racks
• Network security
  – Firewalling, vulnerability scanning
• Application security
  – Developer best practices, input validation
• Access control…

Differential access to content
• Rights database
  – Ensures appropriate access
• Holdings database
  – Facilitates lawful uses of materials
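One way to picture how a rights database and a holdings database could combine into an access decision is the sketch below. It is purely illustrative and does not represent HathiTrust's actual rights codes, policies, or logic. The status labels, the `lawful_use_eligible` flag, and the two access levels are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Access(Enum):
    FULL_VIEW = "full view"
    SEARCH_ONLY = "search only"


@dataclass(frozen=True)
class RightsRecord:
    volume_id: str
    status: str  # hypothetical labels: "public_domain", "in_copyright", "undetermined"


@dataclass(frozen=True)
class User:
    institution: Optional[str] = None
    lawful_use_eligible: bool = False  # hypothetical flag set by an authorization step


def decide_access(rights: RightsRecord, user: User,
                  holdings: dict[str, set[str]]) -> Access:
    """Pick an access level from a rights record plus institutional holdings."""
    if rights.status == "public_domain":
        return Access.FULL_VIEW
    # Holdings lookup: does the user's institution hold this volume?
    held = (user.institution is not None
            and user.institution in holdings.get(rights.volume_id, set()))
    if held and user.lawful_use_eligible:
        return Access.FULL_VIEW
    return Access.SEARCH_ONLY
```

Keeping the decision in one function that defaults to the most restrictive level means an unknown status or a missing holdings entry never widens access by accident.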

Authentication/Authorization
• Mechanisms to enable differential access, ensure security and appropriate use

User services
• Bibliographic and full-text search indexes
• Collection-building capabilities
• User interfaces

APIs and Datasets
• Data API
• Bibliographic API
• OAI
• “Hathifiles”
• Datasets
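Of the interfaces listed, OAI (the OAI-PMH harvesting protocol) is a published standard, so a request can be sketched without guessing at any service-specific API. The example below builds a `ListRecords` request URL using the protocol's standard arguments. The base URL is a placeholder rather than a real endpoint, and the function name is hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://repository.example.org/oai"  # placeholder, not a real endpoint


def list_records_url(base_url: str,
                     metadata_prefix: str = "oai_dc",
                     from_date: Optional[str] = None,
                     until_date: Optional[str] = None,
                     set_spec: Optional[str] = None,
                     resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token is not None:
        # In OAI-PMH, resumptionToken is exclusive: no other arguments accompany it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date
        if until_date:
            params["until"] = until_date
        if set_spec:
            params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


print(list_records_url(BASE_URL, from_date="2013-07-01"))
```

A harvester would fetch this URL, process the returned records, and repeat with the resumption token from each response until none is returned.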

More
• Quality
• User Support
• Correction

[Diagram: Provide Access] Content Formats; Content Package; Architecture; Storage; Security; Authentication; Authorization; Differential Access; Copyright/Agreements; Lawful Uses; Indexes; Services / User Interfaces; APIs and Datasets; Information; Quality; User Support; Correction

[Diagram: Preservation and Access] Mission; OAIS; TRAC; Provenance; Reference; Context; Fixity; Access Rights; Content Data; Representation Information; Preservation Actions; Authenticity; Documentation; Organizational Infrastructure; Integrity; Transparency; Reliability; Adequacy; Digital Object Management; Measurability; Technical Infrastructure; Designated Community; Content Formats; Content Package; Architecture; Security; Authentication; Authorization; Lawful Uses; Indexes; Information; Quality; User Support; Copyright/Agreements; APIs and Datasets; Storage; Differential Access; Services / User Interfaces; Correction

Thank you!

How to find out more
• About: http://www.hathitrust.org/about
• Twitter: http://twitter.com/hathitrust
• Facebook: http://www.facebook.com/hathitrust
• Monthly newsletter:
  – http://www.hathitrust.org/updates
  – RSS: http://www.hathitrust.org/updates_rss
• Contact us: feedback@issues.hathitrust.org
• Blogs: http://www.hathitrust.org/blogs
  – Large-scale Search
  – Perspectives from HathiTrust