Building the Universal Library: Introducing HathiTrust
Patricia A. Steele, Indiana University Libraries
John Price Wilkin, University of Michigan Libraries
December 8, 2008
The Vision
• Universal Digital Library
• Common Goal
• Single Entity but Partnership of Many Libraries
www.hathitrust.org
The Reasons
• Google Digitization Project
• Collective Agreement with CIC Announced in June 2007
  – U of Michigan and U of Wisconsin Projects already underway
The Reasons
• Librarians value preservation
  – How to ensure digital files are preserved?
The Reasons
• Librarians value access
  – How to create a comprehensive and coherent body of materials?
• Librarians believe in cooperation
  – How do you achieve a common goal?
The Beginning
• In 2007, CIC agreed to establish a shared digital repository
• University of Michigan and Indiana University were the initial leaders of this effort
The Beginning
• CIC Shared Digital Repository
• HathiTrust
The Name
• The name… hathitrust.org, hathi.org, olifant.org, silverback.org, kingkong.org, toomai.org
The Name
• The meaning behind the name
  – Hathi (hah-tee): Hindi for elephant
  – Big, strong
  – Never forgets, wise
  – Secure
  – Trustworthy
Banking Analogy
The Logo
The Partners
• When announced in October 2008, full partners included:
  – University of California system
  – CIC (Committee on Institutional Cooperation): University of Chicago, University of Illinois, Indiana University, University of Iowa, University of Michigan, Michigan State University, University of Minnesota, Northwestern University, Ohio State University, Pennsylvania State University, Purdue University, University of Wisconsin-Madison
  – University of Virginia
The Differences
• The Universal Bookstore vs. www.hathitrust.org
Sorting the Issues
• Cost Model
  – Partners are charged a one-time start-up fee based on the number of volumes added to the repository, plus an annual fee for the curation of those volumes.
Sorting the Issues
• Governance
  – Executive Management Group
  – Operational Advisory Board
  – Strategic Advisory Board
  – HathiTrust
Sorting the Issues
• Impact of Google settlement
  – Full access to materials
  – More quickly than a court win, which would have left content locked up for years
HathiTrust Architecture
• Storage in Ann Arbor and Indianapolis
• Encrypted backup to 2nd Ann Arbor location
• Inbound validation, standards-based object storage and related metadata
• Rights database for rights metadata
• Online catalog as source and storage for descriptive metadata
Page image and metadata repository
• Objectives:
  – A guiding principle: store archival images, create deliverables on demand
  – Incorporate TDR-specific practices
• Simple filesystem layout using Pairtree structure
  – One directory per volume, all files inside zip w/associated METS file
  – Use of a namespace allows for conflicting identifiers
  – Namespaces for institutions and, if needed, types of identifiers within the institution
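The Pairtree layout above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the identifier cleaning follows the published Pairtree rules, while the namespace `mdp`, the sample barcode, and the choice to place `pairtree_root` under a per-namespace directory are assumptions made for the example.

```python
from pathlib import PurePosixPath

# Characters the Pairtree rules hex-encode as ^hh before any other mapping.
_HEX_ENCODE = set('"*+,<=>?\\^|')
# Single-character substitutions applied after hex-encoding.
_SUBSTITUTE = str.maketrans({"/": "=", ":": "+", ".": ","})


def clean_id(identifier: str) -> str:
    """Make an identifier filesystem-safe using Pairtree's two-step cleaning."""
    encoded = []
    for ch in identifier:
        if ch in _HEX_ENCODE or not 0x21 <= ord(ch) <= 0x7E:
            # Encode each UTF-8 byte of the character as ^hh.
            encoded.append("".join(f"^{b:02x}" for b in ch.encode("utf-8")))
        else:
            encoded.append(ch)
    return "".join(encoded).translate(_SUBSTITUTE)


def volume_dir(namespace: str, local_id: str) -> PurePosixPath:
    """Return the one directory that holds a volume's zip and METS file.

    The namespace keeps identical local identifiers from different
    institutions apart (assumed layout: <namespace>/pairtree_root/...).
    """
    cleaned = clean_id(local_id)
    # Split the cleaned identifier into two-character path segments.
    pairs = [cleaned[i:i + 2] for i in range(0, len(cleaned), 2)]
    return PurePosixPath(namespace, "pairtree_root", *pairs, cleaned)


# Hypothetical namespace and barcode, for illustration only.
print(volume_dir("mdp", "39015012345678"))
# mdp/pairtree_root/39/01/50/12/34/56/78/39015012345678
```

Because the path is derived from the identifier alone, a volume can be located without consulting a database, and two institutions can reuse the same local identifier under different namespaces.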
© Rights database, pt. 1
• What information to store?
  – Considered complexity and maintenance
  – Considered using MARC directly
  – Needed to accommodate both bib record-derived rights and manual overrides
• Approach: examine bib record, determine authoritative copyright status, store rights attribute, source, reason, and timestamp
• Stored in MySQL
© Rights database, pt. 2
• Each rights attribute must have a reason.
  – bib: bibliographically-derived
  – man: manual access control override
  – ddd: due diligence documented
• Typical rights attributes in use
  – pd: public domain
  – pdus: public domain for US viewers*
  – inc: in copyright
  – nobody (override): no access
• Source (e.g., ‘google’)
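A rights record of the kind described on these two slides might be modeled as below. This is a sketch under stated assumptions: the attribute and reason codes are copied from the slide, but the class name `RightsRecord`, the field names, and the `may_view` rule (including the two-letter country code that a GeoIP lookup would supply) are hypothetical and not taken from the actual MySQL schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Codes as listed on the slide; the real database may define more.
ATTRIBUTES = {"pd", "pdus", "inc", "nobody"}
REASONS = {"bib", "man", "ddd"}


@dataclass(frozen=True)
class RightsRecord:
    """One row: rights attribute, reason, source, and timestamp (hypothetical fields)."""
    volume_id: str
    attribute: str
    reason: str
    source: str
    recorded_at: datetime

    def __post_init__(self):
        if self.attribute not in ATTRIBUTES:
            raise ValueError(f"unknown rights attribute: {self.attribute!r}")
        # Every rights attribute must carry a reason.
        if self.reason not in REASONS:
            raise ValueError(f"unknown reason: {self.reason!r}")


def may_view(record: RightsRecord, viewer_country: str) -> bool:
    """Assumed access rule: pd for everyone, pdus for US viewers only."""
    if record.attribute == "pd":
        return True
    if record.attribute == "pdus":
        return viewer_country.upper() == "US"
    return False  # inc and nobody: no full view


# Hypothetical volume identifier, for illustration only.
record = RightsRecord(
    volume_id="mdp.39015012345678",
    attribute="pdus",
    reason="bib",
    source="google",
    recorded_at=datetime(2008, 12, 8, tzinfo=timezone.utc),
)
print(may_view(record, "US"), may_view(record, "CA"))
```

Storing the reason next to the attribute makes it possible to tell a bibliographically derived status from a manual override when the record is later re-examined.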
Pageturner: page image retrieval
[Diagram. Labels: archival page image; METS; library catalog metadata; © rights database; GeoIP; XML; XSLT; HTML; online page image; browser]
HathiTrust and TRAC
• Automatic validation in GROOVE
  – Check barcode check digit using Luhn algorithm
  – Fixity check on JPG, TIFF, UTF-8 using MD5
  – Well-formedness and embedded metadata check on JPG, TIFF, UTF-8 using JHOVE
  – Various completeness cross-checks
  – Failures retried, admin will eventually intervene
• Periodic fixity checks using MD5
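Two of the checks listed above, the Luhn check digit and the MD5 fixity check, can be sketched in a few lines. This is an illustration of the named techniques only, not GROOVE's code; the function names are hypothetical, and the sketch assumes the check digit is the last digit of an all-numeric barcode.

```python
import hashlib


def luhn_valid(digits: str) -> bool:
    """Return True if the final digit is a correct Luhn check digit."""
    if not (digits.isascii() and digits.isdigit()):
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for position, ch in enumerate(reversed(digits)):
        value = int(ch)
        if position % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9  # same as summing the two digits of the product
        total += value
    return total % 10 == 0


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute a file's MD5 hex digest, reading in chunks to bound memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path: str, recorded_md5: str) -> bool:
    """Compare the file's current MD5 with the checksum recorded at ingest."""
    return md5_of(path) == recorded_md5.strip().lower()


print(luhn_valid("79927398713"))  # a standard Luhn test number
```

The same `fixity_ok` comparison serves both the inbound check and the periodic re-check: only the time at which it runs differs.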
OAIS Reference Model
[Diagram. Labels: GROOVE (JHOVE); MARC record extensions (Aleph); Rights DB; Page Turner; HathiTrust API; OAI; GeoIP DB; CNRI Handles; [Solr]; Google; [OCA]; In-house Conversion; GRIN; Internal Data Loading; METS/PREMIS object; TIFF G4/JPEG 2000; OCR; MD5 checksums; Isilon Site Replication; TSM; MD5 checksum validation; METS object; PNG; OCR; PDF]
METS Object
• Why METS?
  – Can serve as an Archival Information Package and a Dissemination Information Package
  – Designed to record the relationship between pieces of complex digital objects
  – Can be created automatically as texts are loaded or reloaded
METS Object
• What’s there?
  – metsHdr with an ID and CREATEDATE
  – dmdSec with a URL
  – Two techMD referencing notes files
  – Two fileGrps (images and OCR)
  – Physical structMap tying together the files with any metadata (pg. numbers or features)
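The parts listed above map onto standard METS elements, and a skeleton can be generated automatically at load time. The sketch below uses only Python's standard library and is an assumption-laden illustration: the function name `build_mets`, the ID scheme, the `USE` values, and the sample file names are hypothetical, and the two techMD sections are omitted for brevity.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS)
ET.register_namespace("xlink", XLINK)


def m(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS}}}{tag}"


def build_mets(volume_id, catalog_url, created, pages):
    """Build a minimal METS tree; pages is a list of (image, ocr, label)."""
    root = ET.Element(m("mets"), {"OBJID": volume_id})
    ET.SubElement(root, m("metsHdr"), {"ID": "HDR1", "CREATEDATE": created})

    # dmdSec points at the catalog record by URL instead of embedding it.
    dmd = ET.SubElement(root, m("dmdSec"), {"ID": "DMD1"})
    ET.SubElement(dmd, m("mdRef"), {
        "LOCTYPE": "URL", "MDTYPE": "MARC", f"{{{XLINK}}}href": catalog_url})

    # Two file groups: page images and OCR text.
    file_sec = ET.SubElement(root, m("fileSec"))
    images = ET.SubElement(file_sec, m("fileGrp"), {"USE": "image"})
    ocr = ET.SubElement(file_sec, m("fileGrp"), {"USE": "ocr"})

    # The physical structMap ties each page's files to its sequence and label.
    struct = ET.SubElement(root, m("structMap"), {"TYPE": "physical"})
    volume = ET.SubElement(struct, m("div"), {"TYPE": "volume"})
    for seq, (image_name, ocr_name, label) in enumerate(pages, start=1):
        page = ET.SubElement(volume, m("div"), {
            "TYPE": "page", "ORDER": str(seq), "ORDERLABEL": label})
        for group, prefix, name in ((images, "IMG", image_name),
                                    (ocr, "OCR", ocr_name)):
            file_id = f"{prefix}{seq:08d}"
            entry = ET.SubElement(group, m("file"), {"ID": file_id})
            ET.SubElement(entry, m("FLocat"), {
                "LOCTYPE": "OTHER", "OTHERLOCTYPE": "SYSTEM",
                f"{{{XLINK}}}href": name})
            ET.SubElement(page, m("fptr"), {"FILEID": file_id})
    return root


# Hypothetical identifiers and file names, for illustration only.
doc = build_mets(
    "mdp.39015012345678", "https://example.org/record/1",
    "2008-12-08T00:00:00",
    [("00000001.tif", "00000001.txt", "i"), ("00000002.jp2", "00000002.txt", "1")],
)
print(ET.tostring(doc, encoding="unicode")[:60])
```

Since the tree is derived entirely from the file list and the catalog URL, it can be regenerated whenever a volume is reloaded.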
HathiTrust Services
• Preservation of digital surrogate
• Access (within bounds of law and settlement)
  – Viewing
  – Redistribution
• Services for print-disabled users
• Section 108
• Non-consumptive research
HathiTrust Branding
Legal Status of the Books
• Outside of the Settlement
  – Public domain content digitized by libraries is unconstrained
  – Libraries continue to do preservation-related work with in-copyright works (Sec. 108)
• Settlement
  – LDC or cooperative LDC (HathiTrust)
  – Services for print-disabled users
  – Non-consumptive research
  – Section 108 uses
  – General discovery
  – Sharing of public domain
HathiTrust Future
• Expansion of partnership
• New services
• Revision of governance
• Refinement of content
Contacts, etc.
• http://www.HathiTrust.org (see sitemap)
• Patricia Steele <steele@indiana.edu>
• John Wilkin <jpwilkin@umich.edu>
Digital library for the future