HathiTrust: A Big Idea with Bold Plans
Brenda Johnson, Dean of University Libraries
Gary Charbonneau, Systems Librarian
Julie Bobay, Associate Dean for Collection Development and Scholarly Communication
Statewide IT Conference, Indiana University, Sept. 27, 2010

HathiTrust - Outline
A Big Idea
• Mission and Goals; Partners; Governance
Content and Use
• Relationship to Google Books and Internet Archive
• Size, characteristics of content
• A few words about technology
Bold Plans

Importance of A Name
• Hathi (pronounced hah-tee): Hindi word for elephant, an animal highly regarded for its memory, wisdom, and strength
• Trust: A core value of research libraries and one of their greatest assets.
In combination, the words convey the key benefits researchers can expect from a first-of-its-kind shared digital repository
• There’s an elephant in the library.

What is HathiTrust?
• Started in 2008 as a partnership among research libraries, HathiTrust is an open web resource that aggregates, preserves and provides access to the collections of member libraries.
• Initial purpose was to provide a trusted shared repository for books and journals digitized by and available through Google Books and Internet Archive

Google Books/Internet Archive
• In 2004, Google began digitizing the books and journals from many major research libraries in the U.S., including, starting in 2008, IU’s
• Some libraries, including the University of California, had similar digitization projects with the Internet Archive
• Books and journals digitized from these projects were deposited in HathiTrust

Current HathiTrust Partners: 29 and Counting
• Columbia University
• Dartmouth
• University of California system (11 libraries)
• New York Public Library
• Princeton University
• University of Virginia
• Yale University
• CIC (Committee on Institutional Cooperation) (12 libraries)
  – University of Chicago
  – University of Illinois
  – Indiana University
  – University of Iowa
  – University of Michigan
  – Michigan State University
  – University of Minnesota
  – Northwestern University
  – Ohio State University
  – Pennsylvania State University
  – Purdue University
  – University of Wisconsin, Madison

If Google and Internet Archive have these books, why do we need HathiTrust?
HathiTrust’s mission is much broader than simply to replicate Google Books:
Contribute to the common good by collecting, organizing, preserving, communicating, and sharing the record of human knowledge.

Why do we need HathiTrust? (1)
Preservation…For The Long Term
• Better entrusted to research libraries than to a private corporation, even a benevolent one
• Not just preserving bits
• Full preservation program, including active curation, metadata, migration, management plans, etc.
• Seeking TRAC Certification (Trustworthy Repository Audit and Certification)

Why do we need HathiTrust? (2)
Expanded access and discoverability
• Full-text access to pre-1923 books and journals, plus those which have had rights cleared
• Beyond full-text keyword search: enhanced discoverability options

Why do we need HathiTrust? (3)
Focus on scholarly values and needs
• Develop content, access and functionality that meet the needs of researchers
• Share expertise and the cost of preserving and providing access to the scholarly record among institutions that share this fundamental mission

HathiTrust: Getting Started
• Initial development responsibility: University of Michigan, with mirror site at IUPUI, administered by UITS Enterprise Infrastructure
• Much future development will be distributed among partner institutions under direction of HathiTrust Executive Committee

A Unique Partnership
• HathiTrust is library work at scale; an early example of an “above-campus” service
• A new experiment in collaboration
  – Not a separate entity; not a 501(c)(3) like Sakai, Kuali, DuraSpace or many open source software projects
  – Instead, a jointly funded, jointly governed, jointly developed partnership.
• Together, we are HathiTrust.

Sustainability: HathiTrust Governance 2008-2012
• Executive Committee: Budget, finances, decision making
• Strategic Advisory Board: Guidance on policy and planning
• HathiTrust staff
• Working groups and committees

Current Working Groups
• Discovery Interface
• Collections
• Quality
• Communication
• Usability
• Storage
• Development Environment
• Research Center

HathiTrust Functional Framework
Enterprise Management Governance Communication and Coordination with partner institutions Budget, Finances Decision-making Project management Policy Planning Repository Administration Hardware configuration and maintenance Web and application server configuration and maintenance Security Repository Administration Data management (content storage, backup, integrity checks, deletion) Hardware selection and replacement Content and Metadata specifications Permissions Rights Management Bibliographic Data Management Copyright determination Entity description (record-level) Copyright review Object identification (item-level) Copyright information management (database) Data availability Collection Development Digital • Expansion beyond books and journals (born-digital, images and maps, audio) • Selection of content (for non-Google volume ingest and pilot projects) Print • Cloud Library (effect of digital on print) Rightsholder permissions Disaster Recovery Logging Processes for ensuring content integrity e-Commerce Content Ingest Print on Demand Financial contributions of partners Content Access Quality Assurance User Services Transformation PageTurner Quality Review Usability Validation Collection Builder Content Certification User support (helpdesk) Large-scale Search Research Center Bibliographic Catalog APIs Outreach Project website Monthly newsletter Papers and presentations Communication with potential partners Surveys, general inquiries Repository evaluation and audit (e.g., DRAMBORA, TRAC) Legal Risk management (use of materials) Partner agreements Advocacy

Next steps in governance
• 5-year agreements, reviewed in the third year of every term
• First Constitutional Convention will be in 2012
• Partners will determine governance structures and partnership models, effective 2013

Focus On Users
• Preservation…with access
• Benefits to IU researchers and their colleagues around the world:
  – Ensure long-term preservation and access
  – Increase discoverability
  – Create scholarly tools
  – Expand content beyond Google and Internet Archive

HathiTrust – constantly changing
• Rapid growth and development; fluid environment
• Next few slides describe HathiTrust currently
• Will follow with discussion about future plans

HathiTrust - Content
• The vast majority of what is currently in HathiTrust consists of files received from Google for volumes that Google digitized for Google Book Search
• Almost all of the remainder consists of files received from Internet Archive. Much of the content from University of California comes by way of Internet Archive

HathiTrust Content (2)
• Since not all of Google’s “library partners” are members of HathiTrust, and none of Google’s publisher partners are, HathiTrust is still (mostly) a subset of what is in Google Book Search. However….

HathiTrust Content (3)
• Because of HathiTrust’s copyright clearance project, there are some things available in full text in HathiTrust that are only available in “snippet view” in Google.
• Because of Internet Archive, there are probably some things in HathiTrust that are not available in Google at all.

HathiTrust - focus on collections
• HathiTrust is about collections, not simply Google digitization
• For example:
  – access for persons with print disabilities
  – opening access for public domain volumes
  – collection building tool
  – high-quality bibliographic data necessary for scholarly work

[Figure: Content Growth]
[Figure: Content Distribution]
[Figure: Language Distribution (1)]
[Figure: Language Distribution (2)]
[Figure: Dates]
[Figure: Originating Institution]
[Figure: Content Over Time]

HathiTrust DataGrid
• Using Isilon Clustered Storage System
• Similar principles to a datagrid using WAFS (OneFS)
  – Wide Area File System (2.3 PB per file system)
  – Automated data replication among nodes
  – Currently Two Nodes
    • Ann Arbor - University of Michigan
    • Indianapolis - Indiana University NOC
• Connected via I-Light and Michigan Lambda Rail

HathiTrust Grid
[Diagram: Indianapolis and Ann Arbor nodes. Isilon OneFS currently supports up to 2.3 PB between two nodes.]

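The slides mention automated replication between two nodes and, elsewhere, integrity checks on stored content. As an illustration only, the sketch below shows one generic way to confirm that two replicas of a collection hold identical files by comparing SHA-256 checksums. It is not HathiTrust's or Isilon's actual mechanism; the function names (`sha256_of`, `build_manifest`, `compare_replicas`) and the idea of two local directories standing in for the Ann Arbor and Indianapolis copies are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_replicas(site_a, site_b):
    """Report files missing from either replica and files whose digests differ.

    site_a and site_b are hypothetical directory paths standing in for
    two storage nodes.
    """
    a, b = build_manifest(site_a), build_manifest(site_b)
    return {
        "missing_at_b": sorted(set(a) - set(b)),
        "missing_at_a": sorted(set(b) - set(a)),
        "mismatched": sorted(k for k in set(a) & set(b) if a[k] != b[k]),
    }
```

A report in which all three lists are empty would indicate that the two copies agree byte for byte. A production repository would more likely store the manifests and re-verify them on a schedule instead of recomputing both sides each time.
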
More on HathiTrust Technology
http://www.hathitrust.org/technology

A Use Case
• IUB scholar needed quick access to a definitive 52-volume set of Voltaire’s work published in the late 1800s; deadline approaching
• The set had been transferred to the Auxiliary Library Facility
• Available in HathiTrust and Google Books
• Google Books not usable for this scholarly purpose
• Able to do work much more efficiently and quickly in HathiTrust

HathiTrust’s Bold Plans
• We believe the HathiTrust of tomorrow will look very different from the HathiTrust of today
• Google and Internet Archive digitized volumes just the beginning
• The sky’s the limit (or, more accurately, the combined will and resources of the partnership are the limit)

Vision for the future: More Content
• Current and backlist scholarly monographs
• Born-digital materials
• Some locally-digitized collections
• Some non-book/non-journal resources
…anything that is appropriate for a research library collection AND IS A SHARED PRIORITY FOR PARTNERS

Vision for the future: More Content (2)
• More full-text: Google Book Settlement - if approved:
  – could receive all Google-digitized files to preserve
  – could make much more full-text available
• Rights-clearing project - open access to public domain materials

Vision for the Future: More Functionality
• Research tools
  – Computational research
  – Advanced collection builders
  – Advanced discovery
• Expanded quality processes
• Rigorous preservation guarantees
• Defining paths for fair uses
• Tools for shared print collection management

Vision for the Future: Enhanced Discoverability
• Not just keyword searching of full-text
• Highly-functional bibliographic access
  – HathiTrust catalog
  – Integration into other discovery tools: IUCAT, WorldCat, Discovery Services

HathiTrust and local digital library initiatives
• HathiTrust is a solution for large-scale, shared high-priority needs of partners; currently optimized for digitized monographs and journals
• Partners will identify priorities for content and functionality development
• HathiTrust will not supplant all institutionally based digital library initiatives
• Local digital library collections and services will still be needed

How Can HathiTrust Make a Difference?
• Future not yet known precisely, but…
• For the first time in history, HathiTrust has:
  – defined a large-scale partnership to achieve a large-scale goal
  – built the first version of a very large, high-quality shared repository
• Building blocks for ensuring that research collections, print and digital:
  – are preserved, curated, highly discoverable and accessible
  – retain their research value in a digital platform

Some lessons learned so far
• HathiTrust can serve as a shared repository for mass-digitized library collections
• HathiTrust can provide organizational structure for other collaborations
  – Shared print collection management
  – Bibliographic integration
• The research library community is able to collaborate deeply to attain shared goals

HathiTrust Mission - redux
Contribute to the common good by collecting, organizing, preserving, communicating, and sharing the record of human knowledge.

Credits
Our thanks to colleagues who generously granted us permission to use their slides for this presentation:
• John Wilkin, HathiTrust Executive Director
• Jeremy York, HathiTrust Project Librarian
• Heather Christenson, Mass Digitization Project Manager, California Digital Library
Also, many of the ideas for this presentation are based on:
Courant, Paul N. and John Wilkin. “Building ‘Above Campus’ Library Services.” Educause Review, July/August 2010, 74-75.