Building Digital Libraries
Howard Besser, NYU Archiving and Preservation Program and Library Senior Scientist, http://www.gseis.ucla.edu/~howard
Bernie Hurley, Chief Scientist, UC Berkeley Libraries
Besser & Hurley 10/10/02
Building Digital Libraries
• Models for Digital Repositories (Besser)
• Importance of Metadata Standards & Philosophies
– Introduction (Besser)
– Discovery Metadata: The Dublin Core (Besser)
– Digital Object Standards (METS) (Hurley)
– Content Format Standards (Images) (Besser)
• A Conceptual Digital Library Model for the New Information Environment
– Introduction (Hurley)
– Content Management (Hurley)
– Longevity & Preservation Repositories (Besser)
– Access Systems (Hurley)
• Other Elements (Besser)
– Actors
– Metadata
– Preserving Electronic Art
– . . .
Models for Digital Repositories
From Digital Collections to Digital Libraries, Museums, and Archives • No longer merely experiments • Adhere to our fields’ traditions (access, interoperability, sustainability, privacy, …) • Provide services
To respond to our needs for both Service & Traditions, we face the challenges of: Access (discovery) Sustainability (longevity) Interoperability
Serious Longevity Problems What we know from prior widespread digital file formats Images separating from their metadata Inaccessibility of software needed to view an image Inability to even decode the file format of an image
Traditional Digital Repository Model [diagram: each DL has its own search & presentation layer; user]
Ideal Digital Repository Model [diagram: multiple DLs behind a single search & presentation layer; user]
Importance of Metadata Standards & Philosophies
For Interoperability, Repositories Need Standards (as well as Sustainability & Access) Descriptive Metadata for consistent description Discovery Metadata for finding Administrative Metadata for viewing and maintaining Structural Metadata for navigation . . . Terms & Conditions Metadata for controlling access. . .
Why are Standards and Metadata consensus important? Managing digital files over time Longevity Interoperability Veracity Recording in a consistent manner Will give vendors incentive to create applications that support this
Philosophical Metadata Decisions • Warwick vs MARC • Where to put the metadata
Containers and Packages of Metadata Warwick, not MARC • modular • overlapping • extensible • community-based • designed for a networked world to aid commonality between communities while still providing full functionality within each community
Some different schemes where Metadata is kept • embedded within the object (TIFF headers) • encapsulated with image (MOA2/METS) • in a separate related DB maintained by same organization (OPAC) • in a separate DB maintained by a separate organization (Books in Print, ratings systems)
Discovery Metadata • Dublin Core - NISO Z39.85 (3/95) • CBIR (ongoing)
Dublin Core--further work • Warwick Framework – metadata packages for extensible functions – laid the groundwork for RDF • Canberra Qualifiers – refining the semantics of the element set to provide more precise info – SUBELEMENT, SCHEME, LANG • Granularity – no hierarchical relationships w/i a given DC record; only one record per discrete object (collection or item-level), and relationship field plus qualifier links them
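As a rough illustration of a flat, qualified Dublin Core record, the sketch below builds one with Python's standard library. The `dc` namespace URI is the published Dublin Core element set namespace; the `record` wrapper element, the `scheme` and `qualifier` attribute names, and the sample values are assumptions made for this example, not part of any standard serialization.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
ET.register_namespace("dc", DC_NS)


def make_dc_record(fields):
    """Build a flat record from (element, value, attributes) tuples.

    The record has no internal hierarchy: one record per discrete object,
    with a relation element pointing at other records.
    """
    record = ET.Element("record")  # hypothetical wrapper element
    for element, value, attrs in fields:
        child = ET.SubElement(record, f"{{{DC_NS}}}{element}", attrs)
        child.text = value
    return record


# Sample values are invented for illustration.
scrapbook = make_dc_record([
    ("title", "Family scrapbook", {}),
    ("type", "Collection", {}),
    ("date", "1925", {"scheme": "W3CDTF"}),                          # illustrative qualifier
    ("relation", "urn:example:photo-20", {"qualifier": "hasPart"}),  # link to an item-level record
])
print(ET.tostring(scrapbook, encoding="unicode"))
```

The item-level photo would get its own record, and the relation element plus its qualifier is what ties the two together.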
The Research Process and Functional Categories of Metadata • Discovery • Retrieval • Collation • Analysis • Re-presentation
Open Archives & metadata harvesting
Standardized Digital Objects METS Metadata Encoding & Transmission Standard
What is a “Digital Object”? • Combined Digital Content & Metadata – Digital Content • Digitized materials -- photographs, page images from a book, maps, digitized audio or video… • Born Digital – GIS maps, digitally captured audio or video, numeric datasets (census files, scientific datasets), Web sites… – Metadata • Descriptive • Administrative • Structural • Behavior
What is METS? • An XML Schema that is used to Encode all the Content and Metadata for a Digital Object – The relationships between content and metadata are also captured • A METS Object is often called a METS Document – XML slang • A METS Document can be – A single file with all content & metadata – A “hub document” that points to content and metadata – A combination of the above
Uses of METS • Transfer Syntax – Standard for transmitting/exchanging digital objects. – SIP (Open Archival Information System Reference Model) • Functional Syntax – basis for providing end users with the ability to view and navigate digital content and its associated metadata – DIP • Archiving Syntax – standard for archiving digital objects. – AIP
Why Is METS Important? • Interoperability – Share objects between digital library systems – Allow a DL to work with objects from other repositories • Scalability – Same software can be used to index, navigate and display different content types • E.g., book, diary, scrapbook, music score, etc. • Preservation – Aids Migration Strategies
History of METS • Originates in Making of America II Initiative – Making of America II (MOA2) was an NEH-funded Digital Library Federation initiative started in 1997. Participants included UC Berkeley (lead), Stanford, Penn State, Cornell, and NYPL. – GOAL: to create a digital object standard for encoding structural, descriptive and administrative metadata along with primary content – RESULT: MOA2.DTD (an XML DTD) • Adopted by UC Libraries
History of METS (cont’d) • Concerned Parties Meet at NYU in February, 2001 to Discuss Future of MOA2 – Additional needs emerge • Support for time-based content • More flexibility in Descriptive and Administrative metadata – Outcome • MOA2 revised & renamed to METS • mets.xsd is endorsed by DLF • METS Governance Structure – Editorial Board, Jerry McDonough is Chair • RLG coordinates editorial board activities • Library of Congress is the Maintenance Agency for METS
A Partial List of Organizations that Plan to Use METS • California Digital Library • UC Berkeley • Library of Congress (A/V project) • Harvard • NYU • Stanford • MIT • MetaE (Metadata Engine Project: R&D project funded by the European Commission) • British Library
How Does METS Work? • METS uses XML to 1) Identify the digital pieces (files) that together comprise a digital object • Scrapbook: Digitized pages, photographs, newspaper clippings, digital audio, etc. 2) Specify the location of these pieces • Are we pointing to these files? • Are they embedded in the METS document? • A combination of the above?
3) Express structural relationships between: [Think of the “structure” as a “Table of Contents”] • Content files – Links the proper content files to the TOC entry for the scrapbook’s cover, page 1, page 2, the photo on page 20, the DVD on page 50, etc. • Descriptive Metadata (DM) – Links the proper DM entries to the TOC, so you can have separate DM entries for the scrapbook, photos, audio DVDs… • Administrative Metadata (AM) – Links AM entries to the TOC or to files (e.g., links rights MD to a photo, TechMD to a group of files) • Behaviors – Links the proper behaviors to TOC entries (e.g., links program to run the audio to the DVD TOC entry)
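The three steps (identify the files, say where they are, tie them to a table of contents) can be sketched as a small "hub document" built with Python's standard library. The element and attribute names (`fileSec`, `fileGrp`, `file`, `FLocat`, `structMap`, `div`, `fptr`, `FILEID`) follow the METS schema; the file names, IDs and `USE` value are made up, and the output has not been validated against mets.xsd.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def m(tag):
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_scrapbook_mets():
    mets = ET.Element(m("mets"))

    # Steps 1 and 2: inventory the files and point to their locations.
    file_sec = ET.SubElement(mets, m("fileSec"))
    group = ET.SubElement(file_sec, m("fileGrp"), {"USE": "master"})
    for file_id, href in [("F-COVER", "cover.tif"), ("F-PAGE1", "page1.tif")]:
        f = ET.SubElement(group, m("file"), {"ID": file_id, "MIMETYPE": "image/tiff"})
        ET.SubElement(f, m("FLocat"), {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": href})

    # Step 3: the structural map is the "table of contents"; each entry
    # points back to a file through an fptr.
    struct_map = ET.SubElement(mets, m("structMap"))
    book = ET.SubElement(struct_map, m("div"), {"TYPE": "scrapbook"})
    for div_type, file_id in [("cover", "F-COVER"), ("page", "F-PAGE1")]:
        entry = ET.SubElement(book, m("div"), {"TYPE": div_type})
        ET.SubElement(entry, m("fptr"), {"FILEID": file_id})
    return mets


doc = build_scrapbook_mets()
print(ET.tostring(doc, encoding="unicode"))
```

Here every file is referenced by URL; embedding content in the document itself would use a different child element in place of `FLocat`.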
Anatomy of a METS File [diagram: METS Header; Descriptive Metadata; Admin. Metadata; File Inventory; Structural Map; Behavior Metadata. Only the Structural Map is required; the File Inventory is optional but typical; the other sections are optional.]
1. METS Header • Records Administrative Metadata about the METS Document itself, such as – Author/agent & agent role • E.g., UC Berkeley Library as custodian – Alternate identifiers for METS document – Creation and update dates and times – Status
2. Structural Map Section(s) • Specifies the Structure of the Digital Object as a Hierarchy of Division (div) Elements
Division (type=“scrapbook”)
    Division (type=“page”)
        Division (type=“photo”)
        Division (type=“digital audio file”)
    Division (type=“page”)
        Division (type=“letter”)
        Division (type=“photo”)
        Division (type=“newspaper clipping”)
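A structural map is just nested `div` elements, so a recursive walk is enough to turn it into an indented table of contents. The sketch below assumes one plausible nesting of the scrapbook divisions (the slide's own indentation did not survive), and the `outline` helper is a name invented here.

```python
import xml.etree.ElementTree as ET

METS = "{http://www.loc.gov/METS/}"

# Assumed nesting of the scrapbook divisions, for illustration only.
SAMPLE = """<mets:structMap xmlns:mets="http://www.loc.gov/METS/">
  <mets:div TYPE="scrapbook">
    <mets:div TYPE="page">
      <mets:div TYPE="photo"/>
      <mets:div TYPE="digital audio file"/>
    </mets:div>
    <mets:div TYPE="page">
      <mets:div TYPE="letter"/>
      <mets:div TYPE="photo"/>
      <mets:div TYPE="newspaper clipping"/>
    </mets:div>
  </mets:div>
</mets:structMap>"""


def outline(div, depth=0):
    """Return one indented line per division, depth-first."""
    lines = ["  " * depth + div.get("TYPE", "?")]
    for child in div.findall(METS + "div"):
        lines.extend(outline(child, depth + 1))
    return lines


toc = outline(ET.fromstring(SAMPLE).find(METS + "div"))
print("\n".join(toc))
```

The same walk works for a book, a diary or a music score, which is the scalability argument: the software only needs to understand divisions, not the material type.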
METS Scrapbook Example [diagram: METS Header; Structural Map with Div (cover), Div (page), Div (photo), Div (DVD), each with an fptr]
3. File Section • Records all of the Files that Together Comprise the Content of the Digital Object – Files may be internal or external to the METS document (or both) • Files are organized into File Groups based on format (tiff, hi-res jpeg, med-res jpeg, gif, etc) • Files are linked to the Structural Map
3. File Section (cont.) • Scrapbook Example (a complex object) – 100 Digitized pages with text entries • Three images per page (GIF, JPEG, TIFF) • Transcribed text for each page – Photos and newspaper clippings attached to the pages – Envelopes glued to the pages that hold • Letters & cards • DVDs
METS Scrapbook Example [diagram: METS Header; File Section (File Group, file); Structural Map with Div (cover), Div (page), Div (photo), Div (DVD), each with an fptr]
4. Descriptive Metadata Section(s) • METS can Record all of the Units of Descriptive Metadata Pertaining to the Digital Object – Multiple Descriptive Metadata Sections can Exist in a METS Document • Descriptive Metadata – could take any form • E.g., a MARC or Dublin Core record, Finding Aid – May be • Internal or external to the METS document (or both)
METS Scrapbook Example [diagram: METS Header; Descriptive MD Sections (Des MD); File Section (File Group, file); Structural Map with Div (cover), Div (page), Div (photo), Div (DVD), each with an fptr]
5. Administrative Metadata Section(s) • 4 Flavors of Admin. Metadata Per Section – Technical metadata – Source Metadata – Rights Metadata – Digital Provenance Metadata • Admin. Metadata may be – Internal or external to the METS document (or both) – Linked to files or file groups, or the structural map
METS Scrapbook Example [diagram: METS Header; Descriptive MD Sections (Des MD); Admin MD Section (Tech MD, Source MD, Dig Prov MD, Rights MD); File Section (File Group, file); Structural Map with Div (cover), Div (page), Div (photo), Div (DVD), each with an fptr]
6. Behavior Section • Behavior Sections Identify Software that can be used with the Digital Object, or its Parts – E.g., Software to View the Complex Digital Object which is the Scrapbook; Software to listen to the DVD • A Behavior Unit May Contain: – A reference to an external interface definition that defines a set of related behaviors – A reference to an external executable that implements these behaviors – A reference to the Division or Divisions of the object structure to which the behaviors apply.
METS Scrapbook Example [diagram: METS Header; Descriptive MD Sections (Des MD); Admin MD Section (Tech MD, Source MD, Dig Prov MD, Rights MD); Behavior MD Section(s) (Behavior MD); File Section (File Group, file); Structural Map with Div (cover), Div (page), Div (photo), Div (DVD), each with an fptr]
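Because the sections are tied together by ID references, a receiving system may want to check that every link resolves before accepting an object. The sketch below is one way to do that with Python's standard library; `FILEID`, `DMDID` and `ADMID` are METS linking attributes, while the `dangling_links` helper and the stripped-down sample document (empty metadata sections, not schema-valid) are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:dmdSec ID="DM1"/>
  <mets:amdSec><mets:techMD ID="TM1"/></mets:amdSec>
  <mets:fileSec>
    <mets:fileGrp><mets:file ID="F1" ADMID="TM1"/></mets:fileGrp>
  </mets:fileSec>
  <mets:structMap>
    <mets:div TYPE="scrapbook" DMDID="DM1">
      <mets:div TYPE="cover"><mets:fptr FILEID="F1"/></mets:div>
      <mets:div TYPE="page"><mets:fptr FILEID="F2"/></mets:div>
    </mets:div>
  </mets:structMap>
</mets:mets>"""


def dangling_links(root):
    """Return (attribute, value) pairs that point at an ID missing from the document."""
    ids = {el.get("ID") for el in root.iter() if el.get("ID")}
    missing = []
    for el in root.iter():
        for attr in ("FILEID", "DMDID", "ADMID"):
            # DMDID and ADMID may hold several space-separated references.
            for ref in el.get(attr, "").split():
                if ref not in ids:
                    missing.append((attr, ref))
    return missing


problems = dangling_links(ET.fromstring(SAMPLE))
print(problems)
```

In the sample, the page division points at a file `F2` that was never inventoried, so that is the one link reported.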
Content Format Standards (Images)
Images • Content Format & Best Practices • Identification/Provenance • Technical Imaging metadata • Special discovery metadata
Best practices Use/Users/Collection: Benchmarking Masters vs. Derivatives Scanning Administrative Metadata Structural Metadata
Scanning Best Practices • Think about users (and potential users), uses, and type of material/collection • Scan at the highest quality that does not exceed the needs of the likely potential users/uses/material • Do not let today’s delivery limitations influence your scanning file sizes; understand the difference between digital masters and derivative files used for delivery • Many documents which appear to be bitonal actually are better represented with greyscale scans • Include color bar and ruler in the scan • Use objective measurements to determine scanner settings (do NOT attempt to make the image good on your particular monitor or use image processing to color correct) • Don’t use lossy compression • Store in a common (standardized) file format • Capture as much metadata as is reasonably possible (including metadata about the scanning process itself)
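To see why masters and delivery derivatives end up so different in size, it helps to work the arithmetic for an uncompressed scan: pixels across times pixels down times bits per pixel. The sketch below does that; the 8 x 10 inch page and the resolutions are example figures chosen here, not recommendations from the slides.

```python
def uncompressed_bytes(width_in, height_in, ppi, bits_per_pixel):
    """Size of an uncompressed bitmap, ignoring file headers and embedded metadata."""
    pixels = round(width_in * ppi) * round(height_in * ppi)
    return pixels * bits_per_pixel // 8


# An 8 x 10 inch page captured three ways, plus a screen-sized derivative.
master_color = uncompressed_bytes(8, 10, 600, 24)     # 24-bit color master
master_grey = uncompressed_bytes(8, 10, 600, 8)       # 8-bit greyscale
master_bitonal = uncompressed_bytes(8, 10, 600, 1)    # 1-bit bitonal
browse_derivative = uncompressed_bytes(8, 10, 100, 24)

for label, size in [
    ("color master", master_color),
    ("greyscale master", master_grey),
    ("bitonal master", master_bitonal),
    ("browse derivative", browse_derivative),
]:
    print(f"{label}: {size / 1_000_000:.1f} MB")
```

Under these assumptions the greyscale scan costs eight times the storage of the bitonal one, which is the trade-off to weigh when a document only appears to be bitonal.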
Why Scale is important
Identification/Provenance (Images) The number of variant forms of a work can be enormous Image Families A digital image frequently has many layers of parentage Information about the parentage that can indicate the quality and veracity of the image (Dublin Core "Source" and "Relation") how to deal with different versions derived from the same scan or different encoding schemes Vocabulary Standards to express this
The number of variant forms of a work can be enormous different views of the same object different scans of the same photo different resolutions different compression schemes different compression ratios different file storage formats different details of the same image . . .
Image Families
Identification/Provenance how to deal with different versions (browse, hi-res, medium res) derived from the same scan or different encoding schemes (TIFF, PICT, JFIF) Vocabulary Standards to express this – VRA Surrogate Categories – CIMI's “Image Elements”
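One way to picture an image family is as a chain of "derived from" links, each version recording its parent. The sketch below models that with a small dataclass; the class, field names and sample versions are hypothetical and do not come from the VRA or CIMI vocabularies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ImageVersion:
    name: str
    file_format: str
    derived_from: Optional["ImageVersion"] = None

    def lineage(self):
        """List this version and its ancestors, nearest first."""
        node, chain = self, []
        while node is not None:
            chain.append(node.name)
            node = node.derived_from
        return chain


# A hypothetical family: a print, its master scan, and two delivery derivatives.
original = ImageVersion("original photo", "photographic print")
master = ImageVersion("master scan", "TIFF", derived_from=original)
medium = ImageVersion("medium-res", "JFIF", derived_from=master)
browse = ImageVersion("browse", "JFIF", derived_from=medium)

print(" <- ".join(browse.lineage()))
```

Recording the parent on every version is what lets a user judge how many generations separate a browse image from the original object.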
Incorporate parts of Functional Requirements for Bibliographic Records (FRBR) • work • expression • manifestation • item • (and push into “change history” section of Technical Image Metadata)
NISO/DLF Technical Image Metadata Workshop--4/99 (Z39.87-2002 draft) create metadata needed to manage images in digital repositories over long periods of time (full life-cycle mgmt) document image provenance & history ensure that the images will be rendered accurately on any output device
Technical Image Metadata Focus on Metadata that may prove helpful for management use preservation . . .
Technical Image Metadata In Scope still, bit-mapped pictorial images scanned/reformatted images (+ born digital)
Technical Image Metadata Out of Scope vector images moving images of OCR-able text structural and hierarchical relationships between images rights management, terms of use (authenticity/security)
Technical Image Metadata - Z39.87 Image parameters (MIME type, compression, colorspace & profile, …) Image Creation (source, capture info, etc.) Image performance assessment (sampling, colormap, whitepoint, target data, etc.) Change history (source, processing, etc.)
Technical Image Metadata - Z39.87 additional XML implementation schema (MIX)
Other Metadata • Description of depiction/surrogate (What VRA calls its "Surrogate Categories") • Description of original object • Rights and Reproduction Information • Location Information • VRA Core, LCSH, TGM, AAT, ULAN, TGN, DOI, <indecs>, . . .
A Conceptual Digital Library Model for the New Information Environment
Building the National Digital Library: A Labor of Love • Rigorous Proof for the Premise – Librarians Will Never Make as Much Money as Executives • Postulates – #1: Knowledge is Power – #2: Time is Money
Four Step Proof 1) We know that Work = Power x Time 2) Since Knowledge = Power & Time = Money, we can substitute these into the above equation to get: Work = Knowledge x Money 3) Solving for Money gives us: Money = Work / Knowledge 4) Therefore, as Knowledge Approaches Zero, Money Approaches Infinity!!!
The New Information Environment • This Is Actually The Second Info. Revolution – In 1450 Gutenberg’s printing press created a portable information technology -- The Book • The Current Revolution – Being driven by the convergence of electronic information systems and the emerging international communications network – The Internet and the Web are fundamentally changing Information Seeking Behavior – Information is becoming “ultimately portable”
Information Seeking Behavior Traditional Library Services Network Based Services The Web Historical Record; Under Library Control Current materials; Trade, government & community Info.; Full text, multimedia; Less local control
Some Characteristics of the New Information Environment • Increased Quantity of Information – With the Web, everyone can become a publisher – Varying level of quality • Digital Libraries Need to Work With New Classes of Information – Web Pages, Museum Artifacts, GIS, Statistical Information, etc.
Characteristics of the New Information Environment (Cont.) • Information is Decentralized – Distributed repositories • Information is in Proprietary Formats – Everyone has their own method of creating a digital book, journal, manuscript, etc. How Do We Cope??
Defining Digital Libraries in the NIE • A Series of Collaborating Services & Systems that Allow for the Discovery, Display, Maintenance and Preservation of Complex Digital Objects – The Traditional ILS • Created to manage physical materials • Almost all metadata is descriptive (e.g., MARC) – Digital Libraries • Created to manage complex digital objects • New types of metadata (administrative, structural, etc.) • New Services (content management, digital preservation)
Complex Digital Objects • Scrapbook Example – Digitized pages with text entries – Photos and newspaper clippings attached to the pages – Envelopes glued to the pages that hold • Letters & cards • DVDs • The Scrapbook has – Multiple material types (text, image, audio) – Structure (e.g., like a table of contents) – Internal Relationships • The DVD on page 5 is linked to the file that is the DVD content and to its descriptive metadata
“A Series of Collaborating Services” • Content Management Systems (CMS) – Create & maintain complex digital objects • Preservation Repositories – Long-term retention of digital objects • Access Systems & Integration – Global Access Portals – Subject Access Portals – Material Type Portals
How Can These Systems Collaborate? • Via “Standardized Digital Objects” – A means to “wrap-up” a digital object and send it to another system or repository • Same idea as MARC, but for entire digital objects • E.g., A CMS sending a digital object to a Preservation Repository • The METS Digital Object Standard – Metadata Encoding and Transmission Standard
Illustrative Digital Library Services Diagram [diagram: a Global Access Portal above Material Type Portals for books, images, and fossils; Content Management systems and Preservation Repositories connected by METS]
Content Management Systems
Content Management Systems • Used to… – Create and edit digital objects – Import & export digital objects – Manage objects (acquire, inventory, validate) • Content Management Systems will Vary Depending on the Materials they Support – Metadata schemes will vary • Descriptive Metadata – MARC/MODS/Dublin Core for Books – Code books for numeric datasets • Administrative Metadata – Images, audio, text, etc.
GenDB: UC Berkeley’s METS CMS • METS is too complicated to create documents “by hand” • GenDB is a Web-based software tool that can record digital content, related metadata, and complex relationships between all of the digital pieces comprising a digital object
GenView -- METS Viewer Example • Shows ability of METS to specify digital content, related metadata, and complex relationships between all of the digital pieces comprising a digital entity – Examples actually MOA2 based; but could be METS
Longevity & Preservation Repositories
Digital Preservation The Problem Preservation Repositories Preservation Metadata Other Digital Preservation Activities Special concerns of Cultural Heritage community
Serious Longevity Problems What we know from prior widespread digital file formats Previous formats required little ongoing intervention (remote storage facilities, Iron Mountain); digital formats require intense ongoing management The Short Life of Digital Info
The Short Life of Digital Info: Digital Longevity Problems Disappearing Information The Viewing Problem The Scrambling Problem The Inter-relation Problem The Custodial Problem The Translation Problem
The Viewing Problem Digital Info requires a whole infrastructure to view it Each piece of that infrastructure is changing at an incredibly rapid rate How can we ever hope to deal with all the permutations and combinations?
The Scrambling Problem Dangers from: Compression to ease storage & delivery Container Architecture to enhance digital commerce
The Inter-relation Problem – Info is increasingly inter-related to other info – How do we make our own Info persist when it points to and integrates with Info owned by others? – What is the boundary of a set of information (or even of a digital object)?
The Custodial Problem In the past, much of survival was due to redundancy How do we decide what to save? Who should save it? Mellon-funded E-Journal Archives How should they save it?
The Custodial Problem: How to save information? Methods for later access Refreshing Migration Emulation Issues of authenticity and evidence
The Translation Problem • Content translated into new delivery devices changes meaning – A photo vs. a painting – If Info is produced originally in digital form in one encoded format, will it be the same when translated into another format? – Behaviors
Older Longevity Projects http://sunsite.berkeley.edu/Longevity/ CPA Task Force Getty “Time & Bits” Conference & Follow-ups Preservation experiments in US and Europe NEDLIB, CURL, Michigan Internet Archive Long Now
Preservation Repositories: Projects based on OAIS Model CEDARS NEDLIB Pandora CDL OCLC/RLG Working Group on Preservation Metadata, Attributes of a Trusted Digital Repository, August 2001
Preservation Metadata OCLC/RLG Working Group on Preservation Metadata, Preservation Metadata for Digital Objects: A Review of the State of the Art, January 31 2001 OCLC/RLG Working Group on Preservation Metadata, A Recommendation for Content Information, October 2001
Preservation Repositories: Open Archival Info System Model [diagram: Producer, Management, and Consumer around the archive]
Preservation Repositories: Open Archival Info System Model High-level reference model describing submission, organization and management, and continuing access Conceptual framework for different organizations to share discussions with a common language Producers, consumers, management, actual repository SIP, DIP, AIP consists of data objects plus representation info (Content, Preservation Description, Packaging, Descriptive) Originally developed for Space Science community
Preservation Repositories -- AIP Metadata • Preservation Description Info – reference info – context info – provenance info – fixity info • Packaging Info • Descriptive Info • Content Info
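Fixity info is the part of the Preservation Description that lets a repository detect that content has changed. A minimal sketch, assuming SHA-256 as the checksum (the OAIS model does not prescribe an algorithm) and a plain dictionary as a stand-in for a real AIP:

```python
import hashlib


def fixity(data: bytes) -> str:
    """Checksum used as fixity info; SHA-256 is an assumption for this sketch."""
    return hashlib.sha256(data).hexdigest()


def make_aip(content: bytes, reference: str, provenance: list) -> dict:
    """Bundle content with a simplified Preservation Description."""
    return {
        "content": content,
        "preservation_description": {
            "reference": reference,          # persistent identifier
            "provenance": list(provenance),  # history of the object
            "fixity": fixity(content),       # recorded at ingest
        },
    }


def is_intact(aip: dict) -> bool:
    """Recompute the checksum and compare it with the one recorded at ingest."""
    return fixity(aip["content"]) == aip["preservation_description"]["fixity"]


aip = make_aip(b"scrapbook page 1", "urn:example:scrapbook-1", ["scanned 2002-10-10"])
damaged = dict(aip, content=b"scrapbook page l")  # a single altered character

print(is_intact(aip), is_intact(damaged))
```

A refresh or migration step would run a check like this before and after copying, and add an entry to the provenance list.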
OCLC/RLG Digital Repository Attributes • Administrative responsibility • Organizational viability • Financial sustainability • Technological suitability • System security • Procedural accountability
OCLC/RLG Selected Recommendations • Policies, Certification processes, Risk management, Persistent ID, Migration/Emulation experiments • Stakeholders meet to decide how to describe what is in a digital repository • Examine special properties of particular classes of digital objects • Technical standards for exchange and interoperability between repositories • Develop projects and case studies • Copyright issues
Other Digital Preservation Activities LC Natl Dig Info Infrastructure & Preservation InterPARES Emulation Projects E-Journal Archiving ERPANET Persistent Naming
LC’s National Digital Information Infrastructure and Preservation Program • Authorized Dec 2000 • LC, Dept of Commerce, NARA, White House Office of Sci & Tech Policy • with help from CLIR, NLM, NAL, OCLC, RLG • Ongoing collab process • Commissioned papers on preserving: the Web, periodicals, digital sound, E-Books, Digital TV, Digital Video
InterPARES International Research on Permanent Authentic Records in Electronic Systems • Ongoing international archival world project examining how to make electronically-generated records last over time • Developing theoretical and methodological knowledge needed, then will formulate model policies, strategies, and standards • Next year will be extended to include images and rich media
Emulation Projects • CAMiLEON (Michigan/Leeds) • NEDLIB
E-Journal Archiving • Issues – License, don’t own; may not even be able to obtain right to make archival copy – Increasingly no paper back-up at all – Usually we don’t have the important redundancy factor • Mellon funded projects (2001) – Yale, Harvard, Penn working w/individual publishers – Cornell, NYPL--specific disciplines – MIT exploring characteristics that change (dynamic) – Stanford--archiving software tools
Electronic Resource Preservation and Access NETwork (ERPANET) • Best practices and skills development for digital preservation of cultural heritage and scientific objects • 3 year project launched Nov 2001; 1.2 million Euros
Persistent Naming URNs Handles PURLs Re-directs
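What these naming schemes share is a level of indirection: citations carry a stable name, and a resolver maps it to wherever the object currently lives. The toy resolver below shows only that idea; real URN, Handle and PURL services each have their own protocols, and the class, names and URLs here are invented.

```python
class Resolver:
    """Toy name-to-location table illustrating persistent naming."""

    def __init__(self):
        self._locations = {}

    def register(self, name, url):
        """Record (or update) the current location for a persistent name."""
        self._locations[name] = url

    def resolve(self, name):
        """Return the current location; raises KeyError for unknown names."""
        return self._locations[name]


resolver = Resolver()
resolver.register("urn:example:scrapbook-1", "https://old.example.org/scrapbook1")

# The object moves to a new server: only the table changes, not the citations.
resolver.register("urn:example:scrapbook-1", "https://new.example.org/objects/scrapbook1")

print(resolver.resolve("urn:example:scrapbook-1"))
```

The persistence comes from the organization that commits to maintaining the table, not from the syntax of the name.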
Access Systems & Access System Integration AIM The Access Integration Model
AIM Assumptions • The large and increasing number of portals poses a serious problem for library users (note: the terms “portal” and “access system” are used interchangeably) – Where do they start looking? – UC campuses have • Melvyl, the CDL Directory, Searchlight, the OAC • Local campus catalogs, databases & websites • Many other websites to access e-journals, abstract & indexing databases, government information, etc.
AIM Assumptions • People work at different levels of sophistication, based on their current “information need” – Need a few good pages on T-Rex – Need to do comprehensive research on dinosaur fossil specimens
AIM Assumptions • It’s not possible to build a single access system that “does everything for everybody” – The sheer number of portals created bears this out – The library catalog as an example…. • Privileges books (over music, numeric datasets, etc.) – By the sheer number of books • Powerful, but complicated – Non-librarians do not use advanced search features • Imagine adding other types of non-print materials – Paleontology specimens searched by kingdom, phylum, class, order, family, genus, species
The AIM… • Envisions a network of cooperating portals each with its own goals and responsibilities – Provides a framework to help decide: • Which access systems are needed; and • How they should be integrated together – Is a Reference Model • Focus discussion on what systems should do; and what metadata is needed to support that functionality • Helps avoid the temptation to immediately map our existing systems into the model – Think about what we want, not what we have
Material Types Defined • Material Types describe categories of digital content – Books, maps, images, GIS files, etc. • The AIM does not predefine material types – However, any AIM-compliant implementation should predefine the material types it will use. • Material Types are used to report search results to users – (e.g., 1,200 books, 35 maps, 120 images, 12 numeric datasets)
Why Report Searches by M-Type? • Prevents categories with smaller number of responses (e.g., the 4 maps) from getting lost in large search result sets • Minimizes the problem of having the same object appear in multiple search result groupings – Carefully predefining types, such as books and maps, helps to ensure that objects are classified under a single material type. • Can be “economical” – Using existing metadata (e.g., MARC Format, Dublin Core Type), whenever possible, means that we do not have to bear the cost of creating new material type metadata • Most importantly, it is an approach that our users will understand
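Reporting by material type amounts to counting hits per type and showing the counts before any individual records. A minimal sketch, assuming each hit already carries a single `material_type` value (for example, derived from existing MARC or Dublin Core Type metadata); the field names and sample hits are invented.

```python
from collections import Counter


def summarize(hits):
    """Return lines such as '3 Books', largest group first."""
    counts = Counter(hit["material_type"] for hit in hits)
    return [f"{n} {mtype}" for mtype, n in counts.most_common()]


# Hypothetical hits for a single search.
hits = [
    {"id": "b1", "material_type": "Books"},
    {"id": "m1", "material_type": "Maps"},
    {"id": "b2", "material_type": "Books"},
    {"id": "v1", "material_type": "Video"},
    {"id": "b3", "material_type": "Books"},
    {"id": "m2", "material_type": "Maps"},
]

summary = summarize(hits)
print(", ".join(summary))
```

Because each object carries exactly one predefined type, the lone video still gets its own line instead of being buried among the books.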
The AIM Portals • Material Type Portal – Designed for experts • Provides sophisticated search and display services for specific material types (e. g. , books, GIS, images…) • Global Access Portal (GAP) – A “scholar’s portal” or “Google-like” service – Designed for users w/simpler information needs • Subject Access Portal – A version of the GAP, which reports back search results grouped by material type in predefined subject areas Besser & Hurley 10/10/02 115
The Material Type Portal (MTP) • Designed for More Expert Users – Advanced Searching • Customized to the metadata for that material type – Advanced Display • More than you can do in a browser – Rendering GIS maps, complex objects – Advanced object manipulation • Numeric datasets, image manipulation tools • Possible Examples – Luna for images; GIS systems; OAC for Finding Aids; Gen. View for navigating complex METS objects – Library catalogs? (at least for books & journal titles) Besser & Hurley 10/10/02 116
Global Access Portal (GAP) • “Scholar’s portal” or “Google-like” Service – keyword searching & phrase searching only • Metadata fed from Material Type Portals – reports search results grouped by material type • E. g. -- 1, 200 books, 35 maps, 120 images, 12 video files – This is unlike Google, which only needs to report search results for one material type -- web pages • Designed for users – Who need a place to start – With simpler information needs – Who want to a more “interdisciplinary” search • Books on Navajo Baskets from the Library; Photos from a library archival collection, images from a museum Besser & Hurley 10/10/02 117
Global Access Portal (GAP) • Give the GAP a Break! – Think Google – It’s not designed to be super accurate • That’s the job of the Material Type Portal – Users searching the GAP can be made aware a Material Type Portal exists. • They can then make an informed decision on whether they want to spend the time to learn the Material Type Portal interface. Besser & Hurley 10/10/02 118
The Subject Access Portal (SAP) • Same as the GAP in that it reports search results by material type • Different in that it only reports results for a given subject area – Assumes the metadata is available to classify digital objects as belonging to a subject area Besser & Hurley 10/10/02 119
The AIM Portals [Diagram: Global Access Portal, Subject Access Portal, Material Type Portals (MTP), and Metadata Catalogs (MC)] MTP = Material Type Portal; MC = Metadata Catalog
How do the Portals Work Together? • Integration at two levels, Discovery & Display • Discovery – Experts can go directly to Material Type Portals – Need a place to start? Go to the GAP or SAP! • The GAP can tell a user that a search result came from a Material Type Portal and provide a link to the MTP • Display – The Global and Subject Access Portals can use Material Type Portals to display complex objects
User searches on “Napa Valley”…

GLOBAL ACCESS PORTAL
Your search for “Napa Valley” found 138 Items
72 Books
21 Maps
8 Manuscripts
1 Video
30 Websites
5 Journal Titles
1 Image

Then, clicks on the “Maps” link…

GLOBAL ACCESS PORTAL
21 Maps responded to your search for “Napa Valley”
1. Napa, American Canyon and vicinity
   Author: California State Automobile Association. Cartographic Dept.
   Published: San Francisco, c1999.
   Holdings: Earth Sci G4364.N2 1999; .C3; Case B
2. Cycler's road map of part of the Sacramento Valley and Vicinity [including Colusa, Yolo, Napa, Butte, Yuba, Sutter, Solano and Sacramento counties]
   Author: Blum, George W.
   Published: [San Francisco : The Author, 1896, c1895]
3. Blah, blah, Napa Valley, blah…

Then, clicks on the link for the Cycler’s Road Map…
Which opens a new window for a Material Type Portal. In this case it’s the David Rumsey Map Collection running on Luna’s Insight Software.
AIM and Internet Content • Internet content is problematic – We don’t control the metadata needed to build metadata catalogs – In some cases we can harvest (e.g., OAI) • E-journals, as an example – Article-level metadata is scattered all over the Web & controlled by many different publishers – If we wanted e-journals as a material type, the AIM would suggest creating a metadata catalog – Solutions? • Convince publishers to give us the metadata (not likely) • Convince publishers to create a metadata catalog for us – Possible: CrossSearch from CrossRef?
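Where a content provider does expose its metadata for harvesting, the request itself is simple. The sketch below builds an OAI-PMH `ListRecords` request URL using the protocol's standard `verb`, `metadataPrefix`, `set`, and `from` arguments. The base URL is a placeholder, the helper name `list_records_url` is invented here, and nothing is actually fetched.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc",
                     set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL (no network access)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec      # optional: restrict to one set
    if from_date:
        params["from"] = from_date    # optional: incremental harvest
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # repository.example.org is a placeholder, not a real repository.
    print(list_records_url("https://repository.example.org/oai",
                           from_date="2002-10-10"))
```

A harvester would fetch this URL, parse the returned XML, and load the records into the metadata catalog for that material type. That step is omitted because it depends on the local catalog.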
Other Elements • Privacy • Actors Metadata • Preserving Electronic Art • Copyright Dangers
Creating a Library Systems Privacy Policy
UC SOPAG Privacy Task Force • Charged with Developing a “Model Policy on Privacy for Library Provided Digital Services” • Final Report and [GREAT!] Supporting Website – Linked to from the UC Libraries Systemwide Operations and Planning Advisory Group (SOPAG) website http://www.slp.ucop.edu/sopag/
Creating a Library Systems Privacy Policy • Steps – Perform a Privacy Audit – Review laws and policies (Federal, State, University) – Determine the desired local policy for each library function – Make sure the library's practices follow the policy – Create a privacy statement for users of the library's services and post it prominently on library web pages – Determine a date when the privacy policy will be reviewed
Privacy Audit • Library Applications Systems – Integrated Library Systems, Electronic Reserves, Electronic Reference, Staff Directories, etc. • Matrix: For Each “Location of Private Information” Determine… – Minimum Practices to Meet Legal or Policy Requirements – Additional Practices to Consider – Relevant Legislation and Professional Codes
Locations of Private Information • Library Application Systems – Integrated Library Systems, Electronic Reserves, Electronic Reference, Staff Directories, etc. • Library Server Logs – Web Servers, e-mail • Library Public Workstations – Caches, cookies & certificates, OS logs, browser bookmarks • Network Services – Router/switch logs • Licensed Services – Content provider policies, personalization services
Privacy Audit Matrix: Library Application Systems (Integrated Library Systems, Electronic Reserves, Electronic Reference, Staff Directories, etc.)
Columns: Location of Private Information | Minimum Practices to Meet Legal or Policy Requirements | Additional Practices to Consider | Relevant Legislation and Professional Codes

Location of Private Information: Circulation and borrower records, including:
· Patron registration records [i]
· Circulation transaction logs
· Overdue and billing records
· Records of paging from RLFs and local storage
· Document delivery & interlibrary loan transactions
· Records of access to electronic reserves

Practices (minimum, then additional):
· Restrict access to records and logs that reveal what was borrowed by a patron, as well as to patron registration records, to library staff who have a legitimate need to see the records.
· Don’t allow access to records or logs that reveal what was borrowed, or to patron registration records, to nonlibrary personnel without proper written authorization from the patron or by court order.
· Delete patron registration records 0-5 years after expiration of borrower privileges.
· Be aware that the amount of fines or fees owed by a patron may be shared with other campus systems and is subject to the California Public Records Act.
· Delete individual identity as soon as possible after a transaction is resolved (that is, when the item has been returned, all bills are paid, etc.).
· For statistical analysis, preserve only the category of user.
· Keep billing information only as long as required by campus financial record policies.
· Post on the library Web site what information you keep about patron borrowing histories, how long it is kept, and who can see it.

Relevant Legislation and Professional Codes:
· Ca. Government Code Section 6254 (j) [ii]
· Ca. Government Code Section 6267 [iii]
· Ca. Constitution Article 1 [iv]
· ALA Code of Ethics 54.15 pt. 3 [v]
· SAA Code of Ethics Section IX [vi]
· UC Records Management Disposition Schedules (Library Records) [vii]
Note: Records of fines are specifically exempt from Ca. Government Code Sections 6254(j) and 6267.
Privacy Audit Matrix: Library Application Systems (Integrated Library Systems, Electronic Reserves, Electronic Reference, Staff Directories, etc.)
Columns: Location of Private Information | Minimum Practices to Meet Legal or Policy Requirements | Additional Practices to Consider | Relevant Legislation and Professional Codes

Location of Private Information: Records to support personalized services, including:
· Search histories saved beyond a session
· Saved searches and sets
· SDI profiles
· Files/logs of previous electronic reference queries and answers
Location of Private Information: OPAC search logs (see also Web server logs below)

Practices (minimum, then additional):
· Notify users whenever personally identifiable information (such as name, user ID, email address, etc.) is requested and will be stored on the system.
· Restrict access to personally identifiable records to library staff who have a legitimate need to consult them.
· Don’t provide personally identifiable records to a third party without the explicit permission of the patron or by court order.
· Restrict access to search logs to library staff who have a legitimate need to see the records.
· If individual identity is logged, have an online notice advising users that such records exist and privacy can’t be guaranteed.
· Advise users of the privacy exposures involved in providing information to support personalized services.
· Offer users the option of more limited services, without the need to provide personal information.
· Regularly purge unused records with personally identifiable information.
· If reference answers are kept in a “knowledge bank” for reuse, delete information on the asker before saving.
· Log only aggregate information about users if possible.
· If individual identity is logged, delete individual information as soon as possible, keeping searching information by category of user only for statistical …

Relevant Legislation and Professional Codes:
· UC Electronics Communications Policy Section IV. B. C. [i]
· [Superseded by the UCECP above] FERPA 20 USC 1232g [ii]
· Ca. Information Practices Act. Ca. Civil Code Section 1798 et seq. [iii]
· Ca. Government Code Section 11015.5 [iv]
· ALA Code of Ethics Section III [v]
· ALA Access to Electronic Information, Services and Networks: An Interpretation of the Library Bill of Rights. [vi]
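Two of the practices above, deleting individual identity once a transaction is resolved and keeping only the category of user for statistics, translate directly into code. The following is a minimal sketch, not a description of any real integrated library system: the record fields (`item_id`, `patron_id`, `user_category`, `resolved`) and both function names are hypothetical.

```python
from collections import Counter


def resolve_transaction(record):
    """Return a copy of a circulation record with identity removed if resolved.

    Only the user category survives, so statistics remain possible.
    """
    if not record["resolved"]:
        # Still open (item out or bills unpaid): identity is still needed.
        return dict(record)
    return {
        "item_id": record["item_id"],
        "user_category": record["user_category"],
        "resolved": True,
    }


def loans_by_category(records):
    """Aggregate statistics that never touch patron identity."""
    return Counter(r["user_category"] for r in records)


if __name__ == "__main__":
    loan = {"item_id": "b123", "patron_id": "p9",
            "user_category": "graduate", "resolved": True}
    print(resolve_transaction(loan))
```

In practice the purge would run against the stored transaction logs on a schedule set by the local retention policy. The sketch shows only the per-record rule.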
http://www.delos-nsf.actorswg.cdlib.org/ DELOS/NSF Working Group Reference Models for Digital Libraries: Actors and Roles
NSF/DELOS Actors/Roles Project • Classes of Actors, including – Persons – Organizations – Automata • Roles & implications – Production – Dissemination – Management – Use
Multimedia & Collaborative Authorship imply • Not only: – Authors – Editors – Publishers • But also creators of – Text – Illustrations – Composers – Musicians…
And goes beyond conventional authors • Others that are part of digital library process – Users – Catalogers – Reference librarians • Even other groups/entities – Software agents – Mediators – Special rights holders…
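One way to picture the classes of actors and the roles listed for the project is as a small data model. The sketch below is purely illustrative and is not the working group's reference model: the enum values mirror the slide's lists, while the `Actor` class, its fields, and the sample actors are assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class ActorClass(Enum):
    PERSON = "person"
    ORGANIZATION = "organization"
    AUTOMATON = "automaton"


class Role(Enum):
    PRODUCTION = "production"
    DISSEMINATION = "dissemination"
    MANAGEMENT = "management"
    USE = "use"


@dataclass
class Actor:
    name: str
    actor_class: ActorClass
    roles: set = field(default_factory=set)  # one actor may hold several roles


def actors_in_role(actors, role):
    """Names of all actors holding a given role."""
    return [a.name for a in actors if role in a.roles]


if __name__ == "__main__":
    actors = [
        Actor("Illustrator", ActorClass.PERSON, {Role.PRODUCTION}),
        Actor("Publisher", ActorClass.ORGANIZATION,
              {Role.PRODUCTION, Role.DISSEMINATION}),
        Actor("Search agent", ActorClass.AUTOMATON, {Role.USE}),
    ]
    print(actors_in_role(actors, Role.PRODUCTION))
```

Keeping roles separate from actor classes lets the same person or software agent appear in several roles.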
Borbinha’s “naive tentative sketch” of the problem… [Diagram of a Digital Library; labels: Publication, Licensing, Acquisition, Dissemination, Registration, Search, Agent, Creator, Distributor, Editor, Access, User, Librarian, Registered, Anonymous, Preservation]
Benefits for • Linking metadata to authority records • Rights management • Privacy protection
Deliverables • Workshop proceedings: proceedings with invited contributions and papers selected from a call, intended to be a reference source for the current state of the art. • White paper: – Definition and introduction to the problem. – Description and analysis of the requirements. – A proposal to the community for a reference model, focusing on definitions of key concepts, terminology, classes of agents, services, relationships, etc. – Proposals for an international agenda for further technical and collaborative developments.
Core group • DELOS (Europe) – José Borbinha, National Library of Portugal (DELOS coordinator) – Michel Mabe, Elsevier Science, UK (Publishing industry) – Peter Mutschke, Social Science Information Centre, Germany (Software agents, Information Retrieval) – Hans-Jörg Lieder, Berlin State Library, Germany (LEAF project) – Gunnar Karlsen, University of Bergen, Norway (Archives) – WIPO (World Intellectual Property Organisation): Glenn Macstravic • NSF (USA) – John Kunze, University of California, USA (NSF coordinator) – Barbara Tillett, Library of Congress, USA (Libraries) – Becky Dean, OCLC, USA (Libraries services) – Angela Spinazze, CIMI/RLG, USA (Museums) – Howard Besser, University of California, USA (Multimedia and digital art production) – DCMI (Dublin Core Metadata Initiative): Warwick Cathro, National Library of Australia
Work plan Phase 1: Starting (March-April 2002) • Tuning objectives, scope, and action plan • Identification of reference sources • Call for contributions to the workshop Phase 2: Internal Discussion (May-June 2002) • Analysis of the problem • Draft paper Phase 3: Public Discussion (July-October 2002) • Expose the draft paper. Promote open public discussion • Workshop in Portugal (July 3-5). Workshop report • Draft paper (second version) Phase 4: Conclusions (November-December 2002) • Review of the work done… • Final report
… Actors and Roles ???
What’s special about Cultural Heritage Materials? • Images & rich media • Inter-relationships between parts • For Contemporary Art: What is the Work?
LeWitt: Wall Drawing 340
Installing LeWitt
LeWitt Install Directions
Complexity of Rich Media • Works often have artistic nature (including video games) • Enormous number of elements can, at times, be very important to preserve (pacing, original artifact, elements used to construct the artifact) • Too complex to save every one of these aspects for every type of material • Importance of saving documentation
What can we do specific to Electronic Art? • Works themselves may no longer even exist; in many cases, what we can save amounts to forensic evidence • Enormous number of elements can, at times, be very important to preserve (pacing, original artifact, elements used to construct the artifact) • Too complex to save every one of these aspects for every type of material • Importance of saving pieces, representations, and documentation • Involve the artists to capture their intentions • Importance of Standards • Familiarize ourselves with recent conservation developments (Who Knows?, TechArcheology, Tate, IMAP)
Standards for encoding artists’ intentions (group efforts within the Cultural Heritage community) • Artists Interviews Project, Netherlands Institute for Cultural Heritage • 1998-1999, Modern Art: Who Cares (http://www.icn.nl/english/6.4.2.html) • TechArcheology: A Symposium on Installation Preservation (SFMOMA) • More recent SFMOMA/Tate collaborations • IMAP • Guggenheim’s Variable Media
Structural Metadata Standards for Encoding Multimedia (no time for details) • SMIL • MPEG-4
A few questions our community should address • Special issues raised by non-library institutions • Special issues raised by images and rich media • What is the work (or salient points we need to preserve)? • Bring the arts communities (artist intent, BAVC) together with the preservation repository communities and the preservation metadata communities • Specifically get Cultural Heritage communities involved with the selected OCLC/RLG recommendations • Get cultural heritage groups started on working to make sure that structure standards incorporate our works • What organizations will take responsibility to save today’s digital “ephemeral” materials (online ‘zines, arts discussion groups, etc.)?
Copyright Dangers
Digital Repository Traditions & Services require Sustainability Interoperability Access And all of these require Standards and Metadata
Building a Digital Future: Sustainable, Interoperable, Accessible Repositories
Howard Besser, NYU Archiving & Preservation Program
Bernie Hurley, UC Berkeley Library
• http://www.firstmonday.dk/issues/issue7_6/besser/
• Baca, Murtha (ed). Introduction to Metadata, Los Angeles: Getty Information Institute, 1998. http://www.getty.edu/gri/standard/intrometadata/
• http://www.gseis.ucla.edu/~howard/Metadata/UC-May00/
• http://sunsite.berkeley.edu/Metadata/sp2000.html
• http://sunsite.berkeley.edu/Longevity/
• http://www.oclc.org/digitalpreservation/presmeta_wp.pdf
• http://is.gseis.ucla.edu/us-interpares/
• http://www.niso.org/commitau.html
• http://www.ifla.org/II/metadata.htm
• METS official site: http://www.loc.gov/standards/mets
• UC Libraries Systemwide Operations and Planning Advisory Group (SOPAG) site, http://www.slp.ucop.edu/sopag/, for the UC Digital Preservation & Archiving Committee Final Report, the Access Integration Model white paper, and the Library Services Privacy report