WHERE HAVE ALL THE BINDERS GONE? Greg Colati, University of Denver Jennifer King, George Washington University Sylvia Augusteijn, George Washington University SAA Chicago Session #801 September 1, 2007
WHY MANAGE WITH A DATABASE? Scale Centralized management Access Reusability Rearrange-ability
REAL DRIVERS OF CHANGE Demand for item level access Born Digital content Digitized content Researcher demands and expectations
MANY INPUTS, MANY OUTPUTS In-house cataloging or imported metadata Metadata from Records Management system Metadata from local systems e.g. DUVAGA, or IR Physical object Storage location Digital object Storage location Collections Management Database OAI metadata for harvesters and aggregators Metadata for local Systems: e.g. Heritage West, Penrose web. DUVAGA EAD XML for RMOA or other uses MARC records for III or other uses
OBJECTS AND ATTRIBUTES I belong to a collection I belong to a series I came from somewhere I am an image I am a certain file format(s) I am about something(s) I am green, blue, and brown
Project Ungava, National Research Council of Canada CLUSTERING
Bungee View: http://cityscape.inf.cs.cmu.edu/bungee/ VISUALIZATION
The Encyclopedia of Chicago http://www.encyclopedia.chicagohistory.org/ © 2007 Gregory C. Colati CONTEXTUALIZE THE RESOURCE
I WANT WHAT I WANT …
A CULTURAL SHIFT General Object Specific Association
EXTEND INTEROPERABILITY Descriptive standards at the item level
MANAGE FROM THE BOTTOM UP Items and attributes Create associations, implicit and explicit
PRODUCTIVITY APPROACH TO PROCESSING, MANAGEMENT, AND ACCESS Automate metadata creation Metadata extraction Pre-populate metadata fields using default and automatically generated terms Stop writing extensive biographical and historical notes Automate digital content creation
USE THE POWER OF DATABASE TOOLS Ingest tools discussed above Export templates for: MARC EAD Various XML schemas for item level export: MARCXML, DC, TEI, VRA etc. Metadata from Records Management system In-house cataloging or imported metadata Metadata from local systems e.g. DUVAGA, or IR Collections Management Database OAI metadata for harvesters and aggregators EAD XML for RMOA or other uses MARC records for III or other uses Metadata for local Systems: e.g. Heritage West, Penrose web. DUVAGA
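As a rough illustration of an item-level export template, the sketch below turns one item record into a small Dublin Core fragment using Python's standard library. The record keys (`number`, `title`, `date`), the `<record>` wrapper element, and the function name are assumptions made for illustration; they are not taken from the collections management database described in these slides, and the output is not a complete OAI record.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical mapping from database field names to Dublin Core elements.
FIELD_TO_DC = (("number", "identifier"), ("title", "title"), ("date", "date"))


def to_dublin_core(record):
    """Serialize one item record (a dict) as a simple Dublin Core fragment."""
    root = ET.Element("record")  # wrapper element name is an assumption
    for field, dc_name in FIELD_TO_DC:
        value = record.get(field)
        if value:  # skip fields that are missing or empty
            ET.SubElement(root, f"{{{DC_NS}}}{dc_name}").text = value
    return ET.tostring(root, encoding="unicode")


item = {"number": "00001", "title": "Correspondence", "date": "1950-1957"}
print(to_dublin_core(item))
```

The same record could feed other templates (MARCXML, EAD) by swapping the mapping table, which is the point of keeping the data in one place.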
LEVERAGE USE OF DIGITAL REPOSITORIES We don’t have to be self-sufficient Outsource low-level functions Mass storage Backup
CREATE PARTNERSHIPS Computer scientists Librarians Academic technologists
GET INTO MAINSTREAM DISCOVERY TOOLS GET “INTO THE FLOW” Can everyone say Google MySpace YouTube Facebook
CREATE ACCESS TOOLS BASED ON USER NEEDS Understand how all of our constituencies seek information and use information Make our tools reflect these behaviors. When those behaviors change, our tools should change with them.
NEW SKILLS FOR THE DIGITAL ERA Jennifer King George Washington University
RE: DISCOVERY MAIN PAGE
RE: DISCOVERY FOR INTERNET SEARCH
RFI AND FINDING AID
From Document To Database Sylvia Augusteijn George Washington University Special Collections and University Archives SAA session 801 September 1, 2007
Out from the binders
§ Scope and content notes, series descriptions simple to cut and paste into Re: Discovery
§ Cut and paste not feasible for thousands of item-level records
§ “Container list” project is born
§ Goal: to separate elements of each item name (number, title, date) so Re: Discovery could import them into their respective fields
Container lists
§ Each item has a number, title, and date, but formats vary slightly in punctuation or spacing
Ways of writing the same name:
1. Correspondence, 1950-57
I. Correspondence – 1950-1957
i. correspondence 1950 to 1957
§ Naming conventions generally consistent within each finding aid
§ How to automate?
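To make the variation concrete, here is a minimal Python sketch that reads all three spellings of the item above and reduces them to one normalized form. The pattern, the function names, and the normalization choices (Roman numerals converted to integers, two-digit end years expanded, first letter capitalized) are assumptions for illustration, not the project's actual rules.

```python
import re

ROMAN_VALUES = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100}

# Hypothetical pattern covering the three variants listed on the slide.
ITEM = re.compile(
    r"""^\s*
    (?P<number>[0-9]+|[IVXLCivxlc]+)\.\s+   # item label: 1. / I. / i.
    (?P<title>.+?)                          # title, matched lazily
    \s*[,–-]?\s*                            # optional comma or dash
    (?P<start>[0-9]{4})                     # four-digit start year
    \s*(?:-|–|to)\s*                        # hyphen, en dash, or the word to
    (?P<end>[0-9]{2,4})\s*$                 # two- to four-digit end year
    """,
    re.VERBOSE,
)


def roman_to_int(text):
    """Convert a Roman numeral such as 'iv' or 'XII' to an integer."""
    values = [ROMAN_VALUES[ch] for ch in text.lower()]
    total = 0
    for i, value in enumerate(values):
        # Subtract when a smaller numeral precedes a larger one (e.g. "iv").
        if i + 1 < len(values) and value < values[i + 1]:
            total -= value
        else:
            total += value
    return total


def normalize_item(line):
    """Return (number, title, date range) for an item line, or None."""
    match = ITEM.match(line)
    if match is None:
        return None
    label = match.group("number")
    number = int(label) if label.isdigit() else roman_to_int(label)
    title = match.group("title")
    title = title[0].upper() + title[1:]
    start, end = match.group("start"), match.group("end")
    end = start[: 4 - len(end)] + end  # "57" becomes "1957"
    return number, title, f"{start}-{end}"


for variant in ("1. Correspondence, 1950-57",
                "I. Correspondence – 1950-1957",
                "i. correspondence 1950 to 1957"):
    print(normalize_item(variant))  # each prints (1, 'Correspondence', '1950-1957')
```

A single permissive pattern like this is easy to get wrong for unusual titles, which is one reason to tune expressions per finding aid instead.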
Automation, part 1: Delimiting the text
§ Container lists saved in a text editor (TextPad)
§ Delimiters are special characters placed within the text to separate the elements
§ We chose * to signal the beginning and end of each field and % to signal the boundary between fields
§ Item as it appears in text of finding aid: 1. Correspondence, 1950-57
§ Item with delimiters inserted: *1*%*Correspondence*%*1950-57*
Delimiting the text (continued)
§ Re: Discovery can import directly from the text editor, with instructions
§ Instructions to Re: Discovery: the first element of this name will be the number, the second will be the title, the third will be the date
*1*%*Correspondence*%*1950-57*
§ How to add these delimiters to thousands of item records?
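The import step itself happens inside Re: Discovery and is not shown in the slides. The sketch below only illustrates the idea of the mapping: split a delimited line on %, strip the surrounding * from each field, and assign the fields by position. The function name and the returned dictionary keys are hypothetical, and the sketch assumes that titles never contain * or %.

```python
def parse_delimited(line):
    """Split a line like *1*%*Correspondence*%*1950-57* into named fields."""
    fields = line.strip().split("%")
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, found {len(fields)}: {line!r}")
    values = []
    for field in fields:
        # Every field must open and close with the * delimiter.
        if len(field) < 2 or not (field.startswith("*") and field.endswith("*")):
            raise ValueError(f"field not wrapped in *: {field!r}")
        values.append(field[1:-1])
    # Position decides meaning: first number, second title, third date.
    number, title, date = values
    return {"number": number, "title": title, "date": date}


print(parse_delimited("*1*%*Correspondence*%*1950-57*"))
# {'number': '1', 'title': 'Correspondence', 'date': '1950-57'}
```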
Automation, part 2: Regular expressions § A regular expression is a string that uses special characters (such as + $ ^ ]) to describe and match patterns of text within a document
Regular expressions (continued)
§ First used regular expressions to search through our text for anything formatted like an item (i.e. to search for a pattern in which an item number is followed by a title and date)
§ Then used regular expressions to insert our delimiters in between those elements
To turn a page of this:
1. Journals, 1950-60
2. Photographs, 1970-80
3. Postcards, 1940-50
Into a page of this:
*00001*%*Journals*%*1950-60*
*00002*%*Photographs*%*1970-80*
*00003*%*Postcards*%*1940-50*
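The page-level transformation shown above can be sketched in a single pass with Python's `re` module, used here in place of the text editor's search and replace. The pattern assumes the most regular form of an item line (number, period, space, title, comma, space, date range), and padding the number to five digits with `zfill` is inferred from the sample output; both are assumptions.

```python
import re

# Assumed item shape: "<number>. <title>, <yyyy>-<yy or yyyy>"
ITEM_LINE = re.compile(r"^([0-9]+)\. (.+), ([0-9]{4}-[0-9]{2,4})$")


def delimit(line):
    """Return the delimited form of an item line, or the line unchanged."""
    match = ITEM_LINE.match(line.strip())
    if match is None:
        return line  # headings and notes are left alone
    number, title, date = match.groups()
    return f"*{number.zfill(5)}*%*{title}*%*{date}*"


page = ["1. Journals, 1950-60", "2. Photographs, 1970-80", "3. Postcards, 1940-50"]
for line in page:
    print(delimit(line))
# *00001*%*Journals*%*1950-60*
# *00002*%*Photographs*%*1970-80*
# *00003*%*Postcards*%*1940-50*
```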
Examples of regular expressions
To turn 1. Correspondence, 1950-1957 into *00001*%*Correspondence, 1950-1957
Find: ([0-9])\. (find any digit followed by a period)
Replace: *0000\1*%* (replace with *, four zeroes, that digit and *%*)
Then to turn *00001*%*Correspondence, 1950-1957 into *00001*%*Correspondence*%*1950-1957
Find: , ([0-9]{4}) (find any four-digit number preceded by a comma and space)
Replace: *%*\1 (replace the comma and space with *%*, keeping the four-digit number)
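The two find-and-replace steps above can be reproduced with Python's `re.sub`, used here as a stand-in for the text editor. The escaped period (`\.`) and the back-reference (`\1`) are assumed to be what the original expressions contained, and the first pattern also consumes the space after the period so that the output matches the slide; the function names are hypothetical.

```python
import re


def add_number_delimiter(text):
    """Step 1: wrap the item number; ([0-9]) captures the digit as group 1."""
    # The trailing space in the pattern is an assumption (see lead-in).
    return re.sub(r"([0-9])\. ", r"*0000\1*%*", text)


def add_date_delimiter(text):
    """Step 2: replace the comma and space before a four-digit year."""
    return re.sub(r", ([0-9]{4})", r"*%*\1", text)


step1 = add_number_delimiter("1. Correspondence, 1950-1957")
print(step1)  # *00001*%*Correspondence, 1950-1957
step2 = add_date_delimiter(step1)
print(step2)  # *00001*%*Correspondence*%*1950-1957
```

As written, the first pattern handles single-digit item numbers only: an item such as 12. would match on its last digit and come out wrong, which is the kind of adjustment each new container list can require.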
Challenges
§ Tweaking expressions slightly for each new container list
§ Writing the wrong expression and accidentally replacing the wrong text
§ Failing to export correctly to Re: Discovery due to small number of missing delimiters
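One way to catch missing delimiters before an import is to check every line against the expected three-field shape and list the ones that fail. This is a hedged sketch, not part of the workflow described in the slides; the pattern and function name are assumptions, and it treats * and % as never appearing inside a field.

```python
import re

# Three fields, each wrapped in *, separated by %, with no stray delimiters.
WELL_FORMED = re.compile(r"^\*[^*%]*\*%\*[^*%]*\*%\*[^*%]*\*$")


def find_bad_lines(lines):
    """Return (line number, text) for each non-blank line that is malformed."""
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if text and not WELL_FORMED.match(text):
            problems.append((number, text))
    return problems


sample = [
    "*00001*%*Journals*%*1950-60*",
    "*00002*%*Photographs%*1970-80*",  # missing * after the title
    "*00003*%*Postcards*%*1940-50",    # missing closing *
]
for number, text in find_bad_lines(sample):
    print(f"line {number}: {text}")
```

Running a check like this over the whole file turns a failed import into a short list of lines to fix by hand.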
Re: Discovery and beyond
§ Delimited text exported into Re: Discovery
§ From Re: Discovery, easy creation of EAD finding aids using a template
§ To date: 257 collections in Re: Discovery (and EAD finding aids on the web)
0 binders
CONTACT INFORMATION:
Greg Colati, Digital Initiatives Coordinator, University of Denver, greg.colati@du.edu
Jennifer King, Manuscripts Librarian, George Washington University, Washington, DC, Jenking@gw.edu
Sylvia Augusteijn, Project Archivist, George Washington University, augusteijn@gelman.gwu.edu