Automating Name Authority Record Updates and Bibliographic File Maintenance: A Proof of Concept
Lucas Mak, Michigan State University Libraries
Catalog Management Interest Group, ALA Annual, Chicago, IL, June 29, 2013
Authority Control at MSU
§ 1.5 million authority records (1.1 million NARs)
§ In-house
§ NACO institution
§ Database maintenance
§ Post-cataloging authority control
§ New Headings Report
  • Download NARs from SkyRiver
§ Updates to NARs not necessarily caught
  • 1XX (no item cataloged under the changed 1XX, so not in the New Headings Report)
  • Elements other than 1XX (e.g. 4XX, 670)
LC/NACO NAF RDA Transition
§ PCC Day 1 for RDA NARs: Mar. 31, 2013
§ Phased reissuance of NARs
§ Phase 1
  • Scope
    – NARs with characteristics known to be at variance with RDA practice
    – Not candidates for any of the mechanical changes to be made during Phase 2
  • Adding a 667 note “THIS 1XX FIELD CANNOT BE USED UNDER RDA UNTIL THIS RECORD HAS BEEN REVIEWED AND/OR UPDATED”
    – Completed Aug. 20, 2012 (436,943 records processed)
§ Phase 2
  • Programmatic changes to 1XX headings that are not acceptable under RDA (e.g., changes to Bible headings, spelling out Dept. and months, etc., abbreviations in subfield $d for personal names)
  • Completed March 27, 2013 (371,942 records changed)
§ Updates of NARs by NACO institutions
§ Reviewing, upgrading, and recoding Phase 1 records to RDA
§ Adding any of the 17 new MARC fields (e.g. 046, 372, etc.)
§ Routine NAR maintenance
  • PCC post-RDA test guidelines “strongly encourage” evaluating and recoding “RDA-acceptable AACR2 NARs” to RDA whenever possible
Objectives
§ To catch changes to NARs
§ Changes in 1XX
§ Addition, deletion, or update of elements other than 1XX
§ To perform related BFM if the 1XX in a NAR is changed
Tasks
§ To download NARs one by one or in bulk
§ To detect updates to NARs already existing in the ILS
§ To overlay existing NARs with updated ones
§ To update authorized access points (AAPs) in bib records if the 1XX in a NAR is updated
§ To automate and link up the above tasks
Task #1: Download NARs
§ OCLC LCNAF SRU Service
§ Can be searched by LCCN
§ Available in multiple schemas, including MARCXML
§ SRU-based service (HTTP request)
§ FREE!!
§ But:
  • Updated every Monday night
  • Bulk download – by search term (e.g. after a certain date)
§ Implementation
  • Search LCCNs one by one with an AutoIt script
    – Around 10 records/sec. retrieved
  • Download XML files into one folder (files named by LCCN)
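The request side of this task can be sketched as follows. The deck's implementation was an AutoIt script; this is a Python stand-in that only builds the request URL and the output file name. The parameter names are the standard SRU ones, but `SRU_BASE_URL` and `LCCN_INDEX` are hypothetical placeholders, not the actual OCLC endpoint or index name, and the schema name accepted by a given server may differ.

```python
from urllib.parse import urlencode

# Hypothetical placeholders: substitute the real service endpoint and the
# index name that the service documents for LCCN searches.
SRU_BASE_URL = "https://sru.example.org/lcnaf"
LCCN_INDEX = "local.lccn"


def build_sru_url(lccn: str, base_url: str = SRU_BASE_URL,
                  index: str = LCCN_INDEX) -> str:
    """Return an SRU searchRetrieve URL asking for one record as MARCXML."""
    params = {
        "operation": "searchRetrieve",   # standard SRU operation name
        "version": "1.1",
        "query": f'{index} = "{lccn}"',  # CQL: index, relation, quoted term
        "recordSchema": "marcxml",       # assumed short name; check the server
        "maximumRecords": "1",
    }
    return f"{base_url}?{urlencode(params)}"


def output_filename(lccn: str) -> str:
    """Name the downloaded file after the LCCN, with whitespace removed."""
    return "".join(lccn.split()) + ".xml"
```

Keeping URL construction separate from the download step makes it possible to check the generated requests before sending any of them.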
Task #2: NAR Update Detection
§ Compare NARs from the ILS with NARs from the LC/NACO NAF by XSLT
§ MARC 005 (timestamp)
§ If the timestamp on the NAR from the NAF is more current, overlay the NAR in the ILS
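The comparison rule can be sketched in a few lines. The deck did this in XSLT; the Python below is only an illustration, and the function names are made up for the example. It assumes 005 values in the usual MARC form `yyyymmddhhmmss.f`, and it treats a missing local 005 as "overlay", which matches the rule given later under Data Integrity Issue #3.

```python
from datetime import datetime
from typing import Optional


def parse_005(value: str) -> datetime:
    """Parse a MARC 005 value (yyyymmddhhmmss.f) into a datetime."""
    return datetime.strptime(value.strip(), "%Y%m%d%H%M%S.%f")


def needs_overlay(ils_005: Optional[str], naf_005: str) -> bool:
    """True when the NAF copy should replace the ILS copy."""
    if not ils_005:
        # No local timestamp to compare against: take the NAF copy.
        return True
    return parse_005(naf_005) > parse_005(ils_005)
```

Parsing into `datetime` rather than comparing raw strings means a malformed 005 raises an error instead of silently sorting in the wrong place.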
Task #3: Export/Overlay of NARs
§ MarcEdit
§ Export updated NARs into the ILS
§ Through TCP/IP (host address, port, .mrc file)
§ One by one (though a .mrc file can contain multiple NARs)
Task #4: Updates of Bib AAPs
§ XSLT
§ To detect changes in 1XX between old and new NARs
§ To build an AAP conversion table (a TXT file) when the 1XX is changed
§ AutoIt
§ Automate bib AAP updates through the “Global Update” module in the ILS
  • Read old and new AAPs from the TXT file and fill out the information required by the “Global Update” process
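A rough sketch of the 1XX comparison and the conversion-table row follows. The deck used XSLT for this step; the Python version below is an illustration only. It assumes each NAR is a single MARCXML `record` element, and the tab-separated "old, new" row layout is an assumption, since the slides do not describe the actual TXT format.

```python
import xml.etree.ElementTree as ET
from typing import Optional

MARC_NS = {"m": "http://www.loc.gov/MARC21/slim"}  # MARCXML namespace


def heading_1xx(record_xml: str) -> Optional[str]:
    """Return the first 1XX field as 'tag $a... $d...', or None if absent."""
    record = ET.fromstring(record_xml)
    for field in record.findall("m:datafield", MARC_NS):
        tag = field.get("tag", "")
        if tag.startswith("1"):
            parts = [f"${sf.get('code')}{sf.text or ''}"
                     for sf in field.findall("m:subfield", MARC_NS)]
            return f"{tag} " + " ".join(parts)
    return None


def conversion_row(old_xml: str, new_xml: str) -> Optional[str]:
    """Return 'old<TAB>new' when the 1XX changed, otherwise None."""
    old, new = heading_1xx(old_xml), heading_1xx(new_xml)
    if old and new and old != new:
        return f"{old}\t{new}"  # assumed row layout for the TXT table
    return None
```

Rows returned as `None` are simply skipped, so the table holds only headings that need to be flipped in bib records.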
Task #5: Automation
§ Use AutoIt to:
§ Link up the various steps in the workflow
§ Automate searching against the OCLC LCNAF SRU Service by compiling and sending HTTP requests
§ Execute various XSLTs in a predetermined sequence
  • e.g. NAR comparison → AAP comparison
§ Read TXT files (LCCN list, AAP conversion table) created by XSLT processes
§ Run MarcEdit to overlay obsolete NARs
§ Execute the “Global Update” process
Basic Workflow
[Workflow diagram. Labels: ILS; Extract by Create Lists; ILS NARs; Extract by XSLT; LCCNs; Search by AutoIt; Retrieve; LC/NACO NARs; Compare by XSLT; Updated NARs; Updated Headings; Overlay by MarcEdit; Global Update; ILS]
Data Integrity Issue #1
§ No ILS ARN in extracted NARs
§ Needed for the 949 overlay command
§ Solution
  • Extract “LCCN” & “ILS ARN” pairs through Create Lists
  • Merge the ARN into extracted NARs (907$a) by XSLT/MarcEdit
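The merge step might look like the sketch below. The deck did it with XSLT/MarcEdit; this Python version is only an illustration. It assumes the Create Lists export is a tab-separated "LCCN, ARN" text file (the slides do not specify the layout) and that each NAR is a single MARCXML `record` element. Blank indicators on the 907 field are also an assumption.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable

MARC_URI = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("", MARC_URI)  # serialize without an ns0: prefix


def load_pairs(lines: Iterable[str]) -> Dict[str, str]:
    """Read 'LCCN<TAB>ARN' lines (assumed export layout) into a lookup."""
    pairs = {}
    for line in lines:
        if "\t" in line:
            lccn, arn = line.rstrip("\n").split("\t", 1)
            pairs[lccn.strip()] = arn.strip()
    return pairs


def add_arn(record_xml: str, arn: str) -> str:
    """Append a 907 field carrying the ILS ARN in subfield $a."""
    record = ET.fromstring(record_xml)
    field = ET.SubElement(record, f"{{{MARC_URI}}}datafield",
                          {"tag": "907", "ind1": " ", "ind2": " "})
    sub = ET.SubElement(field, f"{{{MARC_URI}}}subfield", {"code": "a"})
    sub.text = arn
    return ET.tostring(record, encoding="unicode")
```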
Data Integrity Issue #2
§ NARs without 010
§ 010 contains the LCCN
§ Some LCCNs transposed into 035
  • Original prefix (n, no, nb, nr) removed
  • Prepended with the prefix (OCoLC)
  • Possibly done during system migration
§ Solution
  1. Search the string in 035 (excluding the prefix) as a keyword in SkyRiver
  2. Retrieve the complete LCCN from the matched record
  3. Search the retrieved LCCN against the OCLC service and download the record
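Step 1 of the solution, pulling the searchable string out of the 035, could be sketched as below. This is an illustration rather than the deck's code, and the function name is invented. It assumes the 035 value is the `(OCoLC)` prefix followed only by digits; anything else is left for manual handling.

```python
import re
from typing import Optional

# "(OCoLC)" prefix, optional whitespace, then the digits left of the old LCCN.
_OCOLC_035 = re.compile(r"^\(OCoLC\)\s*(\d+)$")


def keyword_from_035(value: str) -> Optional[str]:
    """Return the digits after the (OCoLC) prefix, or None if no match."""
    match = _OCOLC_035.match(value.strip())
    return match.group(1) if match else None
```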
Data Integrity Issue #3
§ Existing NARs without 005
§ No timestamp
  • Bring in the new NAR whenever the old NAR lacks 005
Data Integrity Issue #4
§ Local data in NARs
§ Local call no. (e.g. 050, 090, 053$5)
§ Institution code & initials (shared catalog)
§ Copy local data into the new NAR before overlay
Search and Retrieval Issue #1
§ “Blank” XML file from the OCLC LCNAF SRU Service
Search and Retrieval Issue #1 (Cont’d)
§ No hit for some LCCNs
§ XML file size: < 2 KB
§ LCCNs in places other than 010$a are not indexed
  • Cancelled LCCNs (010$z)
§ Solution
  1. Compile a list of LCCNs with file size < 2 KB
  2. Search LCCNs in SkyRiver by keyword
  3. Get new LCCNs from 010$a
  4. Search the OCLC LCNAF SRU Service using the new LCCNs
§ But …
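Step 1 of the solution amounts to a file-size scan of the download folder. A minimal sketch, assuming the files are named `<LCCN>.xml` as in Task #1 and taking "2 KB" to mean 2048 bytes (the slides do not say whether 2000 or 2048 was used); the function name is made up for the example:

```python
from pathlib import Path
from typing import List


def blank_result_lccns(folder: str, threshold: int = 2048) -> List[str]:
    """List LCCNs (file stems) whose XML response is smaller than threshold bytes."""
    return sorted(
        path.stem
        for path in Path(folder).glob("*.xml")
        if path.stat().st_size < threshold
    )
```

The returned list feeds the keyword search in step 2.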
Search and Retrieval Issue #2
§ Keyword search in SkyRiver returns multiple hits
§ Undifferentiated & related NARs
  • Person broken out from the undifferentiated NAR
  • Original undifferentiated NAR cancelled
§ Write LCCNs with multiple hits to a log file for manual review
Search and Retrieval Issue #2 (Cont’d)
§ Keyword search in SkyRiver returns multiple hits
§ Same numeric part of the LCCN with different prefixes
  • NAR contributed via RLIN
  • NAR contributed via OCLC
§ Write LCCNs with multiple hits to a log file for manual review
Search and Retrieval Issue #2 (Cont’d)
§ Keyword search in SkyRiver returns no hit
§ The LCCN in question no longer exists in the NAF
  • The NAR containing the cancelled LCCN was itself cancelled
    – Loss of 010$z
  • Write no-hit LCCNs to a log file for manual review
Search and Retrieval Issue #2 (Cont’d)
§ Keyword search in SkyRiver returns no hit
§ False negative
  • Space between prefix and number removed
  • Hyphen within number removed (e.g. n 85-342238 → n85342238)
    – Search normalized LCCNs
  • Delay in returning search results due to a slow or unstable Internet connection
    – Set a longer wait time before trying to copy the new LCCN
    – Run the keyword search in SkyRiver in a loop until
      » the number of entries in the log file equals that of the immediately preceding round, or
      » the file size of the no-hit log file equals zero
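The two remedies, normalizing the LCCN and re-running the search until the no-hit list stops shrinking, can be sketched together. This is an illustration, not the AutoIt code from the project. `search` is a hypothetical callable standing in for the SkyRiver keyword search; it is assumed to return the new LCCN, or `None` for no hit.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def normalize_lccn(lccn: str) -> str:
    """Drop whitespace and hyphens, as described on the slide."""
    return "".join(ch for ch in lccn if not ch.isspace() and ch != "-")


def retry_until_stable(
    lccns: Iterable[str],
    search: Callable[[str], Optional[str]],  # hypothetical search function
) -> Tuple[Dict[str, str], List[str]]:
    """Repeat the search over the no-hit list until it is empty or stops shrinking."""
    found: Dict[str, str] = {}
    pending = list(lccns)
    while pending:
        missing = []
        for lccn in pending:
            result = search(normalize_lccn(lccn))
            if result is None:
                missing.append(lccn)
            else:
                found[lccn] = result
        stalled = len(missing) == len(pending)  # same count as previous round
        pending = missing
        if stalled:
            break
    return found, pending  # pending = LCCNs left for the manual-review log
```

Because each round either shrinks the pending list or stops, the loop always terminates.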
Global Update Issues
§ ILS interface navigation
§ AAPs with diacritics
§ Found by search in the Global Update module but couldn’t be replaced
§ Code points & exact match in Global Update
§ Old AAPs not found
§ Corresponding bib records deleted (“orphan” NARs)
§ Write LCCNs to a log file for manual review
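One way a heading with diacritics can fail an exact match is when the same visible text is stored with different code points. The sketch below illustrates that general effect with Unicode normalization. Whether this is exactly what happened in the Global Update module is an assumption, and the sample name is illustrative only.

```python
import unicodedata


def same_heading(a: str, b: str) -> bool:
    """Compare two headings after normalizing both to composed form (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


precomposed = "Dvo\u0159\u00e1k"     # ř and á as single code points
decomposed = "Dvor\u030ca\u0301k"    # base letters plus combining marks

print(precomposed == decomposed)              # False: code points differ
print(same_heading(precomposed, decomposed))  # True: same text once normalized
```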
Revised Workflow
[Workflow diagram. Recoverable labels: ILS; Extract; ILS NARs; ARN-LCCN; Merge; LCCNs; Search; Found & Retrieve; Not Found & Search; Retrieve New LCCN; Not Found/Multiple Hits; LC/NACO NARs; Compare; Updated NARs; Overlay by MarcEdit; Updated AAPs; Global Update; AAPs Not Found; Fishy NARs; Log File]
Test Results
§ 82,398 NARs tested
§ 81,362 NARs needed to be overlaid*
§ 4,584 AAPs became obsolete
§ 10,900 bib records had at least one heading flipped
* Many NARs exported from the ILS do not contain field 005
Limitations
§ Identities broken out from undifferentiated NARs can’t be detected
§ Partially taken care of by the “New Headings Report”
§ AAPs that have no corresponding NARs
§ Non-Latin script parallel APs in field 880
§ Scalability issues
§ Slow export using MarcEdit
§ Slow “Global Update” process
§ Memory-intensive XSLT process
  • “Java heap space” out-of-memory error
Possible Enhancements
§ “Data Exchange” module for NAR overlay
§ Data Exchange module – record load function
§ Manual intervention needed
§ SQL backend of Sierra (Sierra DNA)
§ Write SQL commands to batch changes
§ But the EDIT function is not yet available through SQL commands
§ AACP (Automatic Authority Control Processing)
§ Flip AAPs matching a 4XX in NARs to the corresponding 1XX in an overnight process
§ Replace “Global Update” with AACP
  • “Rig” updated NARs by inserting the obsolete AAP as a 4XX
  • Export “rigged” NARs to the ILS to trigger the overnight process
  • Overlay the exported “rigged” NARs in the ILS with the original updated NARs
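The “rigging” idea can be sketched as copying the old 1XX into the updated NAR as the matching 4XX (100 → 400, 110 → 410, and so on). The slides propose this only as a possible enhancement, so the Python below is a speculative illustration with an invented function name. It assumes both NARs are single MARCXML `record` elements, and it appends the new field at the end without re-sorting fields by tag.

```python
import copy
import xml.etree.ElementTree as ET

MARC_URI = "http://www.loc.gov/MARC21/slim"
NS = {"m": MARC_URI}
ET.register_namespace("", MARC_URI)  # serialize without an ns0: prefix


def rig_nar(old_xml: str, new_xml: str) -> str:
    """Return the updated NAR with the old 1XX added as a 4XX see-from field."""
    old = ET.fromstring(old_xml)
    new = ET.fromstring(new_xml)
    for field in old.findall("m:datafield", NS):
        tag = field.get("tag", "")
        if tag.startswith("1"):
            see_from = copy.deepcopy(field)
            see_from.set("tag", "4" + tag[1:])  # e.g. 100 -> 400
            new.append(see_from)                # appended last; not re-sorted
            break
    return ET.tostring(new, encoding="unicode")
```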
Questions?
§ Lucas Mak (makw@msu.edu)