OCLC Online Computer Library Center
Data Mining Library Collection Silos: Print Books and E-books in Library Collections
Lynn Silipigni Connaway, Ed O’Neill, Chandra Prabha, Brian Lavoie

Collection Assessment
Why assess collections?
– Provide data for member libraries for decision-making
  • Description of the collection
    – Identify specific subject areas
      » Determine collection age
      » Rate of growth
      » Strengths and weaknesses
  • Overlap/gap analysis
  • Identify last copy
  • Useful information
    – Outside funding
    – Library collection comparisons
    – Remote storage decisions
Collection development and management
Identify role of non-ARL libraries

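The overlap/gap and last-copy analyses named on this slide can be illustrated with plain set operations. This is a minimal sketch, not the method used in the study: the library names, the title identifiers, and the helper functions `overlap`, `gap`, and `last_copies` are all hypothetical, and a "title" is assumed to be already reduced to a single comparable identifier.

```python
from collections import Counter

# Hypothetical holdings: library name -> set of title identifiers it holds.
holdings = {
    "Library A": {"t1", "t2", "t3"},
    "Library B": {"t2", "t3", "t4"},
    "Library C": {"t3", "t5"},
}

def overlap(a: set, b: set) -> set:
    """Titles held by both collections."""
    return a & b

def gap(a: set, b: set) -> set:
    """Titles held by collection b that are missing from collection a."""
    return b - a

def last_copies(all_holdings: dict) -> dict:
    """For each library, the titles that no other library in the group holds."""
    counts = Counter(t for titles in all_holdings.values() for t in titles)
    return {
        lib: {t for t in titles if counts[t] == 1}
        for lib, titles in all_holdings.items()
    }

shared = overlap(holdings["Library A"], holdings["Library B"])
missing_from_a = gap(holdings["Library A"], holdings["Library B"])
unique_by_library = last_copies(holdings)
```

Last-copy information of this kind is what would feed the remote storage and deselection decisions listed above, since a title held by only one library in the group is a poor candidate for withdrawal.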
WorldCat as a Collection
World’s largest bibliographic database
– July 1, 2003 = 50 million+ records
– 1 billion holdings
Ideal source for data-mining
Characteristics of WorldCat
– Age
– Subject, using NATC
– Holdings by type of library
  • ARL
  • Academic, non-ARL
  • Public
  • School
  • Special

WorldCat as a Collection
Use of MARC data elements in WorldCat
– Types of materials
– Library holdings to determine audience levels
Collection assessment and collection use
– Unique titles
– Analyze and compare aggregate holdings for libraries
– Identify print books (p-books) and electronic books (e-books)

WorldCat Holdings by Library Types
WorldCat Number of Holdings
WorldCat Number of Records
WorldCat Holdings
WorldCat Holdings
Study Objective
Digital materials constitute an increasing proportion of library collections
Effective strategies for integrating print and digital materials within a library collection
– Eliminate redundancies
– Meet user expectations
Data-mining is increasingly important to support collection management decisions
– WorldCat
  • World’s largest bibliographic database
  • Ideal as source for data-mining
Data-mine WorldCat in order to examine characteristics of p-books and e-books

Rationale
Collection management
– Development
– Cooperation
– Deselection
– Preservation
– Space allocation and management
Meet user expectations
Services for off-site users
Migration from print to digital
Convenient access
– 24/7 access
– Desktop delivery

Scope
WorldCat
– July 1, 2003 = 50 million+ records
– 1 billion holdings
Digital Items
Books
– Print (p-book)
– Digital (e-book)

Strategy
Identify digital items with at least one other manifestation in WorldCat
– FRBRize database
  • Work
    – Distinct intellectual or artistic expression
    – Cluster works in WorldCat
  • Manifestation
    – Physical embodiment of a work
Identify digital items with p-book equivalents
– Assumption
  • If digital items have p-book equivalents, then digital items are e-books
– Identify publishers and publication dates

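The strategy above can be sketched as a grouping step followed by a filter. This is an illustration under stated assumptions, not the FRBR clustering actually applied to WorldCat: each record is assumed to already carry a work identifier (the hypothetical `work_id` key) and two precomputed flags, `is_digital` and `is_pbook`. The function name `ebooks_by_equivalence` is also hypothetical.

```python
from collections import defaultdict

def ebooks_by_equivalence(records):
    """Return digital items that share a work cluster with at least one p-book."""
    # Step 1: cluster manifestations by the work they embody.
    clusters = defaultdict(list)
    for rec in records:
        clusters[rec["work_id"]].append(rec)

    # Step 2: apply the slide's assumption. A digital item whose work
    # also has a p-book manifestation is treated as an e-book.
    ebooks = []
    for members in clusters.values():
        if any(r["is_pbook"] for r in members):
            ebooks.extend(
                r for r in members if r["is_digital"] and not r["is_pbook"]
            )
    return ebooks

# Hypothetical sample records.
records = [
    {"id": 1, "work_id": "w1", "is_digital": False, "is_pbook": True},
    {"id": 2, "work_id": "w1", "is_digital": True, "is_pbook": False},
    {"id": 3, "work_id": "w2", "is_digital": True, "is_pbook": False},
    {"id": 4, "work_id": "w3", "is_digital": False, "is_pbook": True},
]

ebook_ids = [r["id"] for r in ebooks_by_equivalence(records)]
```

Record 3 is digital but has no print manifestation in its cluster, so under this assumption it is not counted as an e-book.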
Need to Determine
Comparison of p-books and e-books
– What is a book?
– What is a p-book?
– What is an e-book?
– What is a digital item?
– How do we extend p-book criteria to the digital world?

What is a Digital Item?
Working definition of digital item
– Computer file, OR
– Electronic resource, OR
– Appropriate 856 field
  • Indicates electronic location or access

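The three-way OR in this working definition can be written as a small predicate. The sketch below uses a simplified dictionary in place of a real MARC record, and the mapping of the slide's terms to MARC codes is an assumption: "computer file" is read as record type `m`, "electronic resource" as form of item `s`, and any non-empty 856 field is accepted, since the slide does not say what makes an 856 field "appropriate". The key names and the function `is_digital_item` are hypothetical.

```python
def is_digital_item(record: dict) -> bool:
    """Apply the working definition: computer file OR electronic resource OR 856 field."""
    # Assumption: record type "m" stands for "computer file".
    is_computer_file = record.get("record_type") == "m"
    # Assumption: form of item "s" stands for "electronic resource".
    is_electronic_resource = record.get("form_of_item") == "s"
    # Assumption: any non-empty 856 field counts as "appropriate".
    has_856 = bool(record.get("fields", {}).get("856"))
    return is_computer_file or is_electronic_resource or has_856

# Hypothetical sample records.
print_monograph = {"record_type": "a", "form_of_item": " ", "fields": {}}
software = {"record_type": "m", "form_of_item": " ", "fields": {}}
online_text = {"record_type": "a", "form_of_item": "s", "fields": {}}
linked_text = {
    "record_type": "a",
    "form_of_item": " ",
    "fields": {"856": ["http://example.org/item"]},
}
```

Because the conditions are joined with OR, a print record that merely links to an online table of contents would also qualify under this loose reading, which is one reason the 856 test needs a stricter rule in practice.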
What is a P-book?
No consensus for definition of a book
– Text (type = a) and monograph (bib level = m)
  • Broadsides?
  • Pamphlets?
  • Government documents?
  • Children’s books?
  • Microforms?
– Authoritative definitions
  • UNESCO
    – Nonperiodical literary publication consisting of > 49 pages, covers excluded
  • ANSI
    – Publications consisting of > 49 pages
    – Hard covers
  • US Postal Service (publication)
    – Publications > 24 pages

A P-book IS:
Based on UNESCO definition
Working definition of a p-book
– Printed on paper (excludes microform)
– Language material
– Monograph
– Physical description
– Form of item = regular or large print
– Title does not include a GMD
– Substantial length (> 49 pages; > 25 to include juvenile titles)
– Excludes manuscripts (dissertations and theses)

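The working definition above amounts to a conjunction of tests on each record. The following sketch shows one possible reading. It operates on a simplified dictionary rather than a real MARC record, and every key name is hypothetical. It assumes that "regular print" corresponds to a blank form-of-item code and "large print" to `d`, that a page count stands in for the physical description, and that the lower threshold of 25 pages applies only to juvenile titles.

```python
def is_pbook(rec: dict) -> bool:
    """Apply the slide's working definition of a p-book to a simplified record."""
    # Assumed reading: > 49 pages in general, > 25 pages for juvenile titles.
    min_pages = 25 if rec.get("juvenile", False) else 49
    pages = rec.get("pages")  # None means no usable physical description
    return (
        rec.get("record_type") == "a"              # language material
        and rec.get("bib_level") == "m"            # monograph
        and rec.get("form_of_item") in (" ", "d")  # regular or large print, so no microform
        and not rec.get("has_gmd", False)          # title carries no GMD
        and not rec.get("is_thesis", False)        # dissertations and theses excluded
        and pages is not None
        and pages > min_pages
    )

# Hypothetical base record: an adult print monograph of 50 pages.
base = {"record_type": "a", "bib_level": "m", "form_of_item": " ", "pages": 50}
```

Changing a single key of `base` is enough to exercise each rule, for example `{**base, "form_of_item": "b"}` for a microfiche or `{**base, "pages": 49}` for a title that falls just short of the length requirement.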
What is an E-book?
Difficult to define e-book
– Digital version of p-book (straightforward)
– New conceptual views of a book in digital environment
Assumption
– P-book is well-defined
– If digital item has manifestation as a p-book, then digital item must also be a book
– Consider only p-books with digital equivalents, and vice versa; ignore e-books that have no print equivalent

An E-book IS:
E-book = Electronic (Digital) + Book
Definition of e-book:
– Digital equivalents of p-books
– New conceptual definitions of books in digital environment

WorldCat Record Analysis
P-book records = 24,048,235 (48% of WC)
Digital item records = 795,630 (about 1.5% of WC)
– Web sites
  • Collections of interlinked, Web-accessible materials residing at a single location on the Internet
– Documents
  • Various forms of electronic documents
  • E-books with no p-book equivalents and no minimum page requirements
    – Book chapters
    – Broadsides
    – Brochures
    – Pamphlets
– Reprints
  • E-books with p-book equivalents = 76,375 (about 0.15% of WC)

WorldCat Record Analysis
Digital item records (continued)
– Interactive learning objects
  • Computer programs offering self-contained, interactive tutorial or educational experience
– Software
  • Computer programs for creating and manipulating information
– Serials
  • Journals
  • Proceedings
– Images
– Theses
– Other (2 records)
  • Computer game
  • Raw data file

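The record counts reported in the analysis can be related to one another with simple arithmetic. The sketch below uses only the three counts given on the slides. The two derived ratios are not stated in the presentation and are shown only to put the counts in proportion; the variable names are hypothetical.

```python
# Counts as reported in the WorldCat record analysis.
PBOOK_RECORDS = 24_048_235
DIGITAL_ITEM_RECORDS = 795_630
EBOOKS_WITH_PBOOK_EQUIVALENTS = 76_375

# Share of digital item records that are e-books with a p-book equivalent.
ebook_share_of_digital = EBOOKS_WITH_PBOOK_EQUIVALENTS / DIGITAL_ITEM_RECORDS

# Number of p-book records for every digital item record.
pbooks_per_digital_item = PBOOK_RECORDS / DIGITAL_ITEM_RECORDS

print(f"E-books with p-book equivalents: {ebook_share_of_digital:.1%} of digital items")
print(f"P-book records per digital item record: {pbooks_per_digital_item:.1f}")
```

On these counts, roughly one digital item record in ten is an e-book with a print equivalent.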
Digital Items in WorldCat
Publication Dates of Digital Items With P-Book Equivalents in WorldCat
Publishers of Digital Items With P-Book Equivalents in WorldCat
Approximately 15,000 unique publishers
Approximately 150 publishers with > 25 records
Top 10 publishers
– Institute of Electrical and Electronics Engineers (IEEE)
– National Bureau of Economic Research
– US Government Printing Office
– Springer
– Inter-University Consortium for Political and Social Research
– PowerKids Press
– University of Virginia Library
– MIT Press
– Microsoft
– Broderbund Software and Books

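Publisher figures of this kind (unique publishers, publishers above a record threshold, and a top-N list) can be derived from a list of publisher names with a frequency count. This is a minimal sketch on made-up data. It assumes publisher names have already been normalized to one form per publisher, which real catalog records would not guarantee, and the function `publisher_summary` is hypothetical.

```python
from collections import Counter

def publisher_summary(publishers, threshold=25, top_n=10):
    """Summarize a list of publisher names, one entry per record."""
    counts = Counter(publishers)
    return {
        "unique_publishers": len(counts),
        # Publishers with strictly more than `threshold` records.
        "above_threshold": sum(1 for n in counts.values() if n > threshold),
        # Names of the most frequent publishers, most frequent first.
        "top": [name for name, _ in counts.most_common(top_n)],
    }

# Hypothetical sample: one publisher name per e-book record.
sample = ["IEEE"] * 30 + ["MIT Press"] * 26 + ["Springer"] * 3 + ["Example Press"]
summary = publisher_summary(sample, threshold=25, top_n=2)
```

The threshold test is strict, so a publisher with exactly 25 records is not counted, which matches the "> 25 records" wording on the slide.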
Discussion of Analysis
Small number of
– E-books with p-book equivalents
– Publishers with > 25 records for e-books with p-book equivalents
Recent publication dates for e-books with p-book equivalents
More Web sites than documents or reprints
Difficult to identify and categorize digital items
– Inconsistent cataloging policies and practices for digital items
– Inconsistent definitions for types of digital items

Future Research
Establish accepted criteria for defining an e-book independent of p-books
Identify and compare types of library holdings and NATC subjects for p-books and e-books
– Identify electronic collection silos
Continue to collect these data to compare for trends
Identify types of content/materials that are better suited to either the print or the digital environment

OCLC Online Computer Library Center
Questions and Discussion
connawal@oclc.org
oneill@oclc.org