CS 430 Information Discovery Lecture 6 Descriptive Metadata

CS 430: Information Discovery
Lecture 6
Descriptive Metadata 2: Library Catalogs

Course Administration
• New Teaching Assistant: Xiaozheng Zhong, xz30@cornell.edu
• Send email about the course to the course team address: cs430@cs.cornell.edu
This address will send mail to: wya@cs.cornell.edu, kyotov@cs.cornell.edu, yukiko@cs.cornell.edu, xz30@cornell.edu

Course Administration: Notes on Assignment 1
• The assignment has 5 search services, with three questions about each. Please answer each question separately for each service.
• The amount of information that you can discover about each question varies greatly. You may find that you write several paragraphs to answer one question for one service, or have only brief information.
• If you have been unable to find much information about a question, list where you searched.

Course Administration: Notes on Assignment 1 (continued)
• To cite your references, a simple URL with the title of a web page is adequate.
• To present your results, Microsoft Word is convenient. It is on all the laptops. PDF is an alternative.
• PLEASE PUT YOUR NETID ON YOUR REPORT.

Dublin Core: Personal Opinion
Dublin Core is a simple way to describe digital content that:
• is a single, self-contained object ("document-like")
• is static with time
• has few relationships
Some web sites satisfy these criteria.
Dublin Core is not suitable for digital content that:
• is heavily structured
• changes dynamically
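To make "simple" concrete, here is a minimal sketch of a Dublin Core description: a flat set of optional, repeatable elements drawn from the fifteen in the Dublin Core Metadata Element Set (version 1.1), serialized as XML in the `http://purl.org/dc/elements/1.1/` namespace. The wrapper element `record`, the function name, and the sample values for a hypothetical course web page are assumptions for illustration, not part of any standard profile.

```python
import xml.etree.ElementTree as ET

# The fifteen elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)  # serialize elements with the usual 'dc:' prefix


def dublin_core_xml(record: dict[str, list[str]]) -> str:
    """Serialize a flat Dublin Core record; every element is optional and repeatable."""
    # 'record' is a hypothetical wrapper element; Dublin Core itself does not define one.
    root = ET.Element("record")
    for element, values in record.items():
        if element not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {element}")
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical description of a static, document-like web page.
page = {
    "title": ["CS 430: Information Discovery"],
    "creator": ["Hypothetical Author"],
    "type": ["Text"],
    "format": ["text/html"],
    "language": ["en"],
}
xml_text = dublin_core_xml(page)
print(xml_text)
```

The record is just a list of element/value pairs with no nesting, which is consistent with the opinion above that Dublin Core fits self-contained, static objects better than heavily structured content.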

Library Cataloguing
Anglo-American Cataloguing Rules (AACR2)
• rules for what goes into each field of a catalog record
MARC format
• an exchange format for catalog records
"MARC Catalog"
• a catalog in MARC format, where the content of each field follows AACR2

Example: Monograph catalog record
Citation: Caroline R. Arms, editor, Campus strategies for libraries and electronic information. Bedford, MA: Digital Press, 1990.

MARC fields
tag   value
001   89-16879 r93
050   Z675.U5C16 1990
082   027.7/0973 20
245   Campus strategies for libraries and electronic information/Caroline Arms, editor.   (title statement)
260   {Bedford, Mass.} : Digital Press, c1990.   (publisher)
300   xi, 404 p. : ill. ; 24 cm.   (collation)
440   EDUCOM strategies series on information technology   (series title)
504   Includes bibliographical references (p. {373}-381).
020   ISBN 1-55558-036-X : $34.95

MARC fields (continued)
650   Academic libraries--United States--Automation.   (subject heading)
650   Libraries and electronic publishing--United States.
650   Library information networks--United States.
650   Information technology--United States.
700   Arms, Caroline R. (Caroline Ruth)
040   DLC DLC
043   n-us--
955   CIP ver. br02 to SL 02-26-90
985   APIF/MIG
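Because a tag such as 650 can repeat, fields in the example record are looked up by tag rather than by position. A minimal sketch, assuming a simplified in-memory model that keeps the record as an ordered list of (tag, value) pairs and ignores indicators and subfield codes (this is not the MARC exchange format itself; `index_by_tag` is a hypothetical helper):

```python
from collections import defaultdict

# Part of the example record, values copied from the slides.
# Indicators and subfield codes are omitted in this simplified model.
RECORD = [
    ("001", "89-16879 r93"),
    ("050", "Z675.U5C16 1990"),
    ("082", "027.7/0973 20"),
    ("245", "Campus strategies for libraries and electronic information/Caroline Arms, editor."),
    ("260", "{Bedford, Mass.} : Digital Press, c1990."),
    ("650", "Academic libraries--United States--Automation."),
    ("650", "Libraries and electronic publishing--United States."),
    ("650", "Library information networks--United States."),
    ("650", "Information technology--United States."),
    ("700", "Arms, Caroline R. (Caroline Ruth)"),
]


def index_by_tag(record: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group field values by tag, keeping every value of a repeated tag in order."""
    fields: dict[str, list[str]] = defaultdict(list)
    for tag, value in record:
        fields[tag].append(value)
    return dict(fields)


fields = index_by_tag(RECORD)
print(fields["245"][0])
for heading in fields["650"]:
    print("Subject:", heading)
```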

MARC Encoding
tag: 260
subfield a: {Bedford, Mass.} :
subfield b: Digital Press,
subfield c: c1990.
MARC encoding: &2600#abc#{Bedford, Mass.} :#Digital Press,#c1990.%
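The encoding line packs one field into a single string. The sketch below encodes and decodes that illustrative notation under an assumed reading of it: `&` starts the field, then the 3-character tag, then one indicator character (the `0` after `260`), then `#`, the list of subfield codes, and one `#`-separated value per code, with `%` ending the field. That reading is an assumption; real MARC 21 records use two indicator positions and control characters as delimiters. The function names are hypothetical, and values containing `#` or `%` are not handled.

```python
def encode_field(tag: str, indicator: str, subfields: list[tuple[str, str]]) -> str:
    """Encode one field in the slide's illustrative '&tag ind#codes#values%' notation."""
    codes = "".join(code for code, _ in subfields)
    values = "#".join(value for _, value in subfields)
    return f"&{tag}{indicator}#{codes}#{values}%"


def decode_field(encoded: str) -> tuple[str, str, list[tuple[str, str]]]:
    """Split an encoded field back into tag, indicator, and (code, value) pairs."""
    if not (encoded.startswith("&") and encoded.endswith("%")):
        raise ValueError("a field must start with '&' and end with '%'")
    header, codes, *values = encoded[1:-1].split("#")
    if len(codes) != len(values):
        raise ValueError("number of subfield codes does not match number of values")
    tag, indicator = header[:3], header[3:]
    # Subfields are kept as an ordered list because MARC allows repeated codes.
    return tag, indicator, list(zip(codes, values))


field_260 = [("a", "{Bedford, Mass.} :"), ("b", "Digital Press,"), ("c", "c1990.")]
encoded = encode_field("260", "0", field_260)
print(encoded)
print(decode_field(encoded))
```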

Name authority files
• Caroline R. Arms or Caroline Ruth Arms?
• Which William Phillips of Cardiff?
• Mark Twain or Samuel Clemens?
• Epithets: of Cardiff; doctor
• Dates: 1832-1876; flourished 1860; circa 1832-1876
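At its simplest, an authority file maps every variant form of a name to one authorized heading. A minimal sketch, assuming a hypothetical in-memory table: the Arms heading is the form used in the 700 field of the example record, while the Twain/Clemens choice is illustrative only and not taken from any real authority file.

```python
import unicodedata
from typing import Optional

# Hypothetical authority table: normalized variant form -> authorized heading.
AUTHORITY = {
    "caroline r. arms": "Arms, Caroline R. (Caroline Ruth)",
    "caroline ruth arms": "Arms, Caroline R. (Caroline Ruth)",
    "arms, caroline r.": "Arms, Caroline R. (Caroline Ruth)",
    "mark twain": "Twain, Mark",      # illustrative choice of preferred form
    "samuel clemens": "Twain, Mark",
    "clemens, samuel": "Twain, Mark",
}


def normalize(name: str) -> str:
    """Apply Unicode NFKC, case-fold, and collapse whitespace so trivial variants match."""
    return " ".join(unicodedata.normalize("NFKC", name).casefold().split())


def authorized_heading(name: str) -> Optional[str]:
    """Return the authorized heading for a name, or None if the table has no entry."""
    return AUTHORITY.get(normalize(name))


print(authorized_heading("Caroline  Ruth ARMS"))
print(authorized_heading("Samuel Clemens"))
```

String normalization alone cannot answer "Which William Phillips of Cardiff?": two people with the same name need the epithets and dates listed above to be told apart.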

Shared cataloguing
OCLC: a large, centralized transaction processing database system
When a library catalogs a book, it deposits a MARC record in OCLC. Other libraries can copy the record:
• saves duplication of cataloguing
• builds a database of holdings
The OCLC database has 43 million records.
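The two benefits listed above (one cataloguing effort, plus a growing list of holdings) can be seen in a toy model of a union catalog. This is a minimal sketch, not how OCLC is built: `UnionCatalog`, its methods, and the library code `XYZ` are hypothetical; `DLC` is the code that appears in the 040 field of the example record, and the control number is its 001 value.

```python
from dataclasses import dataclass, field


@dataclass
class UnionCatalog:
    """Toy model of shared cataloguing: one shared record per item, many holding libraries."""

    records: dict[str, str] = field(default_factory=dict)
    holdings: dict[str, set[str]] = field(default_factory=dict)

    def deposit(self, control_number: str, record: str, library: str) -> None:
        """Original cataloguing: the first record deposited for an item is kept."""
        self.records.setdefault(control_number, record)
        self.holdings.setdefault(control_number, set()).add(library)

    def copy_record(self, control_number: str, library: str) -> str:
        """Copy cataloguing: reuse the shared record and add this library's holding."""
        record = self.records[control_number]  # KeyError if no one has cataloged the item
        self.holdings[control_number].add(library)
        return record


catalog = UnionCatalog()
catalog.deposit("89-16879", "Campus strategies for libraries and electronic information", "DLC")
copied = catalog.copy_record("89-16879", "XYZ")  # "XYZ" is a made-up library code
print(copied, sorted(catalog.holdings["89-16879"]))
```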

Subject information
Library of Congress Subject Headings
• Academic libraries--United States--Automation
Hierarchical classification
• Library of Congress call number: Z675.U5C16
• Dewey Decimal Classification: 027.7
Creation and maintenance of lists of subject headings and classifications is a never-ending task.
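Dewey's decimal notation is designed so that each added digit usually narrows the class, so the broader classes of 027.7 can be read off its prefixes. A minimal sketch, assuming simple numbers of the form `ddd` or `ddd.d...` (the `/0973` extension seen in the 082 field is not handled), plus a split of the subject heading on its `--` subdivision separator. Both function names are hypothetical, and the prefix rule is a simplification: Dewey's notation is not perfectly hierarchical in every case.

```python
def dewey_broader_classes(number: str) -> list[str]:
    """List a Dewey number and its broader classes, from main class down to the number."""
    whole, _, fraction = number.partition(".")
    if len(whole) != 3 or not whole.isdigit() or not (fraction == "" or fraction.isdigit()):
        raise ValueError(f"not a simple Dewey number: {number}")
    chain = [whole[0] + "00", whole[:2] + "0", whole]
    chain += [f"{whole}.{fraction[:i]}" for i in range(1, len(fraction) + 1)]
    # dict.fromkeys drops repeats (e.g. '500' three times) while keeping order.
    return list(dict.fromkeys(chain))


def lcsh_parts(heading: str) -> list[str]:
    """Split a subject heading into main heading and subdivisions at each '--'."""
    return [part.strip() for part in heading.rstrip(".").split("--")]


print(dewey_broader_classes("027.7"))
print(lcsh_parts("Academic libraries--United States--Automation."))
```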

Online public access catalog (OPAC)
First stage
• Library mounts its MARC records on a central computer
• Provides a simple terminal interface and dedicated terminals
• Boolean search; fielded searching
[Most university libraries reached this stage about 1990]
Second stage
• Library connects the computer to the campus network and the Internet
• Converts card catalog records to MARC (retrospective conversion)
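First-stage OPAC searching, Boolean queries over fielded records, can be sketched with an inverted index keyed by (field, term): each lookup returns a set of record ids, and AND, OR, and NOT become set intersection, union, and difference. This is a minimal sketch under assumptions: record 1 borrows values from the example MARC record, records 2 and 3 are invented placeholders, and the tokenizer does no stemming (so "library" and "libraries" are different terms).

```python
import re
from collections import defaultdict

# Hypothetical mini-catalog: record id -> field name -> text.
CATALOG = {
    1: {"title": "Campus strategies for libraries and electronic information",
        "author": "Arms, Caroline R.",
        "subject": "Academic libraries--United States--Automation"},
    2: {"title": "Hypothetical record one",
        "author": "Doe, Jane",
        "subject": "Library information networks--United States"},
    3: {"title": "Hypothetical record two",
        "author": "Roe, Richard",
        "subject": "Libraries and electronic publishing--United States"},
}


def build_index(catalog: dict[int, dict[str, str]]) -> dict[tuple[str, str], set[int]]:
    """Inverted index keyed by (field, term), so every query term is fielded."""
    index: dict[tuple[str, str], set[int]] = defaultdict(set)
    for record_id, fields in catalog.items():
        for field_name, text in fields.items():
            for term in re.findall(r"[a-z0-9]+", text.lower()):
                index[(field_name, term)].add(record_id)
    return dict(index)


INDEX = build_index(CATALOG)


def lookup(field_name: str, term: str) -> set[int]:
    """Posting set for one fielded term; empty if the term never occurs in that field."""
    return set(INDEX.get((field_name, term.lower()), set()))


# Boolean operators map onto set operations on posting sets.
and_hits = lookup("author", "arms") & lookup("subject", "libraries")       # AND
or_hits = lookup("subject", "automation") | lookup("subject", "networks")  # OR
not_hits = lookup("subject", "states") - lookup("author", "arms")          # AND NOT
print(and_hits, or_hits, not_hits)
```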

Library information systems
When the catalog is online...
Add other collections and services:
• Secondary information (Inspec, Medline, Chemical Abstracts)
• Reference works (dictionaries, encyclopedias)
Improve the user interface:
• Add full text searching
• Add a web interface
Add connections to off-campus information sources:
• Scientific journals
• Databases (census, genome)

Library management systems
A library management system, sometimes called an integrated library system, integrates the internal processes of a library, e.g., acquisitions, cataloguing, binding, and circulation. It usually contains an online public access catalog, but does not provide integrated services to users. Library management systems are produced by small companies that lack the capital and technical expertise to develop modern digital libraries.

Notes on MARC
A great achievement:
• Developed in the 1960s
• Magnetic tape exchange format for printing catalog records
• The dawn of computing: mixed upper and lower case, variable length fields, repeated fields, non-Roman scripts
• 100(?) million records with standard content and format
• Thousands of trained librarians (millions?)

Notes on MARC
A great problem:
• Not designed for computer algorithms
• One record per item (poor links between records)
• Tied to traditional materials and traditional practices
• Not Unicode
• 100 million records at $100 each: $10 billion
A classic legacy system!

Cataloguing Objectives
Functions of catalogs:
• finding
• collocating (recall and precision)
• choosing
• acquiring
• navigating
...among items in a bibliographic universe.
Compare use cases in software design.
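Collocating means bringing related items together, so it is judged by the usual retrieval measures: recall is the fraction of relevant items that a search retrieves, and precision is the fraction of retrieved items that are relevant. A minimal sketch with hypothetical record ids; returning 0.0 for an empty set is a convention chosen here, not a standard.

```python
def recall_precision(retrieved: set[int], relevant: set[int]) -> tuple[float, float]:
    """Recall = |retrieved & relevant| / |relevant|; precision = |retrieved & relevant| / |retrieved|."""
    hits = len(retrieved & relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical search: 5 records retrieved, 8 relevant records exist, 4 are in both.
retrieved = {1, 2, 3, 4, 5}
relevant = {1, 2, 3, 4, 6, 7, 8, 9}
print(recall_precision(retrieved, relevant))  # (0.5, 0.8)
```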

Cataloguing Principles
• User convenience (common usage)
• Representation
• Sufficiency and necessity (parsimony)
• Avoid using one device to serve multiple functions (e.g., to disambiguate and to order)

IFLA Model: Work
A work is the underlying abstraction, e.g.,
• The Iliad
• The Computer Science departmental web site
• Beethoven's Fifth Symphony
• Unix operating system
• The 1996 U.S. census
This is roughly equivalent to the concept of "literary work" used in copyright law.

IFLA Model: Expression
A work is realized through an expression, e.g.,
• The Iliad has oral expressions and written expressions.
• A musical work has a score and performance(s).
• Software has source code and machine code.
Many works have only a single expression, e.g., a web page or a book.

IFLA Model: Manifestation
An expression is given form in one or more manifestations, e.g.,
• The text of The Iliad has been manifested in numerous manuscripts and printed books.
• A musical performance can be distributed on CD, or broadcast on television.
• Software is manifested as files, which may be stored or transmitted in any digital medium.

IFLA Model: Item
When many copies are made of a manifestation, each is a separate item, e.g.,
• a specific copy of a book
• a computer file
[Works, expressions, manifestations and items are explored in CS 502, Computing Methods of Digital Libraries.]
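The four levels of the IFLA model (published as Functional Requirements for Bibliographic Records, FRBR) can be sketched as nested data types. This is a simplification: it models a strict tree, whereas FRBR also allows, for example, one manifestation to embody several expressions. The class names follow the model's terms, and the Iliad labels are hypothetical examples.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """One exemplar: a specific copy of a book, or one stored file."""
    label: str


@dataclass
class Manifestation:
    """A physical or digital embodiment of an expression, e.g. one printed book."""
    label: str
    items: list[Item] = field(default_factory=list)


@dataclass
class Expression:
    """A realization of a work, e.g. an oral or a written text."""
    label: str
    manifestations: list[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    """The underlying abstraction, e.g. The Iliad."""
    title: str
    expressions: list[Expression] = field(default_factory=list)

    def items(self) -> list[Item]:
        """Walk work -> expression -> manifestation -> item and collect every item."""
        return [item
                for expression in self.expressions
                for manifestation in expression.manifestations
                for item in manifestation.items]


# Hypothetical labels for illustration.
manuscript = Manifestation("a manuscript", [Item("the manuscript itself")])
printed = Manifestation("a printed book", [Item("library copy 1"), Item("library copy 2")])
written = Expression("written text", [manuscript, printed])
oral = Expression("oral performance")
iliad = Work("The Iliad", [oral, written])
print([item.label for item in iliad.items()])
```

Cataloguing at the manifestation level while users usually ask for a work is one way to read the earlier complaint about "one record per item (poor links between records)".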