OCLC and FRBR: directions and research results
OCLC Online Computer Library Center
Lorcan Dempsey, with contributions from Diane Vizine-Goetz, Ed O’Neill, Thom Hickey and Eric Childress
Revolution or Evolution? The impact of FRBR (Functional Requirements for Bibliographic Records)
Organized by the Australian Committee on Cataloging. Melbourne Convention Centre, 2 February 2004

Overview
• FRBR and OCLC
• OCLC research work
• OCLC production plans
• Some issues

FRBR and OCLC
• Long-standing interest in work-based approaches
  – The Humphry Clinker problem
• Strong practical interest
  – End-user presentation
  – ILL
  – Cataloging: help find records
  – Collection analysis
  – Data enrichment

OCLC Research and FRBR
• Mining the data …
  – Ed O’Neill
• Algorithmically FRBRizing
  – Thom Hickey
• Work-based prototypes
  – Diane Vizine-Goetz
  – Thom Hickey

OCLC Online Computer Library Center
Mining the data
Analyzing representations of a single work in detail. Tested OCLC Research conversion algorithm against 1000 works.

Types of Works
• Elemental Works have only a single manifestation (78%)
• Simple Works have only a single expression but multiple manifestations (16%)
• Complex Works have multiple expressions (6%)

Principal Types of Complex Works
• Translations
• Augmented
• Revised
• Collected/Selected

Translations
• All translations are expressions
• Other types of complex works frequently include translations

Typical Augmented Work
The Expedition of Humphry Clinker
• 48 Expressions
• 114 Manifestations
• Expressions created by augmentation with: notes, introductions, illustrations, bibliographies, glossaries, etc.

Typical Revised Work
• 1st and 2nd editions are by John Phillip Immroth
• 3rd and 4th editions are by Lois Mai Chan, and “Immroth’s” was added to the title

Collected Works
• A collection of items each of which is a distinct intellectual or artistic creation; a collection of works
• 50% of ‘collected works’ explicitly list component works.

And …
• Expressions not clear.
• Bring out the differences that matter.
• Retrospective activity constrained by available bibliographic data.
• Empirical work will support ongoing clarification of the model (Working group on the expression entity)

OCLC Online Computer Library Center
Algorithmically ‘FRBRizing’
The OCLC Research work set algorithm

Our Approach
• Concentrating on work-level
  – Problems with expression-level clusters
• Efficient, maintainable, understandable
• Useful matches with correct cataloging
  – Err on the side of missed matches
  – Some accommodation of frequent variants (e.g. Shakespeare’s Hamlet = Hamlet)
• Compare with manually clustered
  – Reliable at work level. Expression level not clear enough.

The Algorithm
• A key is generated for each record
• Extract author, title
  – Look up in NACO authority file
  – Added entry information as needed
• Form a key from bibliographic record
  – Author, title, added entry information
  – These can be sorted, compared
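
To make the key idea concrete, here is a minimal sketch of author/title key generation and grouping. It is not the OCLC Research implementation: the normalization rules, the `AUTHORITY` dictionary (a toy stand-in for a NACO authority lookup), and the function names are all assumptions for illustration, and the keys it produces only resemble the ones shown later in the talk.

```python
import re
import unicodedata
from collections import defaultdict

# Hypothetical stand-in for a NACO authority lookup: maps a normalized name
# found in a record to an established heading. Entries are illustrative only.
AUTHORITY = {
    "twain, mark": "Twain, Mark, 1835-1910",
    "clemens, samuel langhorne": "Twain, Mark, 1835-1910",
}

def normalize(text: str) -> str:
    """Lowercase, drop accents and apostrophes, turn other punctuation
    (except commas) into spaces, and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.lower().replace("'", "").replace("\u2019", "")
    text = re.sub(r"[^\w\s,]", " ", text)
    return re.sub(r"\s+", " ", text).strip()

def strip_leading_article(title: str) -> str:
    """Assumed rule: ignore a leading English article when comparing titles."""
    return re.sub(r"^(the|a|an)\s+", "", title)

def work_key(author: str, title: str) -> str:
    """Form an 'author/title' key after the (toy) authority lookup."""
    heading = AUTHORITY.get(normalize(author), author)
    return f"{normalize(heading)}/{strip_leading_article(normalize(title))}"

# Made-up records: (record id, author as transcribed, title as transcribed).
records = [
    (1, "Twain, Mark", "The Adventures of Huckleberry Finn"),
    (2, "Clemens, Samuel Langhorne", "Adventures of Huckleberry Finn."),
    (3, "Twain, Mark", "The Adventures of Tom Sawyer"),
]

# Records that share a key fall into the same work cluster.
clusters = defaultdict(list)
for rec_id, author, title in records:
    clusters[work_key(author, title)].append(rec_id)
```

Because the key is a plain string, clustering reduces to sorting or hashing. A rule that is too aggressive merges distinct works, which is why the approach above prefers missed matches.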

Results
• Manual estimate: 1.5 manifestations/work in WorldCat
• Algorithm: ~1.27
• 25,000 clusters have >20 records
• 415,000 clusters have >4 records
• 30% of records and 50% of holdings are in a cluster
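
Figures of this kind can be derived from a list of work keys, one per record. The sketch below shows the arithmetic on toy data. The function name is hypothetical, and reading "in a cluster" as "in a cluster of more than one record" is an assumption.

```python
from collections import Counter

def cluster_stats(keys, threshold=4):
    """Summarize cluster sizes given one work key per bibliographic record."""
    sizes = Counter(keys)                  # work key -> number of records
    total_records = sum(sizes.values())
    # Assumption: a record counts as "in a cluster" if its key is shared.
    clustered = sum(n for n in sizes.values() if n > 1)
    return {
        "records_per_work": total_records / len(sizes),
        "clusters_over_threshold": sum(1 for n in sizes.values() if n > threshold),
        "share_of_records_in_clusters": clustered / total_records,
    }

# Toy data: five works, ten records.
keys = ["a"] * 5 + ["b"] * 2 + ["c", "d", "e"]
stats = cluster_stats(keys)
```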

OCLC Online Computer Library Center
Work-based prototypes
FictionFinder
xISBN

FictionFinder
• A prototype system of 2.6+ million bibliographic records for fiction clustered according to the OCLC FRBR work set algorithm
• Uses the FRBR model to organize, index, and display bibliographic elements of potential interest to users

Fiction Subset
• 2,665,662 WorldCat records (fiction indicator)
• 1,758,479 work clusters
• 1.5 records/cluster
• 3,866 clusters have 20 or more records
• 50,540 clusters have 5 or more records

Most widely held fiction works
Holdings  Manifestations  Key
29,043    692             twain, mark1835 1910/adventures of huckleberry finn
26,088    1,267           carroll, lewis1832 1898/alices adventures in wonderland
20,843    640             twain, mark1835 1910/adventures of tom sawyer
19,410    1,341           defoe, daniel1661 1731/robinson crusoe
18,566    983             cervantes saavedra, miguel de1547 1616/don quixote
18,492    836             stevenson, robert louis1850 1894/treasure island
18,123    526             dickens, charles1812 1870/christmas carol
18,100    278             crane, stephen1871 1900/red badge of courage
17,761    525             bronte, charlotte1816 1855/ Jane Eyre
17,499    332             chekhov, anton pavlovich1860 1904/short stories

FictionFinder & FRBR
• Information that applies to all expressions of a given work, such as summaries, genre terms, and subjects, is given precedence in work/expression-level screen displays.
• Because of the difficulty of consistently identifying expressions, manifestations are organized by language of expression
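
Organizing manifestations by language can be sketched as a simple grouping step. The example below is not FictionFinder code. It reuses the editions of Eucalyptus listed in the xISBN example later in the talk. Labeling the three unmarked editions as English is an assumption, and the function name is hypothetical.

```python
from collections import defaultdict

# (title, year, imprint, language) for manifestations of a single work.
# Assumption: the Melbourne and London editions are in English.
manifestations = [
    ("Eucalyptus", 1998, "Melbourne : Text Pub.", "English"),
    ("Eucalyptus", 1998, "London : Harvill Press", "English"),
    ("Eucalyptus", 1999, "London : Panther", "English"),
    ("Eukaliptusz", 1999, "Budapest : Ulpius-ház", "Hungarian"),
    ("Eucalyptus", 1999, "Paris : R. Laffont", "French"),
    ("Eukaliptus", 1999, "Zagreb : Meandar", "Croatian"),
    ("Ekaliptus", 2001, "Tel Aviv : Hargol", "Hebrew"),
]

def group_by_language(items):
    """Group manifestations under their language, sorted by year within each."""
    groups = defaultdict(list)
    for title, year, imprint, language in items:
        groups[language].append((year, title, imprint))
    return {lang: sorted(rows) for lang, rows in sorted(groups.items())}

by_language = group_by_language(manifestations)
for language, rows in by_language.items():
    print(f"{language} ({len(rows)})")
    for year, title, imprint in rows:
        print(f"  {title}, {year}, {imprint}")
```

Language is recorded consistently in the bibliographic data, which makes it a more dependable grouping key than an inferred expression.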

Work display

Work/expression display

FictionFinder & FRBR
• Some characteristics of an expression, such as expression title, e.g.,
  – Harry Potter and the Philosopher's Stone vs.
  – Harry Potter and the Sorcerer’s Stone
  are presented at the Work/Expression level
• Other less clear-cut distinctions between expressions & manifestations, such as Braille and electronic book versions, are presented at both the Work/Expression level and the Manifestation level.

Work/expression/manifestation display

xISBN
• An experimental web service:
  – xISBN server receives a single ISBN and returns a list of all ISBNs for the work cluster
  – Designed for machine-to-machine data exchange
  – Can return list in XML or XHTML
• Supports automatic expansion of ISBN searches:
  – Check user ILL requests against all editions/versions in OPAC
  – Use xISBN bookmarklet to find local library’s editions when user finds any edition of item on Amazon, etc.
  – Quickly check OPAC for all editions/versions during selection/acquisitions/gift book processing
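
The lookup behind such a service can be pictured as a table from each ISBN to the ISBNs of its work cluster. The sketch below is an in-memory illustration only, not the xISBN server. The function names and the `opac_isbns` holdings set are hypothetical, and the ISBNs are those from the Eucalyptus example in this talk.

```python
def normalize_isbn(isbn: str) -> str:
    """Strip hyphens and spaces and upper-case a trailing 'x' check digit."""
    return isbn.replace("-", "").replace(" ", "").upper()

def build_table(work_clusters):
    """Map every ISBN to the full ISBN list of the work cluster it belongs to."""
    table = {}
    for isbns in work_clusters:
        cluster = [normalize_isbn(i) for i in isbns]
        for isbn in cluster:
            table[isbn] = cluster
    return table

def expand(table, isbn):
    """Return all ISBNs for the work; an unknown ISBN expands to itself."""
    key = normalize_isbn(isbn)
    return table.get(key, [key])

clusters = [
    ["1875847634", "1860464947", "1860464955", "963859313x",
     "2221087615", "9532060065", "9657120055"],
]
table = build_table(clusters)

# Example: check an ILL request against local holdings (hypothetical set).
opac_isbns = {"1860464947"}
held = [i for i in expand(table, "1875847634") if i in opac_isbns]
```

Here a request for the Melbourne edition is satisfied by a London edition that the hypothetical library already holds.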

xISBN
Eucalyptus / Murray Bail 1998 Melbourne : Text Pub. ISBN: 1875847634
Request: http://labs.oclc.org/xisbn/1875847634
[Diagram: the OCLC FRBR Work-Set Algorithm feeds the xISBN table builder, which supplies the xISBN server with a table of work clusters and their ISBNs]
Response:
<?xml version="1.0" encoding="UTF-8"?>
<idlist>
  <isbn>1875847634</isbn>
  <isbn>1860464947</isbn>
  <isbn>1860464955</isbn>
  <isbn>963859313x</isbn>
  <isbn>2221087615</isbn>
  <isbn>9532060065</isbn>
  <isbn>9657120055</isbn>
</idlist>
Editions in the work cluster:
• Eucalyptus 1998 Melbourne : Text Pub.
• Eucalyptus 1998 London : Harvill Press
• Eucalyptus 1999 London : Panther
• Eukaliptusz 1999 Budapest : Ulpius-ház [Hungarian]
• Eucalyptus 1999 Paris : R. Laffont [French]
• Eukaliptus 1999 Zagreb : Meandar [Croatian]
• Ekaliptus 2001 Tel Aviv : Hargol [Hebrew]
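
A client could read an `idlist` response like the one on this slide with a standard XML parser. The sketch below parses a hard-coded copy of that response rather than calling the experimental service. The function name `parse_idlist` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Copy of the response shown on the slide, held as bytes as an HTTP
# client would typically receive it.
response = b"""<?xml version="1.0" encoding="UTF-8"?>
<idlist>
  <isbn>1875847634</isbn>
  <isbn>1860464947</isbn>
  <isbn>1860464955</isbn>
  <isbn>963859313x</isbn>
  <isbn>2221087615</isbn>
  <isbn>9532060065</isbn>
  <isbn>9657120055</isbn>
</idlist>"""

def parse_idlist(xml_bytes):
    """Return the ISBNs in document order, with stray whitespace trimmed."""
    root = ET.fromstring(xml_bytes)
    return [el.text.strip() for el in root.iter("isbn") if el.text]

isbns = parse_idlist(response)
```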

Searching for the book on Amazon

LibraryLookup bookmarklet
LibraryLookup
http://www.amazon.co.uk/exec/obidos/ASIN/1860464955/qid=1075134526/sr=11/ref=sr_1_10_1/202-6426661-8213436
Single ISBN
Is the book at my library?

xISBN bookmarklet
LibraryLookup with xISBN added
http://www.amazon.co.uk/exec/obidos/ASIN/1860464955/qid=1075134526/sr=11/ref=sr_1_10_1/202-6426661-8213436
xISBN server: Multiple ISBNs
Is the book at my library?
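
The bookmarklet flow can be sketched in two steps: pull the ISBN out of the bookseller URL, then search the catalog for every ISBN in the work cluster. The real bookmarklet would run as JavaScript in the browser. This Python sketch only illustrates the logic, and the OPAC address and its `isbn` query parameter are hypothetical.

```python
import re
from urllib.parse import quote

# A path segment consisting of exactly ten ISBN characters.
ISBN_IN_URL = re.compile(r"/(\d{9}[\dXx])(?=/|$)")

def isbn_from_url(url):
    """Return the first 10-character ISBN path segment in the URL, or None."""
    match = ISBN_IN_URL.search(url)
    return match.group(1) if match else None

def opac_search_urls(isbns, opac_base):
    """Build one catalog search URL per ISBN (hypothetical query syntax)."""
    return [f"{opac_base}?isbn={quote(isbn)}" for isbn in isbns]

amazon_url = ("http://www.amazon.co.uk/exec/obidos/ASIN/1860464955/"
              "qid=1075134526/sr=11/ref=sr_1_10_1/202-6426661-8213436")
found = isbn_from_url(amazon_url)

# With xISBN, the single ISBN would first be expanded to the whole work
# cluster; here the expansion result is a hard-coded subset.
expanded = ["1875847634", "1860464947", "1860464955"]
searches = opac_search_urls(expanded, "https://opac.example.org/search")
```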

OCLC production plans
• FRBR in FirstSearch (end-user searching)
  – End 2004 as part of broader searching enhancement.
  – Present users with view most relevant to them (work, manifestation, …)
• FRBR and cataloging
  – Interested in potential for ‘FRBRization’ services
  – Use FRBR as aid to finding cataloging copy
  – FRBR view of cataloging yet to be discussed.

Some issues
• Data. Variations in cataloging practice and errors or omissions in transcription and input lead to false clusters
• Systems. Support in library management and other systems.
• Agreement and shared practice. Theoretical discussion needs to be informed by practice. The detail!
• Communications format. How to share works etc. Different internal

Further information
www.oclc.org/research
• Projects
• Publications
• ResearchWorks (soon)
• Software (algorithm)