metadata considerations for digital libraries Tefko Saracevic Rutgers
metadata considerations for digital libraries © Tefko Saracevic, Rutgers University 1
the Web
• fastest growing technology in history
• explosive growth of WWW provided
  – ubiquity of information and access
  – but also information chaos & anarchy
• growing difficulty in identifying, searching & retrieving
• ‘lost in an ocean’ metaphors
problem
• to organize & search the Web needed: knowledge about the structure of data
  – but Web data & databases fuzzy
  – structures vary widely; no consistency
  – constantly evolve over time
  – lack of agreement about meaning of even simple terms & concepts in structure
solution
• some standardized description or language to increase functionality
  – a mechanism for a more precise description of things on the Web
• going from machine-readable to machine-understandable
  – missing in original Web architecture
METADATA !
metadata
what?
• metadata: ‘data about data’
  – machine understandable information for the Web - emphasis on machine
  – description of what a text (or any object), or a part of it, is all about
    • e.g. labeling title, author, source …
• many evolving standards suggested to be applied in various domains
where?
• in volatile digital environments
  – metadata describe electronic resources, texts & multimedia
  – metadata exist or have meaning only in relation to the referenced document or object
    • provide information about the object
why?
• to standardize description of what is what in electronic resources, in order
  • to aid in identification, organization & location of a great variety of resources
  • to enable effective search of a variety of objects (documents) distributed all over
• sometimes also to provide controls (e.g. validation, rights, provenance, ratings ...)
importance
• standard metadata descriptions are a prerequisite to
  – common use
  – effective searching
  – ‘intelligent’ roaming by agents
  – validation, ratings
markup languages
• SGML - granddaddy (standard in 1986)
  – marks elements within documents
    • derived from old markups for typesetting
    • adapted by communities producing electronic documents
• machine independent - reason for success
  – transportable from one hardware & software to another; substitutes strings
• many extensions & specific applications
principles
• ALL markup languages must specify
  • what markup means
  • what markup is allowed
  • what markup is required
  • how markup is distinguished from text
• all markup languages & applications follow these principles
• underlying concepts are fairly simple but they get very confusing real fast.
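The four principles can be made concrete with a toy checker. This is only a sketch: the tag names, the `RULES` table and the `check_markup` function are invented for illustration and do not follow SGML or any real DTD syntax.

```python
import re

# Hypothetical rules for a toy document type (not a real DTD).
RULES = {
    # what markup means
    "meaning": {
        "title": "name of the work",
        "author": "who created it",
        "note": "free comment",
    },
    # what markup is allowed
    "allowed": {"title", "author", "note"},
    # what markup is required
    "required": {"title", "author"},
}

# how markup is distinguished from text: here, <name> or </name>
TAG = re.compile(r"</?([a-z]+)>")


def check_markup(document):
    """Return a list of problems found in the document (empty if none)."""
    used = set(TAG.findall(document))
    problems = [f"tag not allowed: {t}" for t in sorted(used - RULES["allowed"])]
    problems += [
        f"required tag missing: {t}" for t in sorted(RULES["required"] - used)
    ]
    return problems


print(check_markup("<title>Metadata</title><date>1999</date>"))
```

The checker looks only at which tags occur; nesting and ordering, which real markup specifications also govern, are left out to keep the sketch short.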
specifications
• types of documents defined by DTDs (Document Type Definitions)
  – many types & applications formulated
    • vary greatly in complexity and use
• RDF - Resource Description Framework
  – a common syntax, data model & scheme for describing
extensions
• HTML - most famous & successful
  – allows for metatags in the Head
    • not used much, even discouraged
    • in the body could be indirect
• XML - the next big thing (hopefully)
  • data format for structured document interchange & interoperability on WWW
  • retains much of the functionality of SGML & combines it with the ease of use of HTML
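As an illustration of metatags in the Head, the sketch below reads name/content pairs from the `<meta>` tags of a page using Python's standard `html.parser` module. The page itself and the class name `MetaTagCollector` are invented for this example.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect name/content pairs from <meta> tags found inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and self.in_head:
            attributes = dict(attrs)
            if "name" in attributes and "content" in attributes:
                self.metadata[attributes["name"]] = attributes["content"]

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


# Hypothetical page, invented for this example.
PAGE = """<html><head>
<title>Sample page</title>
<meta name="author" content="A. Librarian">
<meta name="keywords" content="metadata, digital libraries">
</head><body><p>Text of the page.</p></body></html>"""

collector = MetaTagCollector()
collector.feed(PAGE)
print(collector.metadata)
```

Nothing in the body of the page is inspected, so any descriptive information carried there stays invisible to this kind of harvesting.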
who specifies standards?
• formal groups
  – national & international standards organizations - ISO, ANSI, NISO
• informal groups
  – WWW Consortium (W3C)
  – Dublin Core
  – Library of Congress
proliferation
• currently: proliferation of metadata standards activities - many domains
  – a lot of confusion & incompatibility
  – in document description & libraries
• coordination through liaisons & a number of projects in the U.S. & internationally
  – strength: domain experts' involvement
  – weakness: limited perspective; re-invention
libraries
• in libraries metadata has a very long tradition, long preceding the Web (but not called metadata)
  – cataloging rules, standards
• MARC (Machine Readable Cataloging)
  • enabled worldwide exchange of cataloging records
  • but long standing problems with searching
sample of projects
• Encoded Archival Description (EAD)
• Text Encoding Initiative (TEI)
• Federal Geographic Data Committee (FGDC) - geospatial data
• Z39.50 standards - searching
• crosswalks: mapping e.g. DC to MARC
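A crosswalk is, at its simplest, a lookup table from the elements of one scheme to the fields of another. The sketch below maps a few Dublin Core elements to MARC tag/subfield pairs. The pairs shown are assumptions chosen for illustration and should be checked against the crosswalk an institution actually uses; the record and the names `DC_TO_MARC` and `crosswalk` are invented.

```python
# Illustrative subset of a Dublin Core -> MARC crosswalk.
# The tag/subfield choices are assumptions for this sketch, not an
# authoritative mapping.
DC_TO_MARC = {
    "title": ("245", "a"),
    "subject": ("653", "a"),
    "description": ("520", "a"),
    "publisher": ("260", "b"),
    "date": ("260", "c"),
}


def crosswalk(dc_record):
    """Map a flat DC record to a list of (MARC tag, subfield, value) triples.

    Elements without a mapping are skipped rather than guessed.
    """
    marc_fields = []
    for element, value in dc_record.items():
        target = DC_TO_MARC.get(element.lower())
        if target is not None:
            tag, subfield = target
            marc_fields.append((tag, subfield, value))
    return marc_fields


# Hypothetical record, invented for this example.
record = {
    "Title": "Sample report",
    "Publisher": "Example Press",
    "Coverage": "not mapped in this sketch",
}
for tag, subfield, value in crosswalk(record):
    print(f"{tag} ${subfield} {value}")
```

Silently skipping unmapped elements keeps the sketch short; a production crosswalk would more likely log them, since lost elements are one source of the semantic interoperability problems mentioned elsewhere in these slides.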
Dublin Core (DC)
• international initiative to define a core set of elements for describing Web resources
  – a set of 15 elements
    · Title; Creator; Subject; Description; Publisher; Contributor; Date; Type; Format; Identifier; Source; Language; Relation; Coverage; Rights
• wide interest & a lot of work
  · but not widely applied on the Web
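To show what a record built from the 15 elements might look like, the sketch below writes a small Dublin Core description as XML with Python's standard `xml.etree.ElementTree`. The wrapper element name `metadata`, the function `build_dc_record` and the sample values are assumptions made for this example; the namespace URI is the one published for the Dublin Core element set.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def build_dc_record(fields):
    """Build an XML element holding one child per supplied DC element.

    Raises ValueError for names outside the 15-element set.
    """
    unknown = set(fields) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    root = ET.Element("metadata")  # wrapper name is an assumption
    for name in DC_ELEMENTS:  # keep the conventional element order
        if name in fields:
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = fields[name]
    return root


ET.register_namespace("dc", DC_NS)

# Hypothetical values, invented for this example.
record = build_dc_record({
    "title": "Sample report",
    "creator": "A. Librarian",
    "language": "en",
})
print(ET.tostring(record, encoding="unicode"))
```

Every element is optional here, and repeated elements are not supported, which a fuller implementation would need since Dublin Core allows elements to repeat.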
library interoperability
• library catalogs bound by proprietary software & hardware
• middleware needed
  – protocols (based on Z39.50) provide for interaction of clients with many servers (catalogs)
• problems remain with semantic interoperability
digitization
• metadata assignment (cataloging) a key component in digitization or electronic publishing
• choices: a spectrum of possibilities to select & apply metadata
• search for automation - e.g. templates
• connection with cataloging, indexing
decisions, decisions
  – how & what to plan for metadata creation in conjunction with a digital library (DL)?
  – target audience?
  – scope and depth?
  – what to adopt? plug into a scheme?
  – how to integrate metadata projects?
  – needed skills? training? staffing?
$$$$
• costs of metadata: HUGE
  – involved operations
  – time, personnel, effort
  – learning many new things included
  – making decisions complex & involved
• cooperative activities essential
• libraries pushed out of libraries