LIS 571 Readings Reaction Assignment Topic 3 Representation

LIS 571 Readings Reaction Assignment
Topic 3: Representation and Metadata
By: Amanda Alessi, David Salley, and Andrew Kloc

Thought Question: Has the research and development of metadata schemes designed for specific users and/or collections moved the library and information profession forward in our task to provide the best access possible for our users?

Readings:
• Duval, E. et al. 2002. Metadata Principles and Practicalities. D-Lib Magazine.
• Schottlaender, B. 2003. Why metadata? Why me? Why now? Cataloging and Classification Quarterly, v. 36 (3/4), pp. 19-29.
• Taylor, A. 2004. Metadata. In The Organization of Information. 2nd ed. Westport, CT. [Ch. 6, pp. 139-158]
• Schatz, Bruce R. 1997. Information Retrieval in Digital Libraries: Bringing Search to the Net. Science, v. 275, no. 5298, pp. 327-334.

Article Metadata Principles and Practicalities by Erik Duval, Wayne Hodgins, Stuart Sutton, and Stuart L. Weibel

Introduction
• Rapid development of the World Wide Web created information chaos.
• Metadata helps organize information. It is broadly defined as data about data (Duval et al. 2002).

Principles
• Concepts common to all metadata schemas.
• Schema: an attribute/value or element set.
1. Modularity
• Supports machine interoperability; standards allow flexibility.
• Lego metaphor: elements from different schemas can be combined like building blocks.

Namespaces work with modularity
• A namespace is a formal collection of terms managed according to policy or algorithm (Duval et al. 2002). Example: HTTP and LCSH.
• Any metadata element set is a namespace bounded by rules determined by its managers (Duval et al. 2002).
• Namespaces allow metadata schema designers to keep a term uniquely defined. Example: Dublin Core element names start with the prefix dc:
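
A minimal sketch of how a namespace prefix keeps element names unique, using Python's standard library. The `record` wrapper element and the sample values are assumptions made for illustration; the namespace URI is the one published for the Dublin Core element set.

```python
import xml.etree.ElementTree as ET

# Namespace URI for the Dublin Core element set, bound to the "dc" prefix.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_record(fields):
    """Build a <record> element whose children are dc:-qualified elements."""
    record = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields.items():
        # "{uri}name" is ElementTree's notation for a namespaced tag.
        child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        child.text = value
    return record


record = build_record({
    "title": "Metadata Principles and Practicalities",
    "creator": "Duval, Erik",
})
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Because every element name is qualified by the namespace URI, a `title` element from another schema could sit in the same record without being confused with `dc:title`.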

2. Extensibility
• Some metadata elements are common across metadata schemas, while others are not. Example: creator vs. temperature.
• The goal is a base schema with room for additional elements that each community can add for its own needs.

3. Refinement
• The level of detail needed differs with the purpose.
• A refinement makes an element more specific. Example: composer vs. creator.
• Also involves how dates and times are expressed. Example: 04/07/03 = April 7, 2003 or July 4, 2003.
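
The date ambiguity on this slide can be shown in a few lines of Python. This is only a sketch: ISO 8601 (YYYY-MM-DD) is used here as an assumed unambiguous encoding; the slide itself does not name a scheme.

```python
from datetime import datetime

raw = "04/07/03"  # the ambiguous date string from the slide

# The same string read under two local conventions.
us_reading = datetime.strptime(raw, "%m/%d/%y").date()  # month/day/year
eu_reading = datetime.strptime(raw, "%d/%m/%y").date()  # day/month/year

# Writing the value as YYYY-MM-DD removes the ambiguity.
print(us_reading.isoformat())  # 2003-04-07
print(eu_reading.isoformat())  # 2003-07-04
```

Recording which encoding scheme a date element follows lets software interpret the value without guessing at local conventions.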

4. Multilingualism
• Metadata must work across different languages and cultures. Example: <LI>
• Translate metadata standards into multiple languages.
• Metadata can describe the original resource’s language and culture.
  • It can also mention other available versions of the resource and contact information for the translator.
• Other ways cultures communicate differently?

Practicalities
• “Rules of thumb, constraints, and infrastructure issues that emerge from bringing theory into practice in the form of useful and sustainable systems” (Duval et al. 2002)
• The principles above lead to these practicalities.

1. Application Profiles
• An assemblage of metadata elements selected from one or more metadata schemas and combined in a compound schema (Duval et al. 2002)
• A means of expressing the principles of modularity and extensibility
• Set rules such as: required data element = language; optional element = color.
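
The required/optional rule on this slide could be expressed as a small validation routine. The sketch below is hypothetical: the `PROFILE` structure and the `validate` function are invented for illustration and do not follow any published application profile format.

```python
# Hypothetical application profile: element names and rules are illustrative only.
PROFILE = {
    "language": {"required": True},
    "color": {"required": False},
}


def validate(record, profile=PROFILE):
    """Return a list of problems found when checking a record against the profile."""
    problems = []
    # Every required element must be present and non-empty.
    for element, rule in profile.items():
        if rule["required"] and not record.get(element):
            problems.append(f"missing required element: {element}")
    # Elements outside the profile are reported rather than silently accepted.
    for element in record:
        if element not in profile:
            problems.append(f"element not in profile: {element}")
    return problems


print(validate({"language": "en", "color": "blue"}))  # []
print(validate({"color": "blue"}))  # ['missing required element: language']
```

A community that needs more elements would extend `PROFILE` rather than change the underlying schemas, which is the point of combining modularity with extensibility.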

2. Syntax and Semantics
• Semantics: meaning; syntax: form.
• HTML (HyperText Markup Language) is simple. This is both good and bad.
• XML is the markup language of choice.

3. Association Models
• Ways to associate metadata with resources:
  • Embedded metadata: created by the author of the resource
  • Associated metadata: kept in separate files, so the metadata can change without changing the resource
  • Third-party metadata: filed by an organization that may or may not have control over the resource
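
As a rough illustration of the embedded model, the sketch below reads metadata that an author has placed in the `<meta>` tags of an HTML page. The sample page and the `DC.title` / `DC.creator` names are assumptions chosen for the example; `MetaExtractor` is a hypothetical helper built on Python's standard `html.parser`.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect name/content pairs from <meta> tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attribute values do not.
        if tag == "meta":
            attrs = dict(attrs)
            if "name" in attrs and "content" in attrs:
                self.metadata[attrs["name"]] = attrs["content"]


# Made-up page with metadata embedded by its author.
page = """<html><head>
<meta name="DC.title" content="Sample page">
<meta name="DC.creator" content="A. Author">
</head><body>Hello</body></html>"""

extractor = MetaExtractor()
extractor.feed(page)
print(extractor.metadata)
```

With associated or third-party metadata the same name/value pairs would live in a separate file or service, so they could be corrected without touching the page itself.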

4. Naming Metadata Elements
• Each element set must have a globally addressable name or URI (Uniform Resource Identifier).
• Makes machine processing of metadata possible despite different languages or cultures

5. Metadata Registries
• “Important topic of digital library research at this time”
• Contain or link to controlled vocabularies from which the values of metadata fields are selected (Duval et al. 2002)
• “Electronic dictionary” (Duval et al. 2002)
• Who will use a registry?
  • Application designers, to identify schemas
  • Creators of metadata, to get definitions for elements
  • End users, to better understand the context of metadata in hopes of improving their searches

6. Completeness of Description
• Not every available element should be used for every resource type. Example: no scent field for a map.
• Detailed description
  • Improves searching precision
  • Requires higher investment in creation of metadata (Duval et al. 2002)
  • Makes it more difficult to promote consistency (Duval et al. 2002)

Simple Description
• Easier and cheaper to make
• May result in more false hits or more effort to pick the most relevant results (Duval et al. 2002)
• Improves chances of interoperability

7. Subjective and Objective
• Objective metadata can be completely unbiased: author, date of publication, edition or version.
• Fields become subjective when they come to mean different things to different cultures, and semantics are compromised. Example: keywords, summaries, reviews.

8. Automated Generation of Metadata
• Before the Web, librarians did the cataloging.
• Cataloging metadata “remains the most successful standard for resource discovery of books and periodicals” (Duval et al. 2002)
  • Costly
  • Impractical for Internet materials/resources

Web Search Engines
• Index much of the Internet
• Low-cost, advertiser-supported model
• A type of metadata
• Advances in natural language processing, profile and pattern recognition, and data mining
• Electronic paper formats such as PDF allow author-supplied attributes, which simplifies making metadata

Conclusion
• Information is useful if organized; metadata plays the organizing role.
• Those who create metadata will have different motives, goals, and techniques, just as authors of books have different ways of writing.
• Communities must agree on rules and common procedures in order to understand and share information across cultures.

Schottlaender, B. Why metadata? Why me? Why now? Reviewed by: David Salley

Metadata Definitions
• “a cloud of collateral information around a data object”
• “structured, encoded data that describes characteristics of information-bearing entities to aid in the identification, discovery, assessment and arrangement of the described entities”

Metadata Schema
• “A set of rules for encoding information that supports specific communities of users”
  -- Association for Library Collections and Technical Services, Committee on Cataloging: Description and Access, Task Force

Metadata Schema – 5 Years Ago
• Only 4 types:
  • Descriptive
  • Administrative
  • Technical
  • Rights

Metadata Schema – Today
• Many more types, including:
  • Security
  • Personal information
  • Commercial management
  • Content rating
  • Preservation metadata
  • Etc.

Cataloging and Metadata
• “Metadata is about access, cataloging is about access” – Schottlaender
• “The invisible process of order making” – Kevin Butterfield
• Four steps to each: find, select, identify, obtain.

Relevant Quotations:
• “I see an increasing confluence between the cataloging and metadata communities, so much so, that the two communities are becoming harder to distinguish, which is exactly as it should be.”
• “There is a growing recognition in the metadata community of the relevance of the work that we in the library cataloging community have been doing.”
• “The Dublin Core ‘qualifiers’ … are … an attempt being made now to enrich the Dublin Core Element Set by referencing a variety of content standards: subject thesauri; authority control systems; and classification systems.”

Metadata and Representation
Taylor, A. 2004. Metadata. In The Organization of Information. 2nd ed. Westport, CT: Libraries Unlimited. [Ch. 6: 139-158].

Definitions
• “Data about data”
  • Simple definition, but causes confusion
• “Structured information that describes the attributes of information packages for the purposes of identification, discovery, and…management.”

Why Metadata?
• “Hot topic” in LIS in recent years
  • Stems from proliferation of electronic resources
• Major significance:
  • “Concern that some kind of standardized representation is needed for Internet resources”
  • In order to locate the most “useful and reliable” information available

Types and Levels of Metadata
• 3 levels of complexity:
  • simple format (found within resource itself)
  • structured format (ex: Dublin Core)
  • rich format (MARC, AACR2)
• 3 broad types:
  • administrative
  • structural
  • descriptive

Metadata Schemas
• Def: sets of metadata standards designed to meet the needs of particular communities
• 3 characteristics:
  • Structure - refers to the data model and how metadata statements are expressed
  • Syntax - the encoding scheme
  • Semantics - refers to the meaning of data elements
• Also use content standards and controlled vocabulary

Special Characteristics of an Electronic Environment
• Interoperability - ability of different systems to interact with each other
• Flexibility - ability to enter as much or as little information as needed
• Extensibility - ability to use additional elements or qualifiers (more specific elements) to meet specific needs

Objectives for Implementing an Information System
• Find
• Identify
• Select
• Obtain

Schatz, Bruce R. Information Retrieval in Digital Libraries: Bringing Search to the Net Reviewed by: David Salley

“Organized collections of scientific materials are traditionally called ‘libraries’ and the searchable online versions of these are called ‘digital libraries’. The primary purpose of digital libraries is to enable searching of electronic collections distributed across networks, rather than merely creating electronic repositories from digitized materials.”

• “The fundamental technology for searching large collections is finally changing, so that information retrieval in the next century [originally published in 1997] will be far more semantic than syntactic, searching concepts rather than words.”
• “The ’document’ has changed from a citation with descriptive headers to the abstract to the complete multimedia contents, including text, figures, tables, equations, and data.”

Technology Timeline
• 1960s: citations only [title, author, journal, keywords]; generating bibliographies
• 1970s: inverted indexes, full-text storage, proximity mapping
• 1980s: distributed personal workstations
• 1990s: the Internet, full multimedia

Vocabulary Switching
• Example: a journal article that only mentions ‘Unix’ would be tagged as being about ‘operating systems’.
• Currently being done by human indexers
• Hewlett-Packard et al. are currently working on computer systems to do this, working with extensive ‘authority files’
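
The ‘Unix’ example can be sketched as a lookup against a tiny authority file. Everything here is a toy assumption: the entries in `AUTHORITY_FILE` and the `switch_vocabulary` function are invented for illustration and are far simpler than the systems the slide refers to.

```python
# Toy authority file: maps terms found in text to broader concepts.
# Entries are illustrative assumptions, not taken from any real thesaurus.
AUTHORITY_FILE = {
    "unix": "operating systems",
    "linux": "operating systems",
    "marc": "cataloging standards",
}


def switch_vocabulary(text, authority=AUTHORITY_FILE):
    """Return the sorted concepts whose terms appear as words in the text."""
    # Normalize each word: strip surrounding punctuation and lowercase it.
    words = {w.strip(".,;:!?'\"()").lower() for w in text.split()}
    return sorted({authority[w] for w in words if w in authority})


print(switch_vocabulary("This article benchmarks Unix file systems."))
# ['operating systems']
```

A search for ‘operating systems’ could then retrieve the article even though that phrase never appears in its text.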