Persistent identifiers – an Overview
Juha Hakala, The National Library of Finland
2011-02-01

Traditional identifiers
• Traditional (bibliographic) identifiers are systems such as ISBN (International Standard Book Number), which provide unique and persistent identification for certain types of resources (books, serials, etc.)
• They were designed for printed resources before the Internet was invented; thus the match with digital resources and the Web may be a forced one
• These identifiers are well-established international standards with relatively clear roles
• It is not always clear how to apply them to e-resources, except that the identified resources themselves should be persistent

Persistent identifiers (PIDs)
• A new category of identifiers which are actionable on the Internet; that is, they enable persistent linking (resolution) to the resource or to a surrogate such as a bibliographic description of the resource
• Most PIDs are also “traditional” identifiers
  • When using a DOI, one can identify a book with a DOI and an embedded ISBN, or with a DOI and a local ID string
• URN is the only exception to this; URNs must include a traditional identifier
  • URN namespaces inherit the rules of the traditional identifier used; there is no need to discuss the scope of the URN itself
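To make the idea of a PID that embeds a traditional identifier concrete, here is a minimal Python sketch that builds a URN in the registered ISBN namespace. The function names are illustrative assumptions and do not come from any particular PID system; only the ISBN-13 check digit is validated, as a stand-in for "inheriting the rules" of the embedded identifier.

```python
def isbn13_is_valid(isbn: str) -> bool:
    """Return True if the string is a 13-digit ISBN with a correct check digit."""
    digits = isbn.replace("-", "")
    if len(digits) != 13 or not digits.isdigit():
        return False
    # ISBN-13 weights alternate 1, 3, 1, 3, ... and the weighted sum
    # (including the check digit) must be divisible by 10.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return total % 10 == 0


def urn_from_isbn(isbn: str) -> str:
    """Wrap a valid ISBN-13 in a URN (hypothetical helper for illustration)."""
    if not isbn13_is_valid(isbn):
        raise ValueError(f"not a valid ISBN-13: {isbn!r}")
    return f"urn:isbn:{isbn}"
```

Because the URN is derived mechanically from the ISBN, the question of what is identified is already answered by the ISBN rules; the URN only adds resolvability.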

Traditional versus persistent identifiers
• Assigning a traditional identifier such as ISBN is (should be?) a controlled process with precise rules
  • What is identified, and by whom
• Assigning a PID such as ARK may or may not be a controlled process, and the rules of application may be vague
• Sometimes the rules are different:
  • A book must have just one ISBN, but it may have two PIDs (for instance, ARK and DOI)
  • The National Library of Finland uses Handles in its DSpace system, but URN is the “official” identifier of these resources

Recommendations
• Conflicts between the two identifier groups should be avoided at all costs
• If a traditional identifier can be assigned to the resource, use that identifier as part of the PID
  • It follows that PIDs that cannot (easily) incorporate traditional identifiers may cause problems
• Any identifier (traditional or PID) should have explicit implementation guidelines
  • If no general guidelines exist, rules must be developed locally; such rules should eventually be aligned at the level of the PID community

Persistent identifiers and the Web: Cool URIs
• From the library point of view, cool URIs (URLs) are not proper identifiers at all
  • The same resource may be available from many URLs
  • Over time, different resources or variant versions of the same resource may be available at the same URI
  • There is absolutely no control over cool URI assignment
  • A user cannot know whether a URI is cool or not (most of them aren’t)
• Instead, cool URIs are just shelf marks
• What is a realistic time frame for cool URI persistence?
• Cool URIs can support only resolution; persistent identifiers can be more versatile in this respect
  • Match with the current / future long-term preservation systems

Services provided by PIDs
• Basic question: what services do we need?
• Some examples:
  • Find all locations (URLs) related to the PID
  • Find bibliographic metadata related to the PID
  • Retrieve the preservation commitment of the owning organization (concerning the resource at hand)
• There is no overall framework / context within which to design the resolution services
  • Each PID system provides a slightly different set of services
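The three example services above can be pictured as different requests against the same PID. The following Python sketch is an in-memory toy, not the interface of any real resolver; the class names, the service keywords ("locations", "metadata", "preservation") and the record fields are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PidRecord:
    """What a resolver might know about one identified resource (hypothetical)."""
    locations: List[str] = field(default_factory=list)
    metadata: Dict[str, str] = field(default_factory=dict)
    preservation_commitment: Optional[str] = None


class Resolver:
    """Toy resolver: maps a PID plus a requested service to an answer."""

    def __init__(self) -> None:
        self._records: Dict[str, PidRecord] = {}

    def register(self, pid: str, record: PidRecord) -> None:
        self._records[pid] = record

    def resolve(self, pid: str, service: str = "locations"):
        record = self._records[pid]  # raises KeyError for an unknown PID
        if service == "locations":
            return list(record.locations)
        if service == "metadata":
            return dict(record.metadata)
        if service == "preservation":
            return record.preservation_commitment
        raise ValueError(f"unsupported service: {service}")
```

A plain URL can only answer the first kind of request; the point of the sketch is that a PID system can expose several services behind one identifier.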

PID-based services in the future
• Theoretical basis could be twofold:
  • Functional Requirements for Bibliographic Records (FRBR) model: work, expression, manifestation
  • Current theory and practice of long-term preservation based on the migration strategy (and a long tail of manifestations for each work)
• This means it must be possible, for instance, to:
  • Find all works related to the work at hand
  • Find all expressions related to the work at hand
  • Find all manifestations of the work at hand
  • Find out the differences between these manifestations

PID-based services in the future (2)
• It should also be possible to:
  • Find out who is preserving the resource
  • Retrieve the rights metadata related to the resource
  • Retrieve the preservation metadata related to the resource
  • Retrieve the most original version (the oldest preserved manifestation) of the resource
  • Retrieve the latest (and supposedly the easiest to use) manifestation of the resource
  • …

Example: qualitative social scientific data set
• The work itself should be described; one metadata element should be the PID
• Expressions (translations into other languages) should have their own PIDs, linked to the work-level record
• There may be multiple manifestations (relational database, Excel table, etc.) of each expression; each one should have its own PID, and there should be links to the work / expressions
• In this environment, it would make sense to provide links to the work, and let users choose the most appropriate manifestation
  • Choice of the language, file format, etc.
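The data set example can be sketched as a small work / expression / manifestation hierarchy in which every level carries its own PID. This Python sketch is only one possible reading of the slide: the class names, the fields, the example PIDs and the selection function are hypothetical, and a real system would hold these links in its metadata records rather than in memory.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Manifestation:
    pid: str
    file_format: str  # e.g. "sql" or "xlsx"


@dataclass
class Expression:
    pid: str
    language: str  # e.g. "fi" or "en"
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    pid: str
    title: str
    expressions: List[Expression] = field(default_factory=list)


def choose_manifestation(work: Work, language: str,
                         file_format: str) -> Optional[Manifestation]:
    """Start from the work and pick the manifestation matching the user's choice."""
    for expression in work.expressions:
        if expression.language != language:
            continue
        for manifestation in expression.manifestations:
            if manifestation.file_format == file_format:
                return manifestation
    return None  # no manifestation in that language and format


def example_work() -> Work:
    """Invented sample data; the PIDs below are placeholders, not real identifiers."""
    return Work("pid:work-1", "Interview data set", [
        Expression("pid:expr-fi", "fi", [
            Manifestation("pid:man-fi-sql", "sql"),
            Manifestation("pid:man-fi-xlsx", "xlsx"),
        ]),
        Expression("pid:expr-en", "en", [
            Manifestation("pid:man-en-xlsx", "xlsx"),
        ]),
    ])
```

Citing the work-level PID and resolving downward keeps citations stable even when new translations or migrated file formats are added later.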

Recommendations (2)
• Services supported by PID systems need a face lift
  • Many systems were designed 10+ years ago, when digital object management systems were still in their infancy
  • Upgrades must be done in a non-destructive manner (existing implementations must be compliant with the new version)
• All aspects of PID systems should be standardized
  • Some PIDs (e.g. ARK and PURL) have never reached standard status, and at best only one part of the system (identifier syntax) has been published as a standard
• More (and better) open source implementations are needed

Conclusion
• There will be multiple PIDs in existence in the future (just like there are now)
  • Once a system has been chosen, you cannot give it up
• PID supporters and cool URI proponents will most likely continue talking past one another for quite some time, but:
  • Given the time frame over which national libraries and archives must preserve resources (centuries) and the technical complexity of this task, cool URIs fall short of the requirements in several ways; instead, PIDs must be used
• PID systems are to some extent “work in progress”