Overview of Archival Resource Key (ARK) Tools

1 July 2005
John Kunze, California Digital Library

ARK Summary
Instead of one Name Authority: Assigning Authority + Mapping Authorities

    http://foobar.zaf.org/ark:/12025/654xz321/s3/f8.05v.tiff
    \___________________/ \__/ \___/ \______/ \____________/
        (replaceable)      |     |      |     4 Qualifier
              |        ARK Label |      |     (NMA-supported)
              |                  |      |
    1 Name Mapping Authority     |   3 Name (NAA-assigned)
      Hostport (NMAH)            |
              2 Name Assigning Authority Number (NAAN)

1 = current service provider; identity inert; replaceable
2 = organization that originally assigned the id
3 = name originally assigned to the abstract object, often opaque
4 = extension disclosing object hierarchy & variants, often non-opaque
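The four parts labelled above can be pulled apart mechanically. The following is a minimal Python sketch, not part of the ARK tools themselves; the regular expression, the `ArkParts` type, and the `split_ark` function are illustrative assumptions about how one might split the example URL.

```python
import re
from typing import NamedTuple


class ArkParts(NamedTuple):
    nmah: str       # 1: Name Mapping Authority Hostport (replaceable)
    naan: str       # 2: Name Assigning Authority Number
    name: str       # 3: Name assigned by the NAA
    qualifier: str  # 4: NMA-supported qualifier ("" if absent)


# Assumed pattern: optional "http://NMAH/" prefix, the "ark:/" label,
# then NAAN, Name, and everything up to a '?' as the Qualifier.
_ARK_RE = re.compile(
    r"^(?:https?://(?P<nmah>[^/]+)/)?"
    r"ark:/(?P<naan>[^/]+)/(?P<name>[^/.?]+)(?P<qualifier>[^?]*)"
)


def split_ark(ark: str) -> ArkParts:
    """Split an ARK (with or without a hostport) into its labelled parts."""
    m = _ARK_RE.match(ark)
    if m is None:
        raise ValueError(f"not an ARK: {ark!r}")
    return ArkParts(m.group("nmah") or "", m.group("naan"),
                    m.group("name"), m.group("qualifier"))


parts = split_ark("http://foobar.zaf.org/ark:/12025/654xz321/s3/f8.05v.tiff")
print(parts)
```

Because the NMAH is replaceable, only the NAAN, Name, and Qualifier would normally be compared when deciding whether two ARKs name the same thing.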

ARK usage
• Two ARKs accessing the same thing
    http://loc.gov/ark:/12025/654xz321
    http://rutgers.edu/ark:/12025/654xz321
• Access to metadata -- add a ‘?’
    http://loc.gov/ark:/12025/654xz321?
• Access to support statement -- add ‘??’
    http://loc.gov/ark:/12025/654xz321??
• 3 minimal requirements to be an ARK
  – An archive that can’t do all 3 -- trustworthy?
  – Is an ARK persistent? Maybe. Have to ask.
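The three forms of access differ only in a trailing ‘?’ or ‘??’, so they are easy to derive from one base ARK. A small Python sketch, assuming nothing beyond the URLs shown on this slide (the function name `ark_urls` is made up for illustration):

```python
def ark_urls(nmah: str, ark_name: str) -> dict:
    """Return the object, metadata, and support-statement URLs for an ARK.

    nmah     -- hostport of the current service provider, e.g. "loc.gov"
    ark_name -- NAAN/Name part, e.g. "12025/654xz321"
    """
    base = f"http://{nmah}/ark:/{ark_name}"
    return {
        "object": base,          # the thing itself
        "metadata": base + "?",  # add '?' for metadata
        "support": base + "??",  # add '??' for the support statement
    }


# The same ARK served by two different mapping authorities.
for host in ("loc.gov", "rutgers.edu"):
    for kind, url in ark_urls(host, "12025/654xz321").items():
        print(f"{kind:9} {url}")
```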

Persistence and opaqueness
• Do ARKs have to be this ugly (opaque)?

    http://foobar.zaf.org/ark:/12025/654xz321/s3/f8.05v.tiff
    \___________________/ \__/ \___/ \______/ \____________/
            NMAH         Label NAAN    Name     Qualifier

• No, but they encourage it. Persistence is all about managing associations between strings and things
  – And the landscape is littered with links that were required to die for political, legal, or social reasons
  – The appearance, deliberate or even accidental, of once-true assertions that are now misleading, infringing, or offensive makes it hard for our descendants to continue managing
• Pain of managing opaque ids is mitigated by the certainty of having strongly bound metadata

A hostname may also break
• Did it break because it appears to assert a branding that is no longer relevant? Have to pay attention to this.
• Semantic rot is inevitable in all ids
  – The more opaque, the more protected
  – Non-opaque ids are very useful ad hoc metadata containers; in the tradeoff, consider the more regular and complete metadata promised by ARKs
  – Non-opaque service label extensions to opaque base ARKs are suitable; e.g., “thumb”, “hi-res”

When the hostname breaks
• Use low-tech, file lookup (like old /etc/hosts)
• Or use MAPTR algorithm in client or plug-in
  – Resolver discovery using vanilla DNS and script:

    use Net::DNS;                           # include simple DNS package
    my $qtype = "NAPTR";                    # initialize query type
    my $naa = shift;                        # get NAAN script argument
    my $mad = new Net::DNS::Resolver;       # mapping authority discovery

    &maptr("$naa.ark.arpa");                # call maptr - that's it

    sub maptr {                             # recursive maptr algorithm
        my $dname = shift;                  # domain name as argument
        my ($rr, $order, $pref, $flags, $service, $regexp, $replacement);
        my $query = $mad->query($dname, $qtype);
        return if (! $query || ! $query->answer);
        foreach $rr ($query->answer) {
            next if ($rr->type ne $qtype);
            ($order, $pref, $flags, $service, $regexp, $replacement)
                = split(/\s/, $rr->rdatastr);
            if ($flags eq "") {
                &maptr($replacement);       # recurse
            } elsif ($flags eq "h") {
                print "$replacement\n";     # candidate NMAH
            }
        }
    }
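The low-tech file lookup in the first bullet can be sketched in a few lines of Python. The file format here (one "NAAN hostport" pair per line, with ‘#’ comments, in the spirit of /etc/hosts) is an assumption made for illustration, as are the function names and the third table entry; the first two entries reuse the hosts from the usage slide.

```python
# Hypothetical lookup table: NAAN followed by a candidate NMAH (hostport).
SAMPLE_TABLE = """\
# NAAN   NMAH
12025    loc.gov
12025    rutgers.edu
99999    foobar.zaf.org   # made-up entry
"""


def load_nmah_table(text: str) -> dict:
    """Parse the table into {NAAN: [hostport, ...]}, ignoring comments."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and padding
        if not line:
            continue
        naan, nmah = line.split()[:2]
        table.setdefault(naan, []).append(nmah)
    return table


def candidate_urls(ark: str, table: dict) -> list:
    """Given 'ark:/NAAN/Name...', list URLs to try at each known NMAH."""
    naan = ark.split("/")[1]
    return [f"http://{host}/{ark}" for host in table.get(naan, [])]


table = load_nmah_table(SAMPLE_TABLE)
for url in candidate_urls("ark:/12025/654xz321", table):
    print(url)
```

A client would try each candidate in turn; the DNS-based script above serves the same purpose without a locally maintained file.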

ARK lexical goodies
• Hyphens ignored
  – Neutralizes harm done by typesetters
• Too many search results? Providers may disclose (or not)…
  – Sub-object hierarchy using reserved ‘/’
  – Variant objects using reserved ‘.’
  – Usual %hh (hex encoding) as an escape
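These lexical rules can be illustrated with a short Python sketch. It assumes a simplified reading of the qualifier in which ‘/’ separates hierarchy levels and ‘.’ separates variants of the last level only; the function names are invented for this example.

```python
from urllib.parse import unquote


def strip_hyphens(ark_path: str) -> str:
    """Hyphens are ignored, so remove them before comparing ARKs."""
    return ark_path.replace("-", "")


def same_ark(a: str, b: str) -> bool:
    return strip_hyphens(a) == strip_hyphens(b)


def split_qualifier(qualifier: str):
    """Split '/s3/f8.05v.tiff' into hierarchy levels and variants.

    Splitting happens before %hh decoding, so an escaped '/' or '.'
    is treated as data rather than as a reserved separator.
    """
    segments = [s for s in qualifier.split("/") if s]
    if not segments:
        return [], []
    *parents, last = segments
    leaf, *variants = last.split(".")
    hierarchy = [unquote(s) for s in parents + [leaf]]
    return hierarchy, [unquote(v) for v in variants]


print(same_ark("12025/654x-z321", "12025/654xz321"))
print(split_qualifier("/s3/f8.05v.tiff"))
```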

ARK namespaces reserved

    12025   National Library of Medicine
    12026   Library of Congress
    12027   National Agricultural Library
    13030   California Digital Library
    13038   World Intellectual Property Organization
    20775   University of California San Diego
    29114   University of California San Francisco
    28722   University of California Berkeley
    15230   Rutgers University Libraries
    13960   Internet Archive
    64269   Digital Curation Centre
    62624   New York University Libraries
    67531   University of North Texas Libraries
    27927   Ithaka Electronic-Archiving Initiative
    12148   National Library of France

Reserve a namespace by email to ark@cdlib.org

The Their Stuff problem is easier
• We can’t do much about Their Stuff except defensively test and fix Our links to it
• Not worth Our ARKs -- we can’t vouch for the objects
• Indirect naming may help (e.g., PURL, SFX, etc.)
• So get a link validator, staff to replace dead URLs, and figure out how much effort you’ll expend
• Email Them (external providers), if appropriate, but if They don’t maintain their ids, no scheme will help

Our Stuff
Solutions for persistent identifier problems
• Identifier maintenance is different from but deeply implicated in collection management
• Recall: an identifier is [a string and] an association between a string and a thing
  – If you maintain object metadata, you already maintain ids (assuming your object has an id)
  – Natural to maintain redirection info as one more column of metadata, and ask your DB admin to nightly recreate web server redirect config files
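The "one more column" idea can be sketched as a nightly job that reads the redirection column and writes web server redirect lines. Everything specific below is an assumption for illustration: the table and column names, the example.org targets, and the choice of Apache mod_alias `Redirect` lines as the output format.

```python
import sqlite3

# Hypothetical metadata table with redirection info as one more column.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE objects (ark TEXT PRIMARY KEY, title TEXT, target_url TEXT)"
)
db.executemany(
    "INSERT INTO objects VALUES (?, ?, ?)",
    [
        ("ark:/12025/654xz321", "Example object",
         "http://example.org/store/a/654xz321"),
        ("ark:/12025/987yw654", "Another object",
         "http://example.org/store/b/987yw654"),
    ],
)


def redirect_config(conn) -> str:
    """Render one redirect line per object, sorted by ARK."""
    rows = conn.execute("SELECT ark, target_url FROM objects ORDER BY ark")
    lines = [f"Redirect /{ark} {target}" for ark, target in rows]
    return "\n".join(lines) + "\n"


print(redirect_config(db), end="")
```

When an object moves, only its `target_url` value changes; the ARK in the first column stays the same and the next nightly run picks up the new location.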

Opaque identifier tools
• Non-opaque identifier strings are chosen deliberately to assert some things that are true at the time of assignment
• Opaque identifier strings are best chosen by automated means, such as
  – NOID (nice opaque identifier)
  – Or UUID/GUID (universally unique identifier)
    • Sequence of hex encodings of your computer’s MAC address, current time, and sometimes a random number
    • No need to ask permission or register yourself
    • Looks like something found in nature, but actually it’s based on IEEE and hardware vendor registries
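For a concrete look at the UUID option, Python's standard `uuid` module can generate a time-based (version 1) UUID and expose the fields described above. This is only an illustration of the format; note that when no hardware address can be read, the module may substitute a random node value for the MAC address.

```python
import uuid

u = uuid.uuid1()   # time-based UUID: node (MAC) + timestamp + clock sequence

print(u)                              # 36-character hex-and-hyphen form
print("version:  ", u.version)        # 1 for time-based UUIDs
print("node:     ", f"{u.node:012x}") # 48-bit MAC address (or random fallback)
print("timestamp:", u.time)           # 60-bit count of 100 ns intervals
print("clock seq:", u.clock_seq)      # guards against clock resets
```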

Nice opaque identifiers (NOID)
• A noid minter is a lightweight database for generating, tracking, and binding unique ids
• The noid tool creates minters and accepts commands that operate them
  – Open source, available at www.cpan.org
• Can mint in random or sequential order, with or without a check character guaranteeing against the most common transcription errors
• Anyone can run a noid minter, maintain associations via bindings to arbitrary elements (assertions), and set up a resolver (including rule-based)

Using NOID
• Identifiers minted according to a template:
    noid dbcreate f5.reedeedk long 13030
  which produces as first minted id
    13030/f54x54g11
• Noid is scheme-independent
  – Can be used to mint DOIs, URNs, URLs, lotto numbers, etc.
  – We (at CDL) use it to mint random ARKs with check chars
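The final ‘k’ in the template asks for a check character, and the last character of 13030/f54x54g11 is that check character. The Python sketch below reflects our understanding of the rule described in the NOID documentation (a position-weighted sum over a 29-character alphabet of digits and consonants, taken modulo 29); it is an independent illustration, not the noid tool's own code, and the function names are made up. It does reproduce the check character of the id shown above.

```python
# The 29 "extended digits": decimal digits plus consonants (no vowels, no 'l' or 'y').
XDIGITS = "0123456789bcdfghjkmnpqrstvwxz"


def check_char(s: str) -> str:
    """Compute the check character for s (the id without its last character).

    Each character's value is its index in XDIGITS (0 for anything else,
    such as '/'), multiplied by its 1-based position; the sum is reduced
    modulo 29 and mapped back to an extended digit.
    """
    total = 0
    for pos, ch in enumerate(s, start=1):
        total += pos * max(XDIGITS.find(ch), 0)
    return XDIGITS[total % len(XDIGITS)]


def is_valid(id_with_check: str) -> bool:
    """True if the final character matches the computed check character."""
    return check_char(id_with_check[:-1]) == id_with_check[-1]


print(check_char("13030/f54x54g1"))   # prints 1
print(is_valid("13030/f54x54g11"))    # prints True
print(is_valid("13030/f45x54g11"))    # two adjacent characters swapped: prints False
```

Weighting by position is what lets the check catch transpositions of adjacent characters as well as single-character substitutions.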

ARK Documentation
• ARK specification
    http://www.ietf.org/internet-drafts/draft-kunze-ark-09.txt
• ARK information sites
    http://www.cdlib.org/inside/diglib/ark/
    http://ark.nlm.nih.gov/
• Overview article
    http://www.infotoday.com/cilmag/feb04/primers.shtml
• Background paper
    http://bibnum.bnf.fr/ecdl/2003/proceedings.php?f=kunze