Digital Object Architecture and the Handle System
Larry Lannom
20 June 2006
Corporation for National Research Initiatives
http://www.cnri.reston.va.us/
http://www.handle.net/
What is the Problem?
• Managing information in the Net over very long periods of time, e.g. centuries or more
• Dealing with very large amounts of information in the Net over time
• When information, its location(s) and even the underlying systems may change dramatically over time
• Respecting and protecting rights, interests and value
A Meta-level Architecture
• Allows for arbitrary types of information systems
• Allows for dynamic formatting and data typing
• Can accommodate interoperability between multiple different information systems
• Allows metadata schema to be identified and typed
Digital Object Architecture: Motivation
• To reformulate the Internet architecture around the notion of uniquely identifiable data structures
• Enabling existing and new types of information to be reliably managed and accessed in the Internet environment over long periods of time
• Providing mechanisms to stimulate innovation, to create dynamic new forms of expression and to manifest older forms
• While supporting intellectual property protection and fine-grained access control, and enabling well-formed business practices to emerge
Digital Object Architecture: Components
– Digital Objects (DOs)
  • Structured data, independent of the platform on which it was created
  • Consisting of “elements” of the form <type, value>
  • One of which is its unique, persistent identifier
– Resolution of Unique Identifiers
  • Maps an identifier into “state information” about the DO
  • Handle System is a general purpose resolution system
– Repositories from which DOs may be accessed
  • And into which they may be deposited
– Metadata Registries
  • Repositories that contain general information about DOs
  • Support multiple metadata schemes
  • Can map queries into unique DO specifications (via handles)
What is a Digital Object?
• Defined data structure, machine independent
• Consisting of a set of elements
  – Each of the form <type, value>
  – One of which is the unique identifier
• Identifiers are known as “Handles”
  – Format is “prefix/suffix”
  – Prefix is unique to a naming authority
  – Suffix can be any string of bits assigned by that authority
• Data structure can be parsed; types can be resolved within the architecture
• Associated properties record and transaction record containing metadata and usage information
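The "prefix/suffix" format and the <type, value> element model can be illustrated with a minimal Python sketch. The class names and the element type name "identifier" are assumptions made for illustration; the slides do not define a concrete encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Handle:
    prefix: str  # unique to a naming authority
    suffix: str  # any string assigned by that authority

    @classmethod
    def parse(cls, text: str) -> "Handle":
        # Split on the first slash only, so the suffix may itself contain slashes.
        prefix, sep, suffix = text.partition("/")
        if not sep or not prefix or not suffix:
            raise ValueError(f"not a prefix/suffix handle: {text!r}")
        return cls(prefix, suffix)

    def __str__(self) -> str:
        return f"{self.prefix}/{self.suffix}"


@dataclass
class DigitalObject:
    # A set of <type, value> elements, held here as a list of pairs.
    elements: list

    def identifier(self) -> Handle:
        # "identifier" is a hypothetical type name for the element holding the handle.
        for element_type, value in self.elements:
            if element_type == "identifier":
                return Handle.parse(value)
        raise KeyError("digital object has no identifier element")


obj = DigitalObject([("identifier", "10.1000/1"), ("title", "Course 1")])
print(obj.identifier().prefix)
```

Splitting on the first slash only reflects the slide's statement that the suffix can be any string assigned by the naming authority.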
Repository Notion
• Logical external interface: RAP (Repository Access Protocol)
• Behind the interface: any hardware & software configuration
Repositories & Digital Objects
• Each Digital Object has its own unique & persistent ID
• Content providers assign IDs
• No theoretical limits on number of DOs per repository
• Objects may be replicated in multiple repositories
[Diagram: a REPOSITORY accessed through RAP]
Handle System
• Provides basic identifier resolution system for Internet
• Logically centralized, but physically distributed and highly scalable
• Enables association of one or more typed values, e.g., IP address, public key, URL, with each id
• Optimized for speed and reliability
• Secure resolution with its own PKI as an option
• Open, well-defined protocol and data model
• Provides infrastructure for application domains, e.g., digital libraries & publishing, network mgmt, id mgmt ...
Handles Resolve to Typed Data
Handle: 10.123/456
Data type   Index   Handle data
URL         1       http://acme.com/….
URL         2       http://a-books.com/….
DLS         9       acme/repository
HS_ADMIN    100     acme.admin/jsmith
XYZ         12      10011110
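A handle record of this kind can be modelled as a list of (type, index, data) values. The following is a sketch only: the class and function names are hypothetical, and the rows are the slide's example values as read from its table, with the elided URLs left as they appear.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HandleValue:
    data_type: str
    index: int
    data: str


# Example record for handle 10.123/456, copied from the slide (URLs are elided there).
RECORD = [
    HandleValue("URL", 1, "http://acme.com/…."),
    HandleValue("URL", 2, "http://a-books.com/…."),
    HandleValue("DLS", 9, "acme/repository"),
    HandleValue("HS_ADMIN", 100, "acme.admin/jsmith"),
    HandleValue("XYZ", 12, "10011110"),
]


def values_of_type(record, data_type):
    """Return every value of the given type, in record order."""
    return [value for value in record if value.data_type == data_type]


def value_at(record, index):
    """Return the value stored at the given index, or None if there is none."""
    for value in record:
        if value.index == index:
            return value
    return None


print([value.index for value in values_of_type(RECORD, "URL")])
```

A client interested only in locations would ask for the URL values; one that needs a specific entry would address it by index.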
Handle Resolution
The Handle System is a collection of handle services, each of which consists of one or more replicated sites, each of which may have one or more servers.
[Diagram: a client, the Global Handle Registry (GHR) and Local Handle Services (LHS); each service is made up of sites (Site 1, Site 2, Site 3, ... Site n) and each site of servers (#1, #2, ... #n)]
Example: handle 123.456/abc
  URL   4   http://www.acme.com/
  URL   8   http://www.ideal.com/
Handle Clients
Request to Client: Resolve hdl:10.1000/1
1. Client sends request to the Global Handle Registry to resolve 0.NA/10.1000 (naming authority handle for 10.1000)
2. Global Handle Registry responds with Service Information for 10.1000 (the Acme Local Handle Service)
Service Information - Acme Local Handle Service
Site               Server     IP Address      Port #   Public Key   ...
Primary Site       Server 1   123.45.67.8     2641     K03RLQ...
Primary Site       Server 2   123.52.67.9     2641     5&M#FG..
Secondary Site A   Server 1   321.54.678.12   2641     F^*JLS...
Secondary Site A   Server 2   321.54.678.14   2641     3E$T%...
Secondary Site A   Server 3   762.34.1.1      2641     A2S4D...
Secondary Site B   Server 1   123.45.67.4     2641     N0L8H7...
3. Client queries Server 3 in Secondary Site A for 10.1000/1
4. Server responds with handle data
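The four resolution steps can be sketched in Python with in-memory dictionaries standing in for the Global Handle Registry and the Acme Local Handle Service. Everything here is an illustrative assumption: a real client speaks the handle protocol over the network, the handle data for 10.1000/1 is made up, and choosing a server by CRC32 is a placeholder for whatever selection rule the service information prescribes.

```python
import zlib

# Hypothetical stand-in for the Global Handle Registry:
# naming authority handle -> service information (sites and their servers).
GLOBAL_REGISTRY = {
    "0.NA/10.1000": {
        "Primary Site": ["Server 1", "Server 2"],
        "Secondary Site A": ["Server 1", "Server 2", "Server 3"],
        "Secondary Site B": ["Server 1"],
    }
}

# Hypothetical handle data held by the local service (replicated at every site).
LOCAL_SERVICE_DATA = {
    "10.1000/1": [("URL", 1, "http://www.example.com/")],
}


def resolve(handle, site="Primary Site"):
    prefix = handle.split("/", 1)[0]
    # Steps 1-2: ask the global registry for the prefix's service information.
    service_info = GLOBAL_REGISTRY["0.NA/" + prefix]
    # Step 3: pick one server within the chosen site (assumed rule: stable hash).
    servers = service_info[site]
    server = servers[zlib.crc32(handle.encode("utf-8")) % len(servers)]
    # Step 4: that server answers with the handle data.
    return site, server, LOCAL_SERVICE_DATA[handle]


site, server, values = resolve("10.1000/1", site="Secondary Site A")
print(site, server, values)
```

Because the sites are replicas, the same handle data comes back whichever site the client selects; only the server that answers changes.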
Handle Clients
[Diagram: a web client sends an HTTP GET for http://hdl.handle.net/123.456/abc to the proxy/web server; the proxy resolves the handle in the Handle System (GHR and LHSs), receives the handle data, and answers the client with an HTTP redirect. Diagram regions: Client, Handle Administration.]
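The proxy path amounts to placing the handle after the proxy's host name. A minimal sketch follows, assuming the http://hdl.handle.net/ proxy named on the slide; the function name is hypothetical, and no request is sent.

```python
from urllib.parse import quote

PROXY = "http://hdl.handle.net/"


def proxy_url(handle: str) -> str:
    """Build the proxy URL that a web client would request for a handle."""
    # Percent-encode characters that are unsafe in a URL path, but keep the
    # slash that separates prefix from suffix.
    return PROXY + quote(handle, safe="/")


print(proxy_url("123.456/abc"))
```

A browser following this URL needs no handle-specific software; the proxy performs the resolution and returns a redirect.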
Handle Clients
[Diagram: a client with a handle plug-in takes hdl:/123.456/abc, sends a resolve-handle request directly to the Handle System (GHR and LHSs), and receives the handle data. Diagram regions: Client, Handle Administration.]
Handle Clients
[Diagram: handle administration over the web: HTTP to a web server hosting admin forms, which use the Handle Admin API against the Handle System (GHR and LHSs). Diagram regions: Client, Handle Administration.]
Handle Clients
[Diagram: a custom client working directly against the Handle System (GHR and LHSs), shown next to the web path. Diagram regions: Client, Handle Administration.]
Handle Clients
[Diagram: handle administration embedded in another process, shown next to the web path, against the Handle System (GHR and LHSs).]
Handle Clients
[Diagram: handle resolution embedded in another process and handle administration embedded in another process, both against the Handle System (GHR and LHSs).]
Handle System Usage
• Library of Congress
• DTIC (Defense Technical Information Center)
• IDF (International DOI Foundation)
  – CrossRef (scholarly journal consortium)
  – CAL (Copyright Agency Ltd - Australia)
  – MEDRA (Multilingual European DOI Registration Agency)
  – Nielsen BookData (bibliographic data - ISBN)
  – R. R. Bowker (bibliographic data - ISBN)
  – Office of Publications of the European Community
  – German National Library of Science and Technology
• NTIS (National Technical Information Service)
• DSpace (MIT + HP)
• ADL (DoD Advanced Distributed Learning initiative)
• Assorted Digital Library Projects
• In development: Globus Alliance
Handle System Usage
• Assigned Prefixes (June 06)
  – DOI - 1772
  – Other - 801
• Handles
  – DOI - 22+ M
  – Other - Additional millions (total per prefix known only to prefix manager; LANL adding 600 M but privately)
• Global
  – Core: three service sites (added locations being considered)
  – 53 M resolutions
Handle System Management and Standards
• Specification
  – RFC 3650: Overview
  – RFC 3651: Namespace and Service Definition
  – RFC 3652: Protocol
• DoDI 1322
  – Will mandate Handle System use as part of ADL-R
• ISO standards track for DOI
• HSAC - Handle System Advisory Committee
  – Approx 15 members representing big users
  – Goal: evolve to oversee the system
ADL Registry (ADL-R)
• Technological and Organizational Infrastructure
  – Register the existence and access conditions for Learning Objects relevant to the DoD ‘Enterprise’
  – Provide user interface to search the registry
• Integrates existing technologies
  – Handle System for identification and access
  – XML for object description and submission
  – LOM metadata
  – Repository for metadata object storage and access
  – Lucene search engine
• Running at CNRI in initial production phase
ADL-R Input
[Diagram: collections of Content Objects (ATSC: A1, A2, A3; NAVAIR: N1, N2, N3, N4; Marines: M1, M10, M20) feed Input Processing (Parse, Authenticate, Validate, Return), which feeds the ADL-R Registry (Metadata Objects, Search Engine). The Handle System is shown as the GHR plus local handle services labelled ADL-R, DTIC, LOC, IDF, NSDL and UWisc.]
NAVAIR has Handle Prefix 123 and names N1 hdl:123/4.
1 N1 metadata is submitted to Input Processing:
    <xml>
      <title>Course 1</title>
      <org>J-School</org>
      <hdl>123/4</hdl>
      ....
    </xml>
2 Results Log is returned.
3 Input process creates Metadata Object for N1 named hdl:abc/d ...
4 ... and creates two handles: hdl:abc/d for the Metadata Object & hdl:123/4 for the Content Object.
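The input steps can be sketched in Python using the XML fragment shown on the slide (without its elided part). This is an illustration under assumptions: the function names, the required-field check, and the two dictionaries standing in for the registry and the Handle System are hypothetical, and authentication is omitted.

```python
import xml.etree.ElementTree as ET

# Submission for Content Object N1, as on the slide (elided fields left out).
SUBMISSION = "<xml><title>Course 1</title><org>J-School</org><hdl>123/4</hdl></xml>"


def parse_submission(text):
    """Parse and validate a submission into a flat dict of its fields."""
    root = ET.fromstring(text)
    record = {child.tag: (child.text or "").strip() for child in root}
    # Assumed validation rule: these three fields must be present.
    missing = {"title", "org", "hdl"} - record.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return record


def register(text, metadata_handle, registry, handles):
    """Create the Metadata Object, then one handle for it and one for the content."""
    record = parse_submission(text)
    registry[metadata_handle] = record
    handles[metadata_handle] = "metadata object"
    handles[record["hdl"]] = "content object"
    return record


registry, handles = {}, {}
register(SUBMISSION, "abc/d", registry, handles)
print(sorted(handles))
```

Keeping the registry and the handle table separate mirrors the slide: the Metadata Object lives in the registry, while both it and the Content Object are named in the Handle System.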
Searching the Registry
[Diagram: a Client, the ADL-R Registry (Search Engine, Metadata Objects), the content collections (ATSC: A1, A2, A3; NAVAIR: N1, N2, N3, N4; Marines: M1, M10, M20) and the Handle System (GHR plus local handle services labelled ADL-R, DTIC, LOC, IDF, NSDL and UWisc). Metadata Object hdl:abc/d (xml) matches Content Object N1, hdl:123/4.]
1 Client does a search. Results point to Metadata Object abc/d.
2 If desired, client gets Metadata Object abc/d to view full registry metadata.
3 Client decides to get Content Object N1 and resolves handle 123/4 to get its access location and other conditions.
4 Client requests a copy of Content Object N1 from NAVAIR.
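The search sequence can be sketched as three lookups in Python. The dictionaries below are hypothetical in-memory stand-ins for the registry, the Handle System and the NAVAIR repository; the handle data key "repository" and the content bytes are made up for illustration.

```python
# Stand-in for the registry: metadata handle -> Metadata Object.
METADATA_OBJECTS = {
    "abc/d": {"title": "Course 1", "org": "J-School", "hdl": "123/4"},
}
# Stand-in for the Handle System: content handle -> access location (assumed shape).
HANDLE_DATA = {"123/4": {"repository": "NAVAIR"}}
# Stand-in for the content repositories.
REPOSITORIES = {"NAVAIR": {"123/4": b"<content of N1>"}}


def search(term):
    """Step 1: return handles of Metadata Objects whose title matches the term."""
    term = term.lower()
    return [
        handle
        for handle, metadata in METADATA_OBJECTS.items()
        if term in metadata["title"].lower()
    ]


def fetch_content(metadata_handle):
    # Step 2: get the Metadata Object to see the full registry metadata.
    metadata = METADATA_OBJECTS[metadata_handle]
    # Step 3: resolve the content handle to its access location.
    content_handle = metadata["hdl"]
    location = HANDLE_DATA[content_handle]["repository"]
    # Step 4: request a copy of the Content Object from that repository.
    return REPOSITORIES[location][content_handle]


for hit in search("course"):
    print(hit, fetch_content(hit))
```

The registry never holds the content itself; it only leads the client to a handle, and the handle leads to the repository.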
ADL-R
[Diagram: a CORDRA Community in which Content Repositories supply object-level metadata to the CORDRA Registry]
CORDRA Community
[Diagram: a Master Registry of Registries linked by federation-level metadata to Intermediate Registries of Registries and to CORDRA Registries; each CORDRA Registry serves a CORDRA Community with its own Content Repositories]