GDFR Pilot Discussion
The National Archives, Washington DC
July 10, 2008

Agenda
1. Introductions – (All)
2. Purpose of meeting – (Dale)
3. Roles – (Dale, Richard)
4. Background/history – (Stephen)
5. GDFR Governance Workshop – (Richard, Robert)
6. Architecture – (Stephen)
7. Current state – (Andrea)
8. Relationship to PRONOM – (Andrea)
9. Issues and observations – (Dale)
10. Use cases – (Andrea)
11. Discussion of pilot – (All)
12. Review next steps from GDFR Governance Workshop Report – (Richard, Robert)
13. Outreach to other interested parties – (All)
14. Next steps – (All)

Introductions – All

Purpose of the meeting – Dale Flecker

Roles
• Harvard – Dale Flecker
• NARA – Richard Steinbacher

Background/History – Stephen Abrams

Background/History
• Format is the key piece of representation information that permits preservation activities to be focused on interpretable/renderable content, not just opaque bit strings
[Figure: the opening bytes of a JPEG file shown as a raw hex string (ffd8ffe000104a464946...) and as the parsed segment sequence SOI, APP0, APP13, APP2, DQT, SOF0, DRI, DHT, SOS, ECS0, ..., annotated with JFIF 1.2, IPTC, ICC, and 183 x 512]
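
To make the slide's point concrete, the sketch below walks the marker segments at the head of a JPEG stream and turns the opaque bytes into named structure (SOI, APP0, DQT, ...), which is the kind of knowledge a format registry is meant to capture. It is a minimal illustration, assuming a well-formed file: it ignores 0xFF fill bytes and stops at SOS rather than decoding the entropy-coded data. The function names and the synthetic sample are hypothetical, not part of GDFR.

```python
import struct

# Marker codes for the segments named on the slide (second byte after 0xFF).
MARKER_NAMES = {
    0xD8: "SOI", 0xE0: "APP0", 0xE2: "APP2", 0xED: "APP13", 0xDB: "DQT",
    0xC0: "SOF0", 0xDD: "DRI", 0xC4: "DHT", 0xDA: "SOS",
}


def list_jpeg_segments(data: bytes) -> list:
    """Return the marker names of a JPEG stream, in file order, up to SOS."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream: missing SOI marker")
    names = ["SOI"]
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        code = data[pos + 1]
        names.append(MARKER_NAMES.get(code, f"0x{code:02X}"))
        if code == 0xDA:  # SOS: entropy-coded image data follows; stop here
            break
        # The big-endian segment length counts its own two bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        pos += 2 + length
    return names


def segment(code: int, payload: bytes) -> bytes:
    """Build one marker segment (hypothetical helper for the example below)."""
    return bytes([0xFF, code]) + struct.pack(">H", len(payload) + 2) + payload


# A tiny synthetic stream: SOI, a JFIF 1.2 APP0 header, one DQT table, SOS.
sample = (
    b"\xff\xd8"
    + segment(0xE0, b"JFIF\x00\x01\x02" + bytes(7))
    + segment(0xDB, bytes(65))
    + segment(0xDA, bytes(10))
    + b"\x12\x34"
)
print(list_jpeg_segments(sample))  # ['SOI', 'APP0', 'DQT', 'SOS']
```

Without the marker table and the length rule, the same bytes are just the opaque hex string on the slide.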

Background/History
• Traditional methods of managing format information, e.g. the IANA MIME registry, are insufficiently descriptive and granular for effective preservation planning and intervention
  • The application/msword format is essentially defined as anything produced by the Word application
  • TIFF 6.0, TIFF/IT, TIFF/EP, GeoTIFF, … all map to image/tiff
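
The granularity problem can be shown with a toy data structure: group format records by MIME type and several distinct formats collapse into one key. This is a minimal sketch; the two-field record is a hypothetical simplification, not the GDFR or PRONOM data model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FormatRecord:
    # Hypothetical two-field record; real registries describe far more.
    name: str
    mime_type: str


# The TIFF variants listed on the slide, all carried under one MIME type.
RECORDS = [
    FormatRecord("TIFF 6.0", "image/tiff"),
    FormatRecord("TIFF/IT", "image/tiff"),
    FormatRecord("TIFF/EP", "image/tiff"),
    FormatRecord("GeoTIFF", "image/tiff"),
]


def formats_by_mime_type(records):
    """Group format names by MIME type."""
    groups = defaultdict(list)
    for record in records:
        groups[record.mime_type].append(record.name)
    return dict(groups)


groups = formats_by_mime_type(RECORDS)
# One MIME type, four distinct formats: the MIME label alone cannot say which
# variant a file is, or which variant-specific preservation action applies.
print(groups)
```

A registry aimed at preservation needs an identifier at the level of the individual format (and version), with the MIME type kept only as one attribute among many.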

Background/History
• Two DLF-sponsored invitational workshops
  • Univ. Pennsylvania, January 2003
  • Washington, March 2003
• Two independent demonstration projects
  • FRED, John Ockerbloom, Univ. Pennsylvania
  • FOCUS, Joseph JaJa, Univ. Maryland

Background/History
• Evolving consensus on scope
  • A forum for documenting normative definitions of format syntax and semantics
  • A common facility to pool and share scarce technical expertise on a global basis
  • A channel for the distribution of that expertise to the international community of preservation practitioners
  • A foundation for additional value-added services requiring detailed knowledge of digital formats

Background/History
• Peer-to-peer network of independent but cooperating registries

Background/History
• Harvard University Library (HUL) funded for 2 years by the Andrew W. Mellon Foundation
  • Technical deliverables only; no funded governance/policy activity
• Staffing and technical work subcontracted to OCLC (July 2006)

NARA Governance Workshop – Richard Steinbacher, Robert Chadduck

Architecture – Stephen Abrams

Architecture
• A generic distributed registry framework, specialized for the GDFR application
• Based on well-known products and protocols
• Human and machine interfaces
• Full information content expressible in XML form; can be re-instantiated from that expression
• Platform independence
• Globally fault tolerant
• Open source
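
"Expressible in XML and re-instantiable from that expression" amounts to a lossless round trip. A minimal sketch, assuming a hypothetical three-field record and element names invented for illustration; this is not the GDFR schema, whose data model extends PRONOM 4.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class FormatRecord:
    # Hypothetical record shape for illustration only.
    identifier: str
    name: str
    mime_types: list = field(default_factory=list)


def to_xml(rec: FormatRecord) -> str:
    """Serialize a record to an XML string."""
    root = ET.Element("format", {"id": rec.identifier})
    ET.SubElement(root, "name").text = rec.name
    for mime_type in rec.mime_types:
        ET.SubElement(root, "mimeType").text = mime_type
    return ET.tostring(root, encoding="unicode")


def from_xml(text: str) -> FormatRecord:
    """Rebuild a record from the XML produced by to_xml."""
    root = ET.fromstring(text)
    return FormatRecord(
        identifier=root.get("id"),
        name=root.findtext("name"),
        mime_types=[e.text for e in root.findall("mimeType")],
    )


rec = FormatRecord("example-0001", "JPEG File Interchange Format", ["image/jpeg"])
xml_text = to_xml(rec)
print(xml_text)
# The rebuilt record compares equal to the original: nothing was lost.
assert from_xml(xml_text) == rec
```

Keeping the XML form complete means a node's store (Berkeley DB XML here) can be rebuilt from an export, which is also what makes mirroring and migration between registry implementations feasible.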

Architecture
• Data model is an extension of PRONOM 4

Architecture
• Based on the OCLC IWSA/RFA framework

Architecture
• Java, Apache/Tomcat, Berkeley DB XML
• GNU LGPL license
  • Including technology newly developed for the project and pre-existing OCLC technology

Current state – Andrea Goethals

Current state: schedule
• July 31, 2008: contract with OCLC ends
  • GDFR source node at Harvard goes public in beta mode
• August 2008 up to August 2010
  • Harvard maintains GDFR software, website and source node

Current state: GDFR Home website
• It moved!
• Old GDFR Home: http://www.formatregistry.org
• New GDFR Home: http://www.gdfr.info
  • All existing GDFR docs migrated from the old GDFR Home website
  • Over the next month
    • Updated documentation!
    • Demo source node?

Current state: architecture
• Currently:
  • One GDFR source node
    • Where all data additions and edits are performed
  • Many GDFR mirror nodes
    • Replicated data
• Future?
  • Multiple GDFR source nodes?
  • Multiple interoperable format registry source nodes?
• "Discoverable" from GDFR Home website
• Each node has 2 interfaces
  • For humans: user interface
  • For machines: web service interface
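
The single-source, many-mirror arrangement can be sketched as one writable node plus read-only replicas that pull changes. This assumes a hypothetical append-only change log with sequence numbers; the slides say only that mirrors synchronize with the source node over the machine interface, not how changes are tracked.

```python
from dataclasses import dataclass, field


@dataclass
class SourceNode:
    """Hypothetical source node: the only place records are added or edited."""
    records: dict = field(default_factory=dict)
    log: list = field(default_factory=list)  # entries: (sequence, record_id, record)

    def put(self, record_id: str, record: dict) -> None:
        self.records[record_id] = record
        self.log.append((len(self.log) + 1, record_id, record))

    def changes_since(self, seq: int) -> list:
        # Sequence numbers start at 1, so entries after `seq` begin at index `seq`.
        return self.log[seq:]


@dataclass
class MirrorNode:
    """Hypothetical read-only mirror holding replicated data."""
    records: dict = field(default_factory=dict)
    last_seq: int = 0

    def sync(self, source: SourceNode) -> int:
        """Apply every change newer than the last one seen; return the count."""
        applied = 0
        for seq, record_id, record in source.changes_since(self.last_seq):
            self.records[record_id] = record
            self.last_seq = seq
            applied += 1
        return applied


source = SourceNode()
source.put("fmt-1", {"name": "TIFF 6.0"})
mirror = MirrorNode()
first = mirror.sync(source)   # picks up the one initial record
source.put("fmt-1", {"name": "TIFF 6.0", "note": "edited"})
source.put("fmt-2", {"name": "GeoTIFF"})
second = mirror.sync(source)  # picks up only the two newer changes
assert mirror.records == source.records
```

Pulling by sequence number lets a mirror that was offline resume where it stopped instead of re-copying the whole registry; supporting multiple source nodes, listed above as a future option, would need conflict handling that this sketch does not attempt.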

Current state: GDFR source node
• Housed by Harvard for now
  • http://www.formatregistry.org/registry
• Populated with test data: ~2000 formats from Magic database
• Need authorized account to add/edit data
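
A magic database describes formats by byte signatures at fixed offsets. A minimal sketch of signature matching, assuming a hypothetical in-memory table (not the syntax of any actual magic file) populated with a few well-known signatures:

```python
# Hypothetical signature table: (offset, byte pattern, format name).
SIGNATURES = [
    (0, b"\xff\xd8\xff", "JPEG"),
    (0, b"\x89PNG\r\n\x1a\n", "PNG"),
    (0, b"GIF87a", "GIF 87a"),
    (0, b"GIF89a", "GIF 89a"),
    (0, b"II*\x00", "TIFF (little-endian)"),
    (0, b"MM\x00*", "TIFF (big-endian)"),
    (0, b"%PDF-", "PDF"),
]


def identify(data: bytes) -> list:
    """Return every format whose signature matches at its offset."""
    return [
        name
        for offset, magic, name in SIGNATURES
        if data[offset:offset + len(magic)] == magic
    ]


print(identify(b"GIF89a\x01\x00"))  # ['GIF 89a']
```

Signatures like these say "TIFF" but not whether the file is TIFF/EP or GeoTIFF, which is one reason magic data makes useful test content but is not, on its own, the granular description GDFR aims for.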

Current state: GDFR mirror nodes
• Test mirror nodes at OCLC and Harvard
• Anyone can run a mirror node
• Synchronize data with the source node
• Can brand your mirror node

Current state: mirror node set-up
• Dependencies
  • Apache 2 (mod_rewrite, mod_jk, mod_perl 2)
  • Tomcat 5.5.x
  • Berkeley DB XML 2.3.10
  • Perl 5.8.x
  • Java 1.5
• Installation & configuration – half day

User interface
• Mirror node
  • Search, browse, lookup/retrieve, export, manage node
• Source node
  • Same as mirror node
  • Plus: add, edit
• Sneak preview

Current state: machine interface
• Web services using SRU
• Can do everything supported by the human user interface
  • Except browsing
• Plus mirror-to-source node synchronization
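
SRU (Search/Retrieve via URL) requests are plain HTTP GETs whose parameters carry the operation and a CQL query. A sketch of building a searchRetrieve request, assuming a hypothetical endpoint path under the source node's address and a hypothetical CQL index name; the parameter names (operation, version, query, startRecord, maximumRecords) are the standard SRU 1.1 ones.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the slides give the source node's address but not
# the path of its SRU service.
SRU_BASE = "http://www.formatregistry.org/registry/sru"


def search_retrieve_url(cql_query: str, start: int = 1, max_records: int = 10) -> str:
    """Build an SRU 1.1 searchRetrieve request URL for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "startRecord": start,
        "maximumRecords": max_records,
    }
    return f"{SRU_BASE}?{urlencode(params)}"


# The index name "format.name" is an assumption, not a documented GDFR index.
print(search_retrieve_url('format.name = "TIFF"'))
```

Because every operation is an addressable URL returning XML, any HTTP client can act as a machine consumer of a node, and a mirror can use the same interface to fetch updates from the source.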

Relationship to PRONOM – Andrea Goethals

Relationship to PRONOM – what’s the problem?
• Two different “format” registries
  • Overlapping but diverging data models
  • No common format model
  • No mechanism to exchange data
• PRONOM is in production; GDFR is not yet
  • PRONOM has been publicly available for over 4 years and is used by some preservation repositories
  • Interoperates with DROID
  • Basis for PLANETS projects
• How many format registries does the digital preservation community need?
  • Depends on how different they are…

Relationship to PRONOM – core differences
• Who governs the registry and makes policy, scope and enhancement decisions?
  • PRONOM: TNA
  • GDFR: community-based
• Who adds and edits format information?
  • PRONOM: TNA (does take addition requests)
  • GDFR: community-based
• Where is the format information physically located?
  • PRONOM: at TNA
  • GDFR: replicated in different geographic locations

Relationship to PRONOM – what’s the solution?
• Recognize there is a problem – DONE
  • Mutual willingness to resolve
  • TNA desire to participate in a GDFR pilot
• Common web service API across the registries?
  • PRONOM could become a GDFR node
  • PRONOM and GDFR could each support a new web service API
• Cross-walk PRONOM PUIDs and GDFR GFIDs?
  • Use common format identification tools (DROID, JHOVE, etc.) with either registry
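
A PUID/GFID cross-walk is, at its simplest, a bidirectional lookup table. A minimal sketch, assuming a one-to-one correspondence between the two identifier schemes; the class name and the identifier values are hypothetical placeholders, not real PUIDs or GFIDs.

```python
from typing import Dict, Optional


class Crosswalk:
    """Hypothetical one-to-one map between PRONOM PUIDs and GDFR GFIDs."""

    def __init__(self) -> None:
        self._gfid_by_puid: Dict[str, str] = {}
        self._puid_by_gfid: Dict[str, str] = {}

    def add(self, puid: str, gfid: str) -> None:
        # Reject a pair that would map either identifier to two different ones.
        if self._gfid_by_puid.get(puid, gfid) != gfid:
            raise ValueError(f"{puid} is already mapped to {self._gfid_by_puid[puid]}")
        if self._puid_by_gfid.get(gfid, puid) != puid:
            raise ValueError(f"{gfid} is already mapped to {self._puid_by_gfid[gfid]}")
        self._gfid_by_puid[puid] = gfid
        self._puid_by_gfid[gfid] = puid

    def gfid_for(self, puid: str) -> Optional[str]:
        return self._gfid_by_puid.get(puid)

    def puid_for(self, gfid: str) -> Optional[str]:
        return self._puid_by_gfid.get(gfid)


# Placeholder identifiers, not real PUIDs or GFIDs.
walk = Crosswalk()
walk.add("PUID-EXAMPLE-1", "GFID-EXAMPLE-1")
print(walk.gfid_for("PUID-EXAMPLE-1"), walk.puid_for("GFID-EXAMPLE-1"))
```

With such a table, output from a tool that reports PUIDs (such as DROID) could be resolved against either registry. Since the two data models overlap but diverge, a real cross-walk may need one-to-many entries; this sketch rejects such conflicts rather than modeling them.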

Issues and Observations – Dale Flecker

Use cases – Andrea Goethals

Use cases – 3 sets (see handout)
• Higher-level use cases submitted by many institutions (early 2003)
• Lower-level use case model created for the software design (2006-7)
• Use cases arising from informal talks and meetings

Key use cases – discussed but not supported
• Determine duplicates
• Notifications/warnings
• Determine migration/emulation pathways
• Determine at-risk formats (machine-actionable risk assessments)
• Support the registry & discovery of GDFR nodes
• Authentication of nodes and users (outside the UI)
• Storage of local profiles separate from central formats
• Synchronizations based on vetted or non-vetted data
• Determine "quality" of format information
• Multiple source nodes
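
"Determine migration/emulation pathways" is listed as unsupported, but it shows what such a service would compute from registry data. A minimal sketch, assuming a hypothetical conversion graph (format names and edges are invented, not taken from any registry) and using breadth-first search to find a shortest chain of migrations:

```python
from collections import deque
from typing import Optional

# Hypothetical conversion graph: an edge A -> B means some tool can migrate
# content from format A to format B.
CONVERSIONS = {
    "fmt-A": ["fmt-B", "fmt-C"],
    "fmt-B": ["fmt-D"],
    "fmt-C": ["fmt-D", "fmt-E"],
    "fmt-D": [],
    "fmt-E": ["fmt-F"],
}


def migration_path(graph: dict, start: str, target: str) -> Optional[list]:
    """Breadth-first search for a shortest chain of migrations, or None."""
    previous = {start: None}  # format -> format it was reached from
    queue = deque([start])
    while queue:
        fmt = queue.popleft()
        if fmt == target:
            path = []
            while fmt is not None:
                path.append(fmt)
                fmt = previous[fmt]
            return path[::-1]
        for nxt in graph.get(fmt, []):
            if nxt not in previous:
                previous[nxt] = fmt
                queue.append(nxt)
    return None


print(migration_path(CONVERSIONS, "fmt-A", "fmt-F"))
```

Shortest-hop search is only one possible policy; a real service would likely weight edges by fidelity loss or risk, which ties this use case to the open question of how evaluative GDFR should be.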

Use cases – common issues
• How evaluative should GDFR be?
  • Neutral vs. judgmental
• Are services in the scope of GDFR?
  • Should GDFR provide services directly (notifications, validation, etc.) or should GDFR be a reference that can be used by external services?

Discussion of pilot – All

Discussion of pilot
• Purposes

Discussion of pilot
• Pilot use cases

Discussion of pilot
• Process

Discussion of pilot
• Participants

Review next steps from the GDFR Governance Workshop Report – Richard Steinbacher, Robert Chadduck

Outreach to other interested parties – All

Next steps? – All
