GDFR Pilot Discussion The National Archives Washington DC July 10, 2008
Agenda
1. Introductions – (All)
2. Purpose of meeting – (Dale)
3. Roles – (Dale, Richard)
4. Background/history – (Stephen)
5. GDFR Governance Workshop – (Richard, Robert)
6. Architecture – (Stephen)
7. Current state – (Andrea)
8. Relationship to PRONOM – (Andrea)
9. Issues and observations – (Dale)
10. Use cases – (Andrea)
11. Discussion of pilot – (All)
12. Review next steps from GDFR Governance Workshop Report – (Richard, Robert)
13. Outreach to other interested parties – (All)
14. Next steps – (All)
Introductions All
Purpose of the meeting Dale Flecker
Roles
- Harvard – Dale Flecker
- NARA – Richard Steinbacher
Background/History Stephen Abrams
Background/History
- Format is the key piece of representation information that permits preservation activities to be focused on interpretable/renderable content, not just opaque bit strings
[Figure: hexadecimal dump of a JPEG file (ffd8ffe000104a46494600010201...) decoded into its segments SOI, APP0, APP13, APP2, DQT, SOF0, DRI, DHT, SOS, ECS0, ..., with labels JFIF 1.2, IPTC, ICC, 183 x 512]
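A minimal sketch of what knowing the format makes possible: once the byte string is known to be a JPEG stream, its segment markers can be read off instead of treating it as opaque bits. The function name `list_segments`, the marker table, and the hand-built sample bytes are illustrative assumptions and are not taken from the slides; the sketch also does not handle fill bytes or markers without a length field.

```python
import struct

# Names for a few common JPEG markers (the byte that follows 0xFF).
MARKER_NAMES = {
    0xD8: "SOI", 0xE0: "APP0", 0xE2: "APP2", 0xED: "APP13",
    0xDB: "DQT", 0xC0: "SOF0", 0xDD: "DRI", 0xC4: "DHT",
    0xDA: "SOS", 0xD9: "EOI",
}


def list_segments(data: bytes) -> list[str]:
    """Return the marker names from the start of a JPEG stream up to SOS."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream: missing SOI marker")
    names = ["SOI"]
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        code = data[pos + 1]
        names.append(MARKER_NAMES.get(code, f"0x{code:02X}"))
        if code == 0xDA:  # entropy-coded data follows SOS; stop here
            break
        # The big-endian segment length counts its own two bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        pos += 2 + length
    return names


# Hand-built sample (an assumption): SOI, a 16-byte JFIF APP0 segment, then SOS.
SAMPLE = bytes.fromhex("ffd8" "ffe00010" "4a46494600" "0102" "01"
                       "0083" "0083" "0000" "ffda0002")
print(list_segments(SAMPLE))  # ['SOI', 'APP0', 'SOS']
```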
Background/History
- Traditional methods of managing format information, e.g. the IANA MIME registry, are insufficiently descriptive and granular for effective preservation planning and intervention
  - The application/word format is essentially defined as anything produced by the Word application
  - TIFF 6.0, TIFF/IT, TIFF/EP, GeoTIFF, … are all labeled image/tiff
Background/History
- Two DLF-sponsored invitational workshops
  - Univ. Pennsylvania, January 2003
  - Washington, March 2003
- Two independent demonstration projects
  - FRED, John Ockerbloom, Univ. Pennsylvania
  - FOCUS, Joseph JaJa, Univ. Maryland
Background/History
- Evolving consensus on scope
  - A forum for documenting normative definitions of format syntax and semantics
  - A common facility to pool and share scarce technical expertise on a global basis
  - A channel for the distribution of that expertise to the international community of preservation practitioners
  - A foundation for additional value-added services requiring detailed knowledge of digital formats
Background/History
- Peer-to-peer network of independent, but cooperating registries
Background/History
- Harvard University Library (HUL) funded for 2 years by the Andrew W. Mellon Foundation
  - Technical deliverables only; no funded governance/policy activity
- Staffing and technical work subcontracted to OCLC (July 2006)
NARA Governance Workshop Richard Steinbacher Robert Chadduck
Architecture Stephen Abrams
Architecture
- A generic distributed registry framework, specialized for the GDFR application
- Based on well-known products and protocols
- Human and machine interfaces
- Full information content expressible in XML form; can be re-instantiated from that expression
- Platform independence
- Globally fault tolerant
- Open source
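The XML round trip mentioned above (express a record as XML, then re-instantiate it from that expression) can be illustrated with a small sketch. The element names `format`, `name`, and `version` and both function names are assumptions chosen for illustration; they are not the GDFR schema.

```python
import xml.etree.ElementTree as ET


def record_to_xml(record: dict[str, str]) -> str:
    """Serialize a flat record as XML (hypothetical element names)."""
    root = ET.Element("format")
    for key, value in sorted(record.items()):
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")


def xml_to_record(xml_text: str) -> dict[str, str]:
    """Rebuild the record from its XML expression."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text or "" for child in root}


record = {"name": "TIFF", "version": "6.0"}
xml_text = record_to_xml(record)
assert xml_to_record(xml_text) == record  # nothing is lost in the round trip
```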
Architecture
- Data model is an extension of PRONOM 4
Architecture
- Based on the OCLC IWSA/RFA framework
Architecture
- Java, Apache/Tomcat, Berkeley DB XML
- GNU LGPL license
  - Including technology newly developed for the project and pre-existing OCLC technology
Current state Andrea Goethals
Current state: schedule
- July 31, 2008
  - Contract with OCLC ends
  - GDFR source node at Harvard goes public in beta mode
- August 2008 up to August 2010
  - Harvard maintains GDFR software, website and source node
Current state: GDFR Home website
- It moved!
- Old GDFR Home: http://www.formatregistry.org
- New GDFR Home: http://www.gdfr.info
  - All existing GDFR docs migrated from the old GDFR Home website
  - Over the next month
    - Updated documentation!
    - Demo source node?
Current state: architecture
- Currently:
  - One GDFR source node
    - Where all data additions and edits are performed
  - Many GDFR mirror nodes
    - Replicated data
- Future?
  - Multiple GDFR source nodes?
  - Multiple interoperable format registry source nodes?
- “Discoverable” from GDFR Home website
- Each node has 2 interfaces
  - For humans: user interface
  - For machines: web service interface
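The source/mirror arrangement can be sketched as below, assuming a single source node that accepts edits and mirrors that copy its data. The class names, the revision counter, and the full-copy synchronization are simplifying assumptions; the slides do not describe the actual GDFR synchronization protocol.

```python
class SourceNode:
    """The one node where additions and edits are performed."""

    def __init__(self) -> None:
        self.records: dict[str, str] = {}
        self.revision = 0  # hypothetical change counter

    def put(self, format_id: str, description: str) -> None:
        self.records[format_id] = description
        self.revision += 1


class MirrorNode:
    """A read-only replica that pulls its data from the source node."""

    def __init__(self, source: SourceNode) -> None:
        self.source = source
        self.records: dict[str, str] = {}
        self.revision = 0

    def synchronize(self) -> bool:
        """Copy the source's data if it has changed; return True if updated."""
        if self.revision == self.source.revision:
            return False
        self.records = dict(self.source.records)
        self.revision = self.source.revision
        return True


source = SourceNode()
mirror = MirrorNode(source)
source.put("example-format", "Example format description")
mirror.synchronize()
print(mirror.records == source.records)  # True
```

Keeping edits on one node avoids conflict resolution between replicas, which is one reason multiple source nodes remain an open question rather than a current feature.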
Current state: GDFR source node
- Housed by Harvard for now
  - http://www.formatregistry.org/registry
- Populated with test data: ~2000 formats from Magic database
- Need authorized account to add/edit data
Current state: GDFR mirror nodes
- Test mirror nodes at OCLC and Harvard
- Anyone can run a mirror node
- Synchronize data with the source node
- Can brand your mirror node
Current state: Mirror node set-up
- Dependencies
  - Apache 2 (mod_rewrite, mod_jk, mod_perl 2)
  - Tomcat 5.5.x
  - Berkeley DBXML 2.3.10
  - Perl 5.8.x
  - Java 1.5
- Installation & configuration – half day
User interface
- Mirror node
  - Search, browse, lookup/retrieve, export, manage node
- Source node
  - Same as mirror node
  - Plus: add, edit
- Sneak preview
Current state: machine interface
- Web services using SRU
- Can do everything supported by the human user interface
  - Except browsing
- Plus mirror-to-source node synchronization
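As a rough illustration of a machine client for an SRU web service, the sketch below builds a `searchRetrieve` request URL using the standard SRU parameters `operation`, `version`, `query`, and `maximumRecords`. The base URL, the SRU version (1.1), and the query term are assumptions; the slides do not give the endpoint or the indexes a GDFR node supports.

```python
from urllib.parse import urlencode


def sru_search_url(base_url: str, cql_query: str, max_records: int = 10) -> str:
    """Build an SRU searchRetrieve URL for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",  # assumed SRU version
        "query": cql_query,
        "maximumRecords": str(max_records),
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint; substitute the SRU address of a real node.
print(sru_search_url("http://registry.example.org/sru", "tiff"))
```

The response to such a request is an XML document of matching records, which a client could parse with any XML library.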
Relationship to PRONOM Andrea Goethals
Relationship to PRONOM – what’s the problem?
- Two different “format” registries
  - Overlapping but diverging data models
  - No common format model
  - No mechanism to exchange data
- PRONOM is in production, GDFR is not yet
  - PRONOM has been publicly available for over 4 years and is used by some preservation repositories
  - Interoperates with DROID
  - Basis for PLANETS projects
- How many format registries does the digital preservation community need?
  - Depends on how different they are…
Relationship to PRONOM – core differences
- Who governs the registry and makes policy, scope and enhancement decisions?
  - PRONOM: TNA
  - GDFR: community-based
- Who adds and edits format information?
  - PRONOM: TNA (does take addition requests)
  - GDFR: community-based
- Where is the format information physically located?
  - PRONOM: at TNA
  - GDFR: replicated in different geographic locations
Relationship to PRONOM – what’s the solution?
- Recognize there is a problem – DONE
  - Mutual willingness to resolve
  - TNA desire to participate in a GDFR pilot
- Common web service API across the registries?
  - PRONOM could become a GDFR node
  - PRONOM and GDFR could each support a new web service API
- Cross-walk PRONOM PUIDs and GDFR GFIDs?
  - Use common format identification tools (DROID, JHOVE, etc.) with either registry
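One way to picture a PUID/GFID cross-walk is a two-way lookup table, sketched below. The function name is hypothetical, every identifier value is an invented placeholder rather than a real registry entry, and the one-to-one mapping is an assumption; formats defined at different granularity in the two registries would need a richer structure.

```python
def build_crosswalk(pairs):
    """Build two-way lookup tables from (puid, gfid) pairs.

    Assumes a one-to-one mapping and rejects any repeated identifier.
    """
    puid_to_gfid, gfid_to_puid = {}, {}
    for puid, gfid in pairs:
        if puid in puid_to_gfid or gfid in gfid_to_puid:
            raise ValueError(f"identifier mapped twice: {puid} / {gfid}")
        puid_to_gfid[puid] = gfid
        gfid_to_puid[gfid] = puid
    return puid_to_gfid, gfid_to_puid


# Placeholder identifiers only; real PUIDs and GFIDs would come from the registries.
PAIRS = [
    ("puid-placeholder-1", "gfid-placeholder-1"),
    ("puid-placeholder-2", "gfid-placeholder-2"),
]
puid_to_gfid, gfid_to_puid = build_crosswalk(PAIRS)
print(puid_to_gfid["puid-placeholder-1"])  # gfid-placeholder-1
```

With such a table, an identification tool that reports one registry's identifier could look up the corresponding record in the other registry.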
Issues and Observations Dale Flecker
Use cases Andrea Goethals
Use cases – 3 sets (see handout)
- Higher-level use cases submitted by many institutions (early 2003)
- Lower-level use case model created for the software design (2006-7)
- Use cases arising from informal talks and meetings
Key use cases – discussed but not supported
- Determine duplicates
- Notifications/warnings
- Determine migration/emulation pathways
- Determine at-risk formats (machine-actionable risk assessments)
- Support the registry & discovery of GDFR nodes
- Authentication of nodes and users (outside the UI)
- Storage of local profiles separate from central formats
- Synchronizations based on vetted or non-vetted data
- Determine “quality” of format information
- Multiple source nodes
Use cases – common issues
- How evaluative should GDFR be?
  - Neutral vs judgmental
- Are services in the scope of GDFR?
  - Should GDFR provide services directly (notifications, validation, etc.) or should GDFR be a reference that can be used by external services?
Discussion of pilot All
Discussion of pilot
- Purposes
Discussion of pilot
- Pilot use cases
Discussion of pilot
- Process
Discussion of pilot
- Participants
Review next steps from the GDFR Governance Workshop Report Richard Steinbacher Robert Chadduck
Outreach to other interested parties All
Next steps? All