Unified Digital Format Registry
a semantic registry for digital preservation

International Internet Preservation Consortium (IIPC) General Assembly
Library of Congress, April 30 – May 4, 2012

Unified Digital Format Registry (UDFR): Understanding the System and Service

Stephen Abrams, Lisa Dawn Colvin, Abhishek Salve
UC Curation Center, California Digital Library
http://www.cdlib.org/uc3

Agenda
Time           Topic
09:00–09:10    Introductions and review of goals
09:10–09:30    Background on the UDFR project
09:30–10:00    Demonstration of main features
10:00–10:30    Technology stack and architecture
10:30–10:45    Break
10:45–11:45    Code walk-through
11:45–12:00    Questions and discussion
12:00–13:00    Lunch
13:00–13:45    Ontological models
13:45–14:15    Administrative procedures
14:15–14:45    Community building and next steps
14:45–15:00    Questions and discussion

Goals
• Understanding the UDFR architecture
• Understanding the UDFR ontological modeling
• Understanding the UDFR administrative procedures
• Tangible next steps for facilitating ongoing community engagement and support

Why formats?
• “Format” is the dividing line between bits and information
[Figure: hex dump of a JPEG file (ffd8 ffe0 0010 4a46 4946 ...), parsed into marker segments (SOI, APP0, APP13, APP2, DQT, SOF0, DRI, DHT, SOS, ECS0, RST0, ECS1, RST1, ECS2, ...) and interpreted as JFIF 1.2 with IPTC and ICC metadata, 183 x 512]

Why formats?
• There are many necessary preservation activities that can be usefully performed on bits qua bits
• But to preserve information you must act on formatted bits and know what those formats represent
  ● Preservation of content syntax and semantics (both the structure and meaning of the digital representation)

Unified Digital Format Registry
• “A reliable, publicly accessible, and sustainable knowledge base of file format representation information for use by the digital preservation community”
  ● http://udfr.org/
  ● udfr-l@listserv.ucop.edu
• “Unification” of the function and holdings of PRONOM and GDFR
  ● http://www.nationalarchives.gov.uk/PRONOM
  ● http://gdfr.info/
• Open source platform / GPL
• Semantic wiki
• Funded by the Library of Congress

A bit of history…
• PRONOM – National Archives [UK], 2002
  ● http://www.nationalarchives.gov.uk/PRONOM
  ● “ready access to reliable technical information about the nature of electronic records”
• JHOVE – Harvard, 2003
  ● http://hul.harvard.edu/jhove
  ● “digital object validation and characterization”
• Global Digital Format Registry (GDFR) – Harvard/OCLC, 2006
  ● http://gdfr.info/
  ● “a distributed and replicated registry of format information populated and vetted by experts and enthusiasts world-wide”

A bit of history…
• Proto-UDFR – Ad hoc stakeholder community, 2009
  ● Resolve PRONOM IPR issues and develop a community-supported open source solution
  ● Advance beyond legacy RDBMS (PRONOM) and XMLDB (GDFR) technology
• UDFR – CDL, January 2011
  ● http://udfr.org/
  ● udfr-l@listserv.ucop.edu
  ● “a semantic registry for digital preservation”
  ● LC/NDIIPP funded
  ● Stakeholder meeting 2011
  ● Beta release, November 2011
  ● Production release, May 2012

Representation information
• What you need to know about something in order to exploit that thing meaningfully [OAIS/ISO 14721]
• Information that lets you answer important preservation questions (directly or indirectly)
  ● What format is it?
  ● What are its significant properties?
  ● Is it valid?
  ● Is it at risk?
  ● How can I render/play/read it?
  ● What can it be transformed into?

Why semantic?
• The semantic web lets anyone say anything about anything
  ● Understandable to both people and machines
• The web is (or soon will be) a semantic web
  ● Linked Data interoperability
  ● http://linkeddata.org/

Why semantic?
• Triples all the way down…
  ● Data expressed as triples
  ● Data definition (i.e., ontology) expressed as triples
  ● Ontology definition expressed as triples
• Facilitates self-configuration and easy extension

Provenance
• “Trust, but verify”
  ● Complete change history at the assertion level
    ● Who made the assertion, and when
    ● Confidence based on institutional reputation
  ● Imprimatur of technically knowledgeable reviewers

Roles
• Consumer: anonymous read
• Contributor: read + write
• Reviewer: read + write + review
• Administrator: read + write + review + administer

Initial data loads
• MIME types from Appspot as of 2012-02-22
  ● http://mediatypes.appspot.com/
  ● “Routinely scraped from IANA using code in the mediatypes Google Code project”

      809   application/*
      125   audio/*
       39   image/*
       19   message/*
       14   model/*
       14   multipart/*
       51   text/*
       56   video/*
    1,127   total

  ● Plus 71 defined by PRONOM

Initial data loads
• PRONOM as of 2012-02-21
  ● http://www.nationalarchives.gov.uk/PRONOM

      846   file formats
       28   character encodings
       17   compression algorithms
    1,237   identifiers
    1,006   external signatures
      494   internal signatures
       71   MIME types (not in Appspot)
      156   agents
      268   software packages
    2,080   software processes
       23   IPR statements
      217   relationships
    8,274

• Special thanks to TNA
  ► Spencer Ross
  ► Tracey Powell
  ► Tim Gollins

Data licensing
• PRONOM data contributed under the UK Open Government Licence (OGL)
  ● http://www.nationalarchives.gov.uk/doc/open-government-licence/
• Other submissions contributed under the Creative Commons Attribution license (CC-BY)
  ● http://creativecommons.org/licenses/by/3.0/

Communication
• UDFR listserv
  ● udfr-l@listserv.ucop.edu
  ● http://listserv.ucop.edu/cgi-bin/wa.exe?A0=UDFR-L
  ● To subscribe, send “SUB UDFR-L <name>” to listserv@ucop.edu

User’s Guide
http://udfr.org/docs/UDFR-Users-Guide-v1.0.0.pdf

UI layout
[Screenshot of http://udfr.org/, annotated with its panes:]
• Workspace pane: function dependent
• OntoWiki pane: register/login/logout, SPARQL query form, documentation, session reset
• Knowledge base pane
• Ontology browser pane
• Register/login pane

Contextual menus
[Screenshot of http://udfr.org/ showing a contextual menu]

Demonstration
http://udfr.org/

Technology stack
• Apache httpd: http://httpd.apache.org/
• HTTP / SPARQL: http://www.w3.org/TR/rdf-sparql-query
• RDFauthor/JavaScript: http://aksw.org/Projects/RDFauthor
• OntoWiki: http://ontowiki.net/
• Zend framework: http://framework.zend.com/
• PHP: http://www.php.net/
• Noid: http://wiki.ucop.edu/display/Curation/NOID
• Erfurt API: http://aksw.org/Projects/Erfurt
• Virtuoso quadstore: http://virtuoso.openlinksw.com/
• RDF: http://www.w3.org/RDF
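The HTTP / SPARQL layer of this stack can be exercised with plain HTTP. The PHP sketch below builds a SPARQL protocol GET request asking for the label of u1r2473; the endpoint URL http://udfr.org/sparql is a hypothetical placeholder, not a documented UDFR address.

```php
<?php
// Sketch: a SPARQL SELECT query sent over HTTP GET, as the SPARQL protocol allows.
// ASSUMPTION: SPARQL_ENDPOINT is hypothetical; substitute the real endpoint URL.
const SPARQL_ENDPOINT = 'http://udfr.org/sparql';

function buildSparqlGetUrl(string $endpoint, string $query): string
{
    // The protocol carries the query text in the "query" parameter.
    // PHP_QUERY_RFC3986 percent-encodes spaces as %20 instead of "+".
    return $endpoint . '?' . http_build_query(['query' => $query], '', '&', PHP_QUERY_RFC3986);
}

$query = 'PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> '
       . 'SELECT ?label WHERE { <http://udfr.org/udfr/u1r2473> rdfs:label ?label }';

$url = buildSparqlGetUrl(SPARQL_ENDPOINT, $query);
echo $url, PHP_EOL;

// Running the query needs network access, e.g. requesting JSON results:
// $ctx  = stream_context_create(['http' => ['header' => "Accept: application/sparql-results+json\r\n"]]);
// $json = json_decode(file_get_contents($url, false, $ctx), true);
```

Building the URL separately from sending it keeps the encoding step testable offline.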

OntoWiki
• Model-driven semantic wiki
  ● http://ontowiki.net/
• Agile Knowledge Engineering and Semantic Web research group (AKSW), Universität Leipzig
  ● http://aksw.org/
  ● DBpedia http://www.dbpedia.org/
• Key technology in EU-funded Linked Open Data (LOD2) project
  ● http://lod2.eu/
• Fully-featured semantic wiki facilitating user-contributed content
  ● Modifications necessary to enforce adherence to the UDFR data model and for strong provenance tracking
• GPL license

Zend
• PHP 5 application framework
  ● http://framework.zend.com/
  ● Model-view-controller (MVC) architecture
  ● Web services
  ● AJAX
  ● BSD license

RDFauthor
• Editing system for RDFa-annotated web pages
  ● http://aksw.org/Projects/RDFauthor
  ► Page creation and delivery (a): triples are embedded using RDFa with the named graphs extension
  ► Client-side page processing (b): embedded triples are extracted and placed into rdfQuery databanks
  ► Form creation (c): an edit form is created from the extracted triples
  ► Update propagation (d): changes are sent back to the sources via SPARQL/Update
  ► GPL license
• Note: RDFauthor, not RDFAuthor
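Step (d) can be pictured as a SPARQL 1.1 Update that swaps one literal for another inside a named graph. This is a hedged PHP sketch, not RDFauthor's actual code: the graph URI and the edited label value are hypothetical, and only quotes, backslashes and line breaks are escaped.

```php
<?php
// Sketch of update propagation: one edited statement becomes DELETE DATA + INSERT DATA.
// ASSUMPTIONS: the graph URI and the new label are hypothetical; RDFauthor builds
// its own requests, this only illustrates the shape of such an update.

function sparqlLiteral(string $value): string
{
    // Escape characters that may not appear raw inside a quoted SPARQL literal.
    return '"' . strtr($value, ['\\' => '\\\\', '"' => '\\"', "\n" => '\\n', "\r" => '\\r']) . '"';
}

function buildLiteralUpdate(string $graph, string $subject, string $predicate,
                            string $oldValue, string $newValue): string
{
    $old = "<$subject> <$predicate> " . sparqlLiteral($oldValue) . ' .';
    $new = "<$subject> <$predicate> " . sparqlLiteral($newValue) . ' .';
    return "DELETE DATA { GRAPH <$graph> { $old } } ;\n"
         . "INSERT DATA { GRAPH <$graph> { $new } }";
}

$update = buildLiteralUpdate(
    'http://udfr.org/udfr/',                        // hypothetical graph (model) URI
    'http://udfr.org/udfr/u1r2473',                 // C-Cube Microsystems
    'http://www.w3.org/2000/01/rdf-schema#label',
    'C-Cube Microsystems',
    'C-Cube Microsystems, Inc.'                     // hypothetical edit
);
echo $update, PHP_EOL;
```

Sending the delete and insert as one request (separated by ";") lets a store apply the edit as a single operation.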

Erfurt
• Zend-based semantic web API
  ● http://aksw.org/Projects/Erfurt
  ● RDF storage abstraction
  ● RDF parser/serializer
  ● SPARQL 1.1 Query/Update
  ● Versioning
  ● Caching
  ● GPL license

Virtuoso
• RDF quadstore
  ● http://virtuoso.openlinksw.com/
  ● SPARQL 1.1
  ● Named graphs
  ● Full-text indexing
  ● Inferencing
  ● Conductor administrative interface: http://docs.openlinksw.com/virtuoso/adminui.html
  ● GPL license

RDF / SPARQL
• Resource Description Framework
  ● http://www.w3.org/RDF/
  ● Assertions of the form: subject predicate object

        udfr:u1r2473 rdf:type udfrs:Agent .
        udfr:u1r2473 rdfs:label "C-Cube Microsystems" .

  ● Subjects and predicates are represented by URIs; objects, by URIs or literals
  ● Multiple serialization formats: RDF/XML, N3, N-Triples, Turtle
• SPARQL Protocol and RDF Query Language
  ● http://www.w3.org/TR/rdf-sparql-query/
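To make the subject/predicate/object form concrete, the PHP sketch below writes the C-Cube Microsystems assertions out as N-Triples, where every URI is spelled in full. The subject is written with the udfr: prefix (instance data, http://udfr.org/udfr/), udfrs: is expanded to http://udfr.org/onto#, and the type predicate is rdf:type; the expand() and toNTriples() helpers are hypothetical and handle only the cases shown.

```php
<?php
// Sketch: prefixed names expanded to full URIs and serialized as N-Triples.
// ASSUMPTION: the prefix table follows the UDFR namespaces; helpers are illustrative.
const PREFIXES = [
    'rdf'   => 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
    'rdfs'  => 'http://www.w3.org/2000/01/rdf-schema#',
    'udfr'  => 'http://udfr.org/udfr/',   // instance data
    'udfrs' => 'http://udfr.org/onto#',   // schema
];

function expand(string $qname): string
{
    [$prefix, $local] = explode(':', $qname, 2);
    return '<' . PREFIXES[$prefix] . $local . '>';
}

function toNTriples(array $triples): string
{
    $lines = [];
    foreach ($triples as [$s, $p, $o]) {
        // An object is either a prefixed name (URI) or ['literal' => string].
        // Only backslashes and double quotes are escaped in this sketch.
        $object = is_array($o)
            ? '"' . strtr($o['literal'], ['\\' => '\\\\', '"' => '\\"']) . '"'
            : expand($o);
        $lines[] = expand($s) . ' ' . expand($p) . ' ' . $object . ' .';
    }
    return implode("\n", $lines) . "\n";
}

$nt = toNTriples([
    ['udfr:u1r2473', 'rdf:type',   'udfrs:Agent'],
    ['udfr:u1r2473', 'rdfs:label', ['literal' => 'C-Cube Microsystems']],
]);
echo $nt;
```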

Noid
• “Nice opaque identifier” minter
  ● https://wiki.ucop.edu/display/Curation/NOID
• Perl module
  ● http://search.cpan.org/~jak/Noid-0.424/
• Two namespaces (or “shoulders”)
  ● “u1f” – Formats (including character encodings and compression algorithms), e.g.
    ► “u1f378” (JPEG/JFIF 1.02) http://udfr.org/udfr/u1f378
  ● “u1r” – All other RDF resources, e.g.
    ► “u1r2473” (C-Cube Microsystems) http://udfr.org/udfr/u1r2473
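The two shoulders determine what kind of resource an identifier names, and every identifier resolves under http://udfr.org/udfr/. The PHP sketch below shows that lookup; describeIdentifier() is a hypothetical helper that only inspects the prefix and does not mint identifiers or check any Noid check character.

```php
<?php
// Sketch: classifying a UDFR identifier by shoulder and building its URI.
// ASSUMPTION: prefix inspection only; no minting or check-character validation.
const UDFR_BASE = 'http://udfr.org/udfr/';
const SHOULDERS = [
    'u1f' => 'format',          // formats, character encodings, compression algorithms
    'u1r' => 'other resource',  // all other RDF resources
];

function describeIdentifier(string $id): array
{
    foreach (SHOULDERS as $shoulder => $kind) {
        if (strncmp($id, $shoulder, strlen($shoulder)) === 0) {
            return ['kind' => $kind, 'uri' => UDFR_BASE . $id];
        }
    }
    throw new InvalidArgumentException("Unknown shoulder in identifier: $id");
}

print_r(describeIdentifier('u1f378'));   // JPEG/JFIF 1.02
print_r(describeIdentifier('u1r2473'));  // C-Cube Microsystems
```

Because the identifiers are opaque, nothing beyond the shoulder should be read into them; descriptive information lives in the triples.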

Code repository
• All code (and ontologies) managed in public repositories at GitHub
  ● https://github.com/UDFR
  ● OntoWiki: https://github.com/UDFR/OntoWiki (forked from https://github.com/AKSW/OntoWiki)
  ● Erfurt: https://github.com/UDFR/Erfurt (forked from https://github.com/AKSW/Erfurt)
  ● RDFauthor: https://github.com/UDFR/RDFauthor (forked from https://github.com/AKSW/RDFauthor)
• All CDL development available under GPL license

Code review
• Division of labor
  ● New UI presentation features: modify an existing OntoWiki view or create a new extension
  ● New UI data features: RDFauthor
  ● Database queries and user/model authentication: Erfurt
• Norman Heino, Sebastian Dietzold, Michael Martin, and Sören Auer, “Developing semantic web applications with the OntoWiki Framework,” Networked Knowledge – Networked Media 221 (Berlin: Springer, 2009), pp. 61–77
  ● http://www.springerlink.com/content/742m6l6418887542/

Architecture
[Figure: UDFR architecture diagram]

MVC recap
• Model
  ● Business logic
  ● SPARQL is here!
• Controller
  ● Component
  ● Controller's methods are Actions
• View
  ● OntoWiki_View class
  ● Templates run in the View's context

Request lifecycle
index.php → OntoWiki_Application → Zend Framework request dispatching → Controller → Render view

OntoWiki URLs
• The URL pattern /<controller>/<action> is automatically mapped to the <action>Action() method of the <controller>Controller class (in the file <controller>Controller.php)
• Results are displayed via the view in the file <action>.phtml
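The naming convention above is mechanical, so it can be spelled out in a few lines. The PHP sketch below is a simplified, hypothetical mapRequestPath() helper: it capitalizes the controller name as Zend Framework's class naming does, defaults a missing action to "index", and ignores modules, custom routes, dashed names and any further path segments.

```php
<?php
// Sketch of the /<controller>/<action> convention (simplified, hypothetical helper).
// Not modelled: modules, custom routes, dashed names, extra path segments.
function mapRequestPath(string $path): array
{
    // Extra segments beyond controller/action are ignored by the destructuring.
    [$controller, $action] = array_pad(explode('/', trim($path, '/')), 2, 'index');
    return [
        'class'  => ucfirst($controller) . 'Controller',
        'file'   => ucfirst($controller) . 'Controller.php',
        'method' => $action . 'Action',
        'view'   => $action . '.phtml',
    ];
}

print_r(mapRequestPath('/history/list'));
```

For /history/list this yields HistoryController::listAction() in HistoryController.php, rendered through list.phtml.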

OntoWiki URLs
• http://udfr.org/ontowiki/list/r/foaf:Person/p/2
  ● “list”: name (or route name), followed by parameters
• http://udfr.org/ontowiki/resource/properties/?r=http%3A%2F%2Fudfr.org%2Fudfr%2Fu1r4396
  ● “resource” / “properties”: controller / action
  ● Parameters: r = http%3A%2F%2Fudfr.org%2Fudfr%2Fu1r4396 (the URL-encoded form of http://udfr.org/udfr/u1r4396)
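The example URLs can be taken apart with PHP's standard parse_url() and parse_str(). The sketch assumes, following Zend Framework's key/value path convention, that the segments after the route name are name/value pairs; parsePathParams() is a hypothetical helper, and the number of leading segments to skip (the OntoWiki base path and the route name) is passed in by hand.

```php
<?php
// Sketch: reading parameters from the two example URLs.
// ASSUMPTION: path segments after the route name alternate name/value.
function parsePathParams(string $url, int $skipSegments): array
{
    $path = trim(parse_url($url, PHP_URL_PATH), '/');
    $segments = array_slice(explode('/', $path), $skipSegments);
    $params = [];
    for ($i = 0; $i + 1 < count($segments); $i += 2) {
        $params[$segments[$i]] = urldecode($segments[$i + 1]);
    }
    return $params;
}

// 1. Skip "ontowiki" and the route name "list"; the rest are key/value pairs.
$listParams = parsePathParams('http://udfr.org/ontowiki/list/r/foaf:Person/p/2', 2);
print_r($listParams);

// 2. Controller "resource", action "properties"; the resource URI is in the query string.
$url = 'http://udfr.org/ontowiki/resource/properties/?r=http%3A%2F%2Fudfr.org%2Fudfr%2Fu1r4396';
parse_str(parse_url($url, PHP_URL_QUERY), $query);
echo $query['r'], PHP_EOL; // http://udfr.org/udfr/u1r4396
```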

Extension types
• Components
• Modules
• Plug-ins

Components
• MVC controllers
• Often provide a view
• Can serve other requests

    class NewController extends OntoWiki_Controller_Component
    {
        // ...
    }

Modules
• Small windows
• Provide additional GUI elements

    class NewModule extends OntoWiki_Module
    {
        // ...
    }

Plug-ins
• Arbitrary code
• Register for certain events

    require_once 'OntoWiki/Plugin.php';

    class NewPlugin extends OntoWiki_Plugin
    {
    }

Plug-ins
• Arbitrary code
• Register for certain events

    $event = new Erfurt_Event('onUpdateServiceAction');
    $event->obj = $obj;
    $event->trigger();
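The register-then-trigger pattern behind plug-ins can be illustrated without OntoWiki. In the PHP sketch below, MiniEventDispatcher and ReviewLogPlugin are hypothetical classes, not Erfurt's or OntoWiki's API; the assumption that a triggered event calls a handler method named after the event is made for illustration only, reusing the onUpdateServiceAction event name from the slide.

```php
<?php
// Sketch of the plug-in/event pattern with hypothetical classes.
// ASSUMPTION: handlers are methods named after the event they listen for.
class MiniEventDispatcher
{
    /** @var array<string, object[]> event name => registered plug-ins */
    private array $listeners = [];

    public function register(string $eventName, object $plugin): void
    {
        $this->listeners[$eventName][] = $plugin;
    }

    public function trigger(string $eventName, $payload): void
    {
        foreach ($this->listeners[$eventName] ?? [] as $plugin) {
            $plugin->$eventName($payload);   // call the handler named after the event
        }
    }
}

class ReviewLogPlugin   // hypothetical plug-in that records updates
{
    public array $log = [];

    public function onUpdateServiceAction($obj): void
    {
        $this->log[] = 'updated: ' . $obj;
    }
}

$dispatcher = new MiniEventDispatcher();
$plugin = new ReviewLogPlugin();
$dispatcher->register('onUpdateServiceAction', $plugin);
$dispatcher->trigger('onUpdateServiceAction', 'http://udfr.org/udfr/u1r2473');
print_r($plugin->log);
```

The code that triggers an event never needs to know which plug-ins are installed, which is what lets plug-ins add "arbitrary code" without touching the core.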

OntoWiki API
• OntoWiki modified UI data structures
  ● Menus
  ● Toolbar
  ● Navigation

Menus
• OntoWiki_Menu::setEntry(...);
• Entries may be links or separators
• Window menu
• Context menu
• JSON serialization

Toolbar
• OntoWiki_Toolbar
• Default buttons: Submit, Cancel, Edit, Add, …
• UDFR button: Review

    OntoWiki_Toolbar::appendButton(
        OntoWiki_Toolbar::SUBMIT,
        array('name' => 'Review', 'id' => 'resource-review')
    );

Navigation
• Displayed as a tab bar in the upper part of the main window
• Components can register with Navigation:

    OntoWiki_Navigation::register('history', array(
        'controller' => 'history', // history controller
        'action'     => 'list',    // list action
        'name'       => 'History',
        'priority'   => 30
    ));

Messages
• Any window can have a message
• The application keeps a message stack, which is displayed automatically in the main view
• Message types: success, warning, info, error

    OntoWiki_Application::appendMessage(
        new OntoWiki_Message(
            'No statement was selected. Please select statement(s) for review',
            OntoWiki_Message::ERROR
        )
    );

Themes
• CSS, JavaScript, images, templates
• Allow you to modify the way OntoWiki displays things
• Behavior and look are applied to CSS classes

CSS Framework
• Uses generic classes
  ● Windows
  ● Drop-down & context menus
  ● Tabbed content
  ● Message boxes
  ● Tables, lists

RDFa widgets
• Structured data is available in rendered HTML code
• Editing widgets based on extracted statements
• Can probably work on more than one statement

Code review
• UC3 modifications in three key areas
  ● Instance creation
  ● Review
  ● User profile

Questions and discussion

Ontological Models
• Overview
  ● Purpose
  ● Model documentation
  ● Ontology repositories
• Design decisions
  ● Naming conventions, identifiers, URI construction
  ● Design patterns
  ● Additional integration

Ontological Models
[Image] Source: http://programmerryangosling.tumblr.com/post/14727789533

Model Overview
• System configuration and administration
  ● Defines actions, roles, access control
• Profile
  ● Allows anonymous read-only access to public profile for provenance purposes
• UDFRS / UDFR
  ● Defines core schema and data for registered objects
• Imported external models
  ● Enable semantic relationships, e.g., RDFS, OWL, SKOS
  ● Define descriptions, e.g., DC, Dcterms
  ● Integrate vocabularies, e.g., MADSRDF, MIME

OntoWiki Config Ontologies
• OntoWiki system ontology (SysOnt)
  ● This schema model provides the vocabulary for configuration (e.g., terms for access control)
  ● Uses FOAF/SIOC for some profile terms
  ● Defined by AKSW; used for core functionality and should not be modified
• OntoWiki system configuration (Config)
  ● Imports the SysOnt schema model
  ● Used to configure model-based access control (role administration)
  ● Also used when creating new actions and mapping actions to roles

Configuration Concepts
• User, includes special:
  ● Anonymous (not logged in)
  ● SuperAdmin (uses db login/pw; ignores all access control config)
• Usergroup
  ● A User can be a member of 1+ groups
  ● All rights/restrictions of a group are applied to the User
• Model, includes special:
  ● sysont:AnyModel (any available model)
• Action
  ● Application-specific function or group of functions, identified by a URI
  ● Developers can create a new action that represents plugin capabilities
  ● Used to manage special rights
  ● Includes special: sysont:AnyAction (any available action)

Access Control
[Diagram: User, Usergroup, Model, Action and File nodes; relations member, grant access, deny access, toModel; readable / not readable and editable / not editable models]
Ordering
1. Collect all granted models from User / Usergroup
2. Collect all denied models from User / Usergroup and subtract them from the grant list
Deny statements override grant statements
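The two-step ordering rule reduces to a union followed by a set difference. The PHP sketch below applies it to one kind of right (for example, readable models); plain arrays stand in for the grant/deny statements in the Config model, the group names and grant/deny assignments are hypothetical, and special values such as sysont:AnyModel are not handled.

```php
<?php
// Sketch of the ordering rule: union of grants, minus union of denials.
// ASSUMPTIONS: arrays replace Config-model statements; data below is hypothetical.
function effectiveModels(array $user, array $groups): array
{
    // Step 1 and step 2: gather grants and denials from the user and every group.
    $granted = $user['grant'];
    $denied  = $user['deny'];
    foreach ($user['memberOf'] as $groupName) {
        $granted = array_merge($granted, $groups[$groupName]['grant']);
        $denied  = array_merge($denied,  $groups[$groupName]['deny']);
    }
    // Deny statements override grant statements.
    return array_values(array_diff(array_unique($granted), $denied));
}

$groups = [
    'Reviewers'    => ['grant' => ['http://udfr.org/udfr/', 'http://udfr.org/profile/'], 'deny' => []],
    'Contributors' => ['grant' => ['http://udfr.org/udfr/'], 'deny' => ['http://localhost/OntoWiki/Config/']],
];
$user = [
    'grant'    => ['http://localhost/OntoWiki/Config/'],
    'deny'     => ['http://udfr.org/profile/'],
    'memberOf' => ['Reviewers', 'Contributors'],
];

print_r(effectiveModels($user, $groups));
```

Here the user's own grant on the Config model is cancelled by the Contributors denial, and the Reviewers grant on the profile model is cancelled by the user's own denial, leaving only http://udfr.org/udfr/.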

Configuration example
• Review Action: [screenshot]
• Reviewer Role: [screenshot]

UDFR profile
• Contains additional provenance information about users and data sources
• Kept distinct from account information in the Configuration model so that some attributes can be displayed publicly
• Key properties
  ● Title
  ● Display name
  ● Real name
  ● Organizational affiliation
  ● Website
  ● Additional notes

Profile example
• Person: [screenshot]
• Data Source: [screenshot]

UDFR schema
• Superset of PRONOM 7 and GDFR
• Statistics:
  ● 5,326 triples (2,566 local, 2,727 imported, 33 inferred)
  ● 113 classes (105 local, 8 imported)
  ● 159 properties (121 local, 38 imported)
• Controlled vocabulary classes: 38
• Imported ontologies
  ● RDF, RDFS, OWL – foundational
    ► http://www.w3.org/1999/02/22-rdf-syntax-ns#
    ► http://www.w3.org/2000/01/rdf-schema#
    ► http://www.w3.org/2002/07/owl#


UDFR schema
• Imported ontologies
  - FOAF, SIOC (OntoWiki foundational): http://xmlns.com/foaf/, http://rdfs.org/sioc/ns#
  - SKOS (controlled vocabularies): http://www.w3.org/2008/05/skos#
  - LOCMADS (imported LC-controlled vocabularies): http://id.loc.gov/vocabulary/iso639-2/
  - MIME (MIME types): http://purl.org/NET/mediatypes/
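For working with the schema outside OntoWiki it can help to keep these namespaces in one prefix map. A minimal sketch: the namespace URIs are copied as listed on the slides, the `udfrs`/`udfr` prefixes come from the code repository slide, and the remaining prefix names (`locmads`, `mime`, etc.) and the `expand`/`compact` helpers are illustrative assumptions.

```python
# Namespace URIs as listed on the slides; some prefix names are assumptions.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "foaf": "http://xmlns.com/foaf/",
    "sioc": "http://rdfs.org/sioc/ns#",
    "skos": "http://www.w3.org/2008/05/skos#",
    "locmads": "http://id.loc.gov/vocabulary/iso639-2/",  # hypothetical prefix name
    "mime": "http://purl.org/NET/mediatypes/",           # hypothetical prefix name
    "udfrs": "http://udfr.org/onto#",                    # UDFR schema
    "udfr": "http://udfr.org/udfr/",                     # UDFR instance data
}


def expand(curie: str) -> str:
    """Expand a prefixed name such as 'udfrs:FileFormat' to a full URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown prefix in {curie!r}")
    return PREFIXES[prefix] + local


def compact(uri: str) -> str:
    """Abbreviate a URI using the longest matching namespace, else return it unchanged."""
    matches = [(p, ns) for p, ns in PREFIXES.items() if uri.startswith(ns)]
    if not matches:
        return uri
    prefix, ns = max(matches, key=lambda m: len(m[1]))
    return f"{prefix}:{uri[len(ns):]}"
```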


Code repository
(Image source: http://programmerryangosling.tumblr.com/post/14710787186)


Code repository
• All ontologies (and code) are managed in public repositories at GitHub: https://github.com/UDFR
• Ontologies: https://github.com/UDFR-Models
  - udfrs [onto.owl]: UDFR schema, http://udfr.org/onto#
  - udfr [udfr.owl]: UDFR instance data, http://udfr.org/udfr/
  - profile [profile.owl]: UDFR user profiles, http://udfr.org/profile/


Code repository
• There are also OntoWiki system configuration schemata (sysont/sysconf), visible only to administrators:
  - System Ontology: SysOnt.rdf, taken from the Erfurt include directory upon install
  - System Configuration: http://localhost/OntoWiki/Config/


Naming conventions
• Classes: UpperCamelCase for URIs; Title Case for labels
• Individuals: UDFR identifiers for URIs; data source conventions for labels
• Properties: lowerCamelCase for URIs; Title Case for labels
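For ordinary multi-word labels, the label-to-URI conventions can be applied mechanically. A sketch under that assumption (the helper names are hypothetical). Acronyms, such as the one behind `udfrs:ipr`, would still need manual handling, since naive camel-casing turns "IPR" into `iPR`.

```python
import re


def _words(label: str) -> list[str]:
    """Split a Title Case label into words, ignoring punctuation."""
    return re.findall(r"[A-Za-z0-9]+", label)


def class_local_name(label: str) -> str:
    """Class label in Title Case -> UpperCamelCase URI local name."""
    return "".join(w[:1].upper() + w[1:] for w in _words(label))


def property_local_name(label: str) -> str:
    """Property label in Title Case -> lowerCamelCase URI local name."""
    name = class_local_name(label)
    return name[:1].lower() + name[1:]
```

For example, `class_local_name("File Format")` gives `FileFormat` and `property_local_name("Has Affinity For")` gives `hasAffinityFor`.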


Identifiers
• UDFR identifier scheme
  - u1f (file formats, compression algorithms, encodings)
  - u1r (everything else)
• UDFR Local Identifier: string property that maps an entity to a string for easy lookup and use
• Alias identifiers map to a resource within UDFR with:
  - a namespace property (e.g., PUID)
  - an identifier string value
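A sketch of how the two shoulders and alias identifiers might be used programmatically. The class local names in `U1F_CLASSES`, the `AliasIdentifier` record, and the placeholder PUID value in the usage are assumptions for illustration, not UDFR's internal representation.

```python
from dataclasses import dataclass
from typing import Optional

UDFR_NS = "http://udfr.org/udfr/"

# Assumed local names of the classes minted under "u1f" (file formats,
# compression algorithms, encodings); everything else goes under "u1r".
U1F_CLASSES = {"FileFormat", "Compression", "Encoding"}


def shoulder_for(class_name: str) -> str:
    """Pick the Noid shoulder for a new resource of the given class."""
    return "u1f" if class_name in U1F_CLASSES else "u1r"


@dataclass(frozen=True)
class AliasIdentifier:
    """An external identifier (e.g. a PUID) that maps to a UDFR resource."""
    namespace: str  # identifier scheme, e.g. "PUID"
    value: str      # identifier string within that scheme
    target: str     # URI of the UDFR resource


def resolve_alias(aliases: list[AliasIdentifier], namespace: str, value: str) -> Optional[str]:
    """Return the UDFR resource URI for an external identifier, or None."""
    for alias in aliases:
        if alias.namespace == namespace and alias.value == value:
            return alias.target
    return None
```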


URI Construction
• Schema uses "hash" URIs for ease of publishing: http://udfr.org/onto#
• Instance data uses "slash" URIs for ease of retrieval: http://udfr.org/udfr/
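The practical difference shows up when a URI is dereferenced: HTTP clients never send the fragment, so every hash URI in the schema resolves to the same ontology document, while each slash URI is requested on its own. A minimal sketch using only the standard library (the helper name is hypothetical):

```python
from urllib.parse import urldefrag

SCHEMA_NS = "http://udfr.org/onto#"     # "hash" namespace
INSTANCE_NS = "http://udfr.org/udfr/"   # "slash" namespace


def document_to_fetch(uri: str) -> str:
    """Return the URL an HTTP client actually requests when dereferencing `uri`.

    The fragment (after '#') stays on the client, so all schema terms share one
    document, while each instance URI names a separately retrievable resource.
    """
    return urldefrag(uri).url
```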


Design patterns
• Abstract classes
• Controlled vocabularies as closed enumeration classes / SKOS concepts
• Integration with other ontologies
  - To enable semantic relationships (RDFS, OWL, SKOS)
  - To define descriptions (DC, DCTerms)
  - To integrate vocabularies (MADSRDF, MIME)
  - Implemented by importing ontologies and by mapping via subClassOf and subPropertyOf relations
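A closed enumeration means a property may take only one of a fixed set of values; in OWL such a class is typically declared with `owl:oneOf`. The sketch below mirrors that closed-world check in application code. The `LossinessType` members are hypothetical stand-ins, not the actual individuals in the UDFR vocabulary, and `parse_term` is an illustrative helper.

```python
from enum import Enum


class LossinessType(Enum):
    """Hypothetical members standing in for the udfrs:LossinessType vocabulary."""
    LOSSLESS = "Lossless"
    LOSSY = "Lossy"


def parse_term(vocabulary: type[Enum], label: str) -> Enum:
    """Accept a label only if it names a member of the closed enumeration."""
    for member in vocabulary:
        if member.value.casefold() == label.strip().casefold():
            return member
    allowed = ", ".join(m.value for m in vocabulary)
    raise ValueError(f"{label!r} is not a {vocabulary.__name__} (allowed: {allowed})")
```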


Integration with PRONOM
• Worked closely with the UK National Archives (TNA) on ontology creation to keep joint development aligned
• Could potentially use owl:equivalentClass for mapping; however, membership of the class extensions may vary
  - Alternatively, rdfs:subClassOf
  - Similar approach for properties
• Define alias identifier statements in UDFR
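`rdfs:subClassOf` is the safer mapping because it only asserts that every UDFR instance is also a member of the external class, whereas `owl:equivalentClass` would also assert the converse. A sketch of that one-way inference, assuming a hypothetical `pronom:FileFormat` class and an assumed chain of UDFR superclasses:

```python
from collections import deque

# Hypothetical mapping edges (subclass -> superclasses). The "pronom:" class and
# the UDFR superclass chain are illustrative assumptions.
SUBCLASS_OF = {
    "udfrs:FileFormat": {"udfrs:AbstractFormat", "pronom:FileFormat"},
    "udfrs:AbstractFormat": {"udfrs:AbstractProduct"},
    "udfrs:AbstractProduct": {"udfrs:AbstractBase"},
}


def inferred_types(asserted: str) -> set[str]:
    """All classes an instance belongs to, following rdfs:subClassOf transitively."""
    seen = {asserted}
    queue = deque([asserted])
    while queue:
        cls = queue.popleft()
        for parent in SUBCLASS_OF.get(cls, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen
```

Inference flows only upward, so a resource typed only as `pronom:FileFormat` is not thereby classified as a `udfrs:FileFormat`.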


UDFR schema
(Image source: http://programmerryangosling.tumblr.com/post/17532370461)


UDFR schema (class diagram)
Classes shown: Abstract Base …, Controlled Vocabulary, Process, IPR, Software, Agent, Hardware, Abstract Product, Grammar, Media, Character Encoding, Holding, Document, Abstract Format, File Format, Abstract Signature, Digest, Assessment, Compression Algorithm, File, External Signature, Internal Signature.
Relationships shown: holder, embodies, owner, ipr, dependency, creator, assessment, grammar, reference file, specification, signature, maintainer, input / output, product, digest.


UDFR schema: udfrs:AbstractBase
Property | Obligation | Type | Cardinality
rdfs:label | Required | xsd:string | Singleton
udfrs:aliasIdentifier | Optional | udfrs:Identifier | Repeatable
udfrs:aliasName | Optional | xsd:string | Repeatable
udfrs:description | Optional | xsd:string | Repeatable
udfrs:note | Optional | xsd:string | Repeatable
udfrs:statusType | Optional | udfrs:StatusType | Singleton
udfrs:udfrIdentifier | Required | udfrs:Identifier | Singleton
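The Obligation and Cardinality columns translate directly into a simple completeness check. A sketch, assuming a resource is represented as a mapping from property names to lists of values (a hypothetical layout); `check` is an illustrative helper that tests counts only, not whether values match the Type column.

```python
from collections.abc import Mapping, Sequence

# (obligation, cardinality) for udfrs:AbstractBase, transcribed from the table above.
ABSTRACT_BASE = {
    "rdfs:label": ("Required", "Singleton"),
    "udfrs:aliasIdentifier": ("Optional", "Repeatable"),
    "udfrs:aliasName": ("Optional", "Repeatable"),
    "udfrs:description": ("Optional", "Repeatable"),
    "udfrs:note": ("Optional", "Repeatable"),
    "udfrs:statusType": ("Optional", "Singleton"),
    "udfrs:udfrIdentifier": ("Required", "Singleton"),
}


def check(resource: Mapping[str, Sequence[str]],
          rules: Mapping[str, tuple[str, str]] = ABSTRACT_BASE) -> list[str]:
    """Return obligation/cardinality violations in table order (empty if valid)."""
    problems = []
    for prop, (obligation, cardinality) in rules.items():
        values = resource.get(prop, [])
        if obligation == "Required" and not values:
            problems.append(f"missing required {prop}")
        if cardinality == "Singleton" and len(values) > 1:
            problems.append(f"{prop} allows one value, found {len(values)}")
    return problems
```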


UDFR schema: udfrs:AbstractProduct
Property | Obligation | Type | Cardinality
udfrs:availabilityType | Optional | udfrs:AvailabilityType | Singleton
udfrs:creationDate | Optional | xsd:string | Singleton
udfrs:dependency | Optional | udfrs:AbstractProduct | Repeatable
udfrs:disclosureType | Optional | udfrs:DisclosureType | Singleton
udfrs:documentation | Optional | udfrs:Document | Repeatable
udfrs:file | Optional | udfrs:File | Repeatable
udfrs:ipr | Optional | udfrs:IPR | Repeatable
udfrs:maintainer | Optional | udfrs:Agent | Repeatable
udfrs:owner | Optional | udfrs:Agent | Repeatable
udfrs:previousVersion | Optional | udfrs:AbstractProduct | Repeatable
udfrs:releaseDate | Optional | xsd:string | Singleton
udfrs:version | Optional | xsd:string | Singleton
udfrs:withdrawlDate | Optional | xsd:string | Singleton


UDFR schema: udfrs:AbstractFormat
Property | Obligation | Type | Cardinality
udfrs:domainFacetType | Optional | udfrs:DomainFacetType | Repeatable
udfrs:formType | Optional | udfrs:FormType | Singleton
udfrs:formatAssessment | Optional | udfrs:Assessment | Repeatable
udfrs:genreFacetType | Optional | udfrs:GenreFacetType | Repeatable
udfrs:hasAffinityFor | Optional | udfrs:AbstractFormat | Repeatable
udfrs:isDefinedBy | Optional | udfrs:AbstractFormat | Repeatable
udfrs:isSubtypeOf | Optional | udfrs:AbstractFormat | Repeatable
udfrs:mayContain | Optional | udfrs:AbstractFormat | Repeatable
udfrs:mimeType | Optional | udfrs:MIME | Repeatable
udfrs:relatedFormat | Optional | udfrs:AbstractFormat | Repeatable
udfrs:roleFacetType | Optional | udfrs:RoleFacetType | Singleton
udfrs:signature | Optional | udfrs:AbstractSignature | Repeatable
udfrs:subsidiaryGenreFacetType | Optional | udfrs:GenreFacetType | Repeatable
udfrs:transformType | Optional | udfrs:TransformType | Repeatable


UDFR schema
udfrs:FileFormat: no additional properties
udfrs:Encoding: no additional properties
udfrs:Compression
Property | Obligation | Type | Cardinality
udfrs:lossinessType | Optional | udfrs:LossinessType | Singleton


UDFR schema
• Online documentation: http://udfr.org/docs/onto




Listing all users
• Log in with administrative privileges
• Select the “OntoWiki System Configuration” knowledge base
• Select the “User” class to list all users




Listing user profile information
• Log in with administrative privileges
• Select the “http://udfr.org/profile” knowledge base
• Select the “Account profile” class to list all users
• Select the user
• Note: group membership is shown as a property of the “User” in the “OntoWiki System Configuration” knowledge base




Listing user group membership
• Log in with administrative privileges
• Select the “OntoWiki System Configuration” knowledge base
• Select the “User” class to list all users
• Select the user


Setting user privileges
• Log in with administrative privileges
• Select the “OntoWiki System Configuration” knowledge base
• Select the “Usergroup” class to list all groups
• Select “Edit Resource” in the menu for the desired group


Setting user privileges
• Add or delete the user as a member
  - User URIs are of the form: http://localhost/OntoWiki/Config/<user>


Reset the Noid counters
• The Noid minter installation looks like:
    /udfr/apps/ontowiki/
        minters/
            u1f/
                0=minter_1.00
                minter.bdb
                minter.lock
                minter.log
                minter.README
            u1r/
                0=minter_1.00
                minter.bdb
                minter.lock
                minter.log
                minter.README
        noid/
            noid*
            README
            ...
        udfrnoid.csh*


Reset the Noid counters
• Log in with role privileges
• Delete or rename the “minters” directory
• Run the shell script “udfrnoid.csh”
    % sudo su - udfr
    % cd /home/udfr/apps/ontowiki
    % rm -fr minters        # or: mv minters minters-bak
    % csh -f udfrnoid.csh init


Bulk import
• Create a “Data source” user
  - Log in with administrative privileges
  - Select “User > Register New User” in the OntoWiki pane


Bulk import
• Express the RDF assertions in N-Triples: http://www.w3.org/2001/sw/RDFCore/ntriples/
  - If adding new resources, place the “rdf:type” assertions first (prefixed names are used here for brevity; N-Triples itself requires full URIs in angle brackets):
      udfr:u1f46 rdf:type udfrs:FileFormat .
      udfr:u1f46 udfrs:udfrIdentifier "u1f46" .
      udfr:u1f46 rdfs:label "Broadcast WAVE, version 0" .
      ...
• Use Noid to mint resource identifiers, of the form <shoulder><id>, in the “u1f” and “u1r” shoulders:
      % cd /udfr/apps/ontowiki/noid
      % ./noid <shoulder>.mint 1
• Use the identifiers to construct resource URIs in the “udfr” namespace: http://udfr.org/udfr/<shoulder>/<id>
  - This may be a multi-stage process if there are relationships between resources
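Because N-Triples requires full URIs, the abbreviated statements above have to be expanded before submission. A sketch that emits the three statements with the `rdf:type` line first; the function names are hypothetical, and the identifier is assumed to be the complete Noid output including its shoulder, as in the `udfr:u1f46` example.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
UDFRS = "http://udfr.org/onto#"   # schema namespace
UDFR = "http://udfr.org/udfr/"    # instance namespace


def literal(text: str) -> str:
    """Quote a plain literal, escaping characters N-Triples does not allow raw."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def new_resource_triples(identifier: str, class_local: str, label: str) -> list[str]:
    """N-Triples lines for a new resource, with the rdf:type statement first."""
    subject = f"<{UDFR}{identifier}>"
    return [
        f"{subject} <{RDF_TYPE}> <{UDFRS}{class_local}> .",
        f"{subject} <{UDFRS}udfrIdentifier> {literal(identifier)} .",
        f"{subject} <{RDFS_LABEL}> {literal(label)} .",
    ]


if __name__ == "__main__":
    print("\n".join(new_resource_triples("u1f46", "FileFormat", "Broadcast WAVE, version 0")))
```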


Bulk import
• Submit to Virtuoso using SPARQL Update:
    % curl --verbose --user <user>:<password> --data-urlencode query@<file>.nt http://udfr.cdlib.org:8089/update
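As far as we know, a SPARQL Update endpoint expects the posted `query` field to contain an update operation rather than bare triples, so the N-Triples would normally be wrapped in `INSERT DATA`. A sketch of that wrapping and of the form encoding comparable to what `curl --data-urlencode query@<file>` performs; the target graph URI and the helper names are assumptions.

```python
from urllib.parse import urlencode


def insert_data_update(ntriples: str, graph: str) -> str:
    """Wrap N-Triples in a SPARQL 1.1 INSERT DATA operation targeting one graph."""
    return f"INSERT DATA {{ GRAPH <{graph}> {{\n{ntriples}\n}} }}"


def form_body(update: str) -> str:
    """URL-encode the update as a `query` form field for an HTTP POST body."""
    return urlencode({"query": update})


if __name__ == "__main__":
    triples = "<http://udfr.org/udfr/u1f46> <http://udfr.org/onto#udfrIdentifier> \"u1f46\" ."
    # Hypothetical target graph: the instance-data knowledge base.
    print(insert_data_update(triples, "http://udfr.org/udfr/"))
```

Writing the unencoded output of `insert_data_update(...)` to `<file>.nt` would let the curl command above submit it unchanged, since curl performs the encoding itself.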


Modify an ontology
• Modify the ontology using an external ontology editor, e.g., TopBraid Composer (TBC): http://www.topquadrant.com/products/TB_Composer.html
• Log in with administrative privileges
• Make sure there is a clean backup
• Select the “Delete Knowledge Base” menu option for the relevant knowledge base


Modify an ontology
• Select the “Edit > Create Knowledge Base” menu option in the “Select Knowledge Base” pane


Modify an ontology
• Specify the base URI
• Select the “Upload a file” radio button
• Select the file type


Modify an ontology
• Browse to the local ontology file and upload it


Backup
• Weekly full, and nightly incremental, backups of RDF and history/provenance
• Virtuoso interactive SQL utility (ISQL): http://docs.openlinksw.com/virtuoso/backup.html
  - Listening on localhost:1111
    % sudo su - udfr
    % cd /udfr/apps/virtuoso-opensource-version/bin
    % ./isql 1111 <user> <passwd>
    SQL> backup_context_clear();   # leave out for nightly
    SQL> checkpoint;               # leave out for nightly
    SQL> backup_online('virt-inc_dump_#', 500, 0, vector(<directory>));
    SQL> exit;
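The only difference between the weekly full and nightly incremental runs is whether the backup context is cleared and a checkpoint taken first. A sketch that generates the ISQL statements for either case; the Sunday schedule, the `backup` directory name, and the helper names are assumptions.

```python
import datetime


def is_full_backup_day(day: datetime.date, full_weekday: int = 6) -> bool:
    """Weekly full backup on one weekday (6 = Sunday, a hypothetical choice)."""
    return day.weekday() == full_weekday


def backup_statements(full: bool, prefix: str = "virt-inc_dump_#",
                      max_pages: int = 500, directory: str = "backup") -> list[str]:
    """ISQL statements for one backup run, following the sequence on the slide."""
    statements = []
    if full:
        # Weekly full backup: start a fresh backup context and checkpoint first.
        statements += ["backup_context_clear();", "checkpoint;"]
    # Nightly incremental runs issue only the backup_online call.
    statements.append(f"backup_online('{prefix}', {max_pages}, 0, vector('{directory}'));")
    statements.append("exit;")
    return statements


if __name__ == "__main__":
    print("\n".join(backup_statements(is_full_backup_day(datetime.date.today()))))
```

The output could be piped to `./isql 1111 <user> <passwd>` from a scheduled job.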


Restore
• Shut down Virtuoso
• Delete (or rename) the Virtuoso database file
• Restart Virtuoso
• Replay transaction file(s)
    % sudo su - udfr
    % cd /udfr/apps/virtuoso-opensource-version/var/lib/virtuoso/db
    % rm -f virtuoso.db
    % cd /udfr/apps/virtuoso-opensource-version/bin
    % ./virtuoso-t -c ../var/lib/virtuoso/ontowiki/virtuoso.ini +restore-backup virt-inc_dump_#
    % ./isql 1111 <user> <passwd>
    SQL> replay('<transaction-file-1>');   # specify files in temporal order
    SQL> replay('<transaction-file-2>');
    SQL> ...
    SQL> exit;
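Replay order matters, so the transaction files should be sorted before generating the `replay()` calls. A sketch that orders them by modification time; the `.trx` suffix, the use of mtime as the temporal key, and the helper names are assumptions (if file names carry timestamps, sorting by name may be more reliable).

```python
import os
from dataclasses import dataclass


@dataclass
class TransactionFile:
    path: str
    mtime: float  # modification time, e.g. os.stat(path).st_mtime


def collect(directory: str, suffix: str = ".trx") -> list[TransactionFile]:
    """Find candidate transaction files in a directory (suffix is an assumption)."""
    return [TransactionFile(e.path, e.stat().st_mtime)
            for e in os.scandir(directory)
            if e.is_file() and e.name.endswith(suffix)]


def replay_statements(files: list[TransactionFile]) -> list[str]:
    """ISQL replay() calls in temporal order (oldest first), then exit."""
    ordered = sorted(files, key=lambda f: f.mtime)
    return [f"replay('{f.path}');" for f in ordered] + ["exit;"]
```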




To do
• Peer-to-peer replication
• Import additional data sources
  - Library of Congress Sustainability of Digital Formats: http://www.digitalpreservation.gov/formats/
  - Other candidates?
• Recruit reviewers
• Permanent operational home
• Sustainable community governance and development/maintenance structure




Questions and discussion


For more information
• UDFR: http://udfr.org/, http://bitbucket.org/udfr, http://github.com/UDFR, udfr-l@listserv.ucop.edu
• OntoWiki: http://ontowiki.net/Projects/OntoWiki
• Erfurt: http://aksw.org/Projects/Erfurt
• RDFauthor: http://aksw.org/Projects/RDFauthor
• Zend: http://framework.zend.com/
• Virtuoso: http://www.openlinksw.com/dataspace/dav/wiki/Main/VOSRDFWP
• AKSW, Universität Leipzig (http://aksw.org/): Philipp Frischmuth, Sebastian Tramp, Norman Heino
• Library of Congress (http://www.digitalpreservation.gov): Martha Anderson, Leslie Johnston
• UC Curation Center (http://www.cdlib.org/uc3, uc3@ucop.edu): Stephen Abrams, Patricia Cruse, Margaret Low, Abhishek Salve, Lisa Dawn Colvin, John Kunze, Mark Reyes, Marisa Strong