Digital Archive Policies and Trusted Digital Repositories

MacKenzie Smith, MIT Libraries
Reagan Moore, San Diego Supercomputer Center
DCC Conference, Glasgow, November 2006

What is the Problem?
  • Need to extract local collection management policies from software to be more discoverable, configurable
  • Need to standardize ILM policies for sharing across systems within a preservation environment
  • Need to define metadata to audit ILM operations and achieve trust in a scalable, automated way

Local Repository Policy/Rule Types
  • Enterprise: specification of assertions
  • Archive: a-periodic, deferred consistency rules
  • Collection: periodic rules
  • Item: periodic or atomic rules

Policy Framework
  • Based on the NARA/RLG TDR checklist categories:
    • Organization, environment and legal policies
    • Community and usability policies
    • Process and procedure policies
    • Technology and infrastructure policies

Policy Framework
  • Abstract policy (high-level)
Example: The repository stipulates the number and location of copies of all digital objects: the number of copies to be made, the specific location(s), the business rules, and the preferences for the order in which replicas are used. The repository has mechanisms in place to ensure that any/multiple copies of digital objects are synchronized.

Policy Framework
  • Concrete policy (local policy and metadata)
Example:
  • Specific number of copies of digital objects
  • Locations of copies of digital objects
  • Order of preference for digital object copies
  • Location of business rules for copies (e.g. contract with 3rd-party archives for remote copies)
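
As an illustration only, the concrete policy above could be captured as a small data structure that software can read and check. The slides encode policies in a Rei (N3) RDF ontology, not in Python; the class name `ReplicationPolicy`, its fields, and the location names below are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ReplicationPolicy:
    """Hypothetical concrete form of the abstract 'copies' policy."""

    copies: int                    # specific number of copies required
    locations: list[str]           # approved locations, in order of preference
    business_rules_uri: str = ""   # where the governing contract or rules are kept

    def violations(self, actual_locations: list[str]) -> list[str]:
        """Return readable problems with an object's actual replica locations."""
        problems = []
        if len(actual_locations) < self.copies:
            problems.append(
                f"expected {self.copies} copies, found {len(actual_locations)}"
            )
        for loc in actual_locations:
            if loc not in self.locations:
                problems.append(f"copy at unapproved location: {loc}")
        return problems

    def preferred_copy(self, actual_locations: list[str]) -> str | None:
        """Pick the existing replica that ranks highest in the preference order."""
        for loc in self.locations:
            if loc in actual_locations:
                return loc
        return None


# Made-up values, purely for illustration.
policy = ReplicationPolicy(
    copies=3,
    locations=["local-disk", "remote-archive-a", "remote-archive-b"],
    business_rules_uri="contracts/remote-archive.txt",
)
```

Keeping the policy as data rather than as logic buried in the repository code is what makes it discoverable and configurable, which is the problem the opening slide names.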

Policy Encoding
  • Looked at lots of schemas and approaches
  • XACML, RuleML, BPEL too limited
    • Single purpose (access control, rights management, workflow, etc.)
  • Ponder and KAoS too risky
    • Research projects that are no longer active
  • Using Rei (N3) RDF ontology

Policy Exchange
  • DSpace DIPs
    • based on METS (also looked at XFDU, IMS CP, others)
    • encapsulates content files, metadata, provenance, and policies
  • iRODS
    • enforces policies based on local rules
    • produces state information (metadata) that can be audited by the DSpace repository over time

Example Functional Requirements
The ERA list defines 854 key capabilities (functional requirements) needed for preservation. These can be loosely organized into categories related to:
  • Management of disposition agreements describing record retention and disposition actions
  • Accession, the formal acceptance of records into the data management system
  • Arrangement, the organization of the records to preserve a required structure (implemented as a collection/sub-collection hierarchy)
  • Description, the management of descriptive metadata as well as text indexing
  • Preservation, the generation of Archival Information Packages
  • Access, the generation of Dissemination Information Packages
  • Subscription, the specification of services that a user picks for execution
  • Notification, the delivery of notices on service execution results
  • Queuing of large-scale tasks through interaction with workflow systems
  • System performance and failure reports. Of particular interest is the identification of all failures within the data management system and the recovery procedures that were invoked.
  • Transformative migration, the ability to convert specified data formats to new standards. In this case, each new encoding format is managed as a version of the original record.
  • Display transformation, the ability to reformat a file for presentation
  • Automated client specification, the ability to pick the appropriate client for each user

Rule Definition
  • Based on assessment criteria / preservation policies / preservation functional capabilities
  • Implemented as
    • Rules controlling micro-services with associated persistent state information

Case Study
  • DSpace@MIT institutional repository
    • Defines local collection management policies
    • Consumes 3rd-party preservation services (e.g. iRODS)
    • Provides provenance/audit (History) to monitor trust
  • SRB/iRODS virtualized storage environment
    • Provides 3rd-party preservation services
    • Rules derived from local policy, preservation requirements
    • Provides metadata to allow monitoring for trust

DSpace Event System
  • Archivist defines TDR-level abstract policies
  • System curator defines ILM events of interest, based on policies
    • e.g. ingest, modification, preservation migration, new edition, change in access rules, etc.
  • System detects and acts on events, records them in the local History (provenance audit)
    • e.g. iRODS deposit
    • History/provenance uses ABC Harmony ontology for ILM (RDF)
  • System curator monitors
    • iRODS state information
    • DSpace History subsystem (via standard RDF browsing tools)
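
The detect, act, and record cycle described on this slide might be sketched roughly as follows. This is not DSpace code: the `EventSystem` class, its methods, and the plain-dictionary history entries are hypothetical stand-ins, and the real History subsystem stores RDF using the ABC Harmony ontology.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable


@dataclass
class Event:
    kind: str      # e.g. "ingest", "modification", "preservation migration"
    item_id: str


@dataclass
class EventSystem:
    """Hypothetical sketch: dispatch events of interest and keep a history."""

    handlers: dict[str, list[Callable[[Event], str]]] = field(default_factory=dict)
    history: list[dict] = field(default_factory=list)

    def on(self, kind: str, handler: Callable[[Event], str]) -> None:
        """The curator registers an action for an event kind of interest."""
        self.handlers.setdefault(kind, []).append(handler)

    def emit(self, event: Event) -> None:
        """Act on the event and record each outcome in the local history."""
        for handler in self.handlers.get(event.kind, []):
            outcome = handler(event)
            self.history.append({
                "when": datetime.now(timezone.utc).isoformat(),
                "event": event.kind,
                "item": event.item_id,
                "outcome": outcome,
            })


system = EventSystem()
# Stand-in for a deposit into a third-party preservation service.
system.on("ingest", lambda e: f"deposited {e.item_id} to preservation service")
system.emit(Event("ingest", "item-1"))
system.emit(Event("view", "item-1"))  # not an event of interest: nothing recorded
```

In this sketch only events with a registered handler reach the history, so the audit trail reflects exactly the events the curator declared to be of interest.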

iRODS Rule-based System
  • Quantify the management policies
  • Automate the application of the policies
  • Track the outcomes from application of the policies
  • First release of the software is this month

iRODS - Infrastructure Independence
  • Six logical name spaces required to manage preservation properties
    • Records
    • Persons
    • Storage resources
    • Rules
    • Micro-services
    • Persistent state information

Example Archivist Policies
  • Authenticity
    • Is the required provenance metadata provided with the record? - Submission requirement
    • Is the chain of custody properly documented? - Management requirement
  • Integrity
    • Are the bits protected against natural disasters? - Management requirement for replication and distribution
    • Are the bits preserved without corruption? - Future assertion

Example Archivist Policies
  • Infrastructure independence
    • Management of preservation properties independently of choice of hardware and software infrastructure
Management policies are needed for assertions about the properties of the records (authenticity and integrity) and the properties of the preservation environment (infrastructure independence).

Example of Complete Process of Rule Derivation from Preservation Criteria
  • Assessment criteria
    • Integrity of records is preserved
  • Management policy
    • Integrity will be verified every 6 months
  • Preservation capabilities
    • Replication of records
    • Checksum on each record
    • Synchronization between replicas
    • Federation between archives

Rule-based Preservation Policies
  • Generated rules
    • Event-condition-(set of micro-services or other rules)
  • Each micro-service corresponds to operations on a record at a remote storage location
  • Each micro-service has a recovery procedure to handle remote system failure or unavailability
  • Persistent state information is saved to track the outcome from applying the rule
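
A minimal sketch of the event-condition-action shape described above, with a recovery procedure per micro-service and a list standing in for persistent state. It does not use the iRODS rule language or API; `MicroService`, `Rule`, and the example functions are hypothetical names chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class MicroService:
    name: str
    run: Callable[[dict], None]      # operates on a record; may raise on failure
    recover: Callable[[dict], None]  # recovery procedure used when run() fails


@dataclass
class Rule:
    event: str                          # event that triggers the rule
    condition: Callable[[dict], bool]   # guard evaluated against the record
    actions: list[MicroService]         # micro-services applied in order

    def apply(self, event: str, record: dict, state: list[dict]) -> bool:
        """Fire the rule if event and condition match; log every outcome."""
        if event != self.event or not self.condition(record):
            return False
        for service in self.actions:
            try:
                service.run(record)
                outcome = "ok"
            except Exception as exc:  # e.g. remote system unavailable
                service.recover(record)
                outcome = f"recovered: {exc}"
            state.append({
                "rule": self.event,
                "service": service.name,
                "record": record["id"],
                "outcome": outcome,
            })
        return True


def replicate(record: dict) -> None:
    record["replicas"] += 1


def failing_checksum(record: dict) -> None:
    raise ConnectionError("remote site unavailable")


def mark_retry(record: dict) -> None:
    record["retry"] = True


rule = Rule(
    event="ingest",
    condition=lambda r: r["replicas"] < 2,
    actions=[
        MicroService("replicate", replicate, lambda r: None),
        MicroService("checksum", failing_checksum, mark_retry),
    ],
)
state: list[dict] = []
record = {"id": "rec-1", "replicas": 1}
fired = rule.apply("ingest", record, state)
```

Because every outcome, including a recovered failure, is appended to `state`, an auditor can later reconstruct what the rule actually did, which is the purpose the slide gives for persistent state information.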

Rule Example: Validate Record Integrity
  • Check permissions (requires archivist or proxy)
  • Operations on specified record
    • Access remote site
    • Compute the checksum and compare with archived value
    • If checksum is not correct
      • Access a replica, compute checksum, and verify that it is correct
      • Replace bad replica with a good replica
      • Update audit list to track the replacement
    • Update persistent state to record date of checksum verification
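
The steps of this rule can be walked through in a short sketch. It makes several assumptions the slides do not state: replicas are held in an in-memory dictionary rather than at remote sites, the checksum is SHA-256, and the function and field names are invented for illustration.

```python
import hashlib
from datetime import date


def checksum(data: bytes) -> str:
    # SHA-256 is an assumption; the slides do not name a checksum algorithm.
    return hashlib.sha256(data).hexdigest()


def validate_record_integrity(user_role, record, replicas, audit_log, state, today):
    """Verify the primary copy of a record and repair it from a good replica.

    `replicas` maps a site name to the bytes stored there (a stand-in for
    remote storage); `state` maps a record id to its last verification date.
    """
    # Check permissions (requires archivist or proxy).
    if user_role not in {"archivist", "proxy"}:
        raise PermissionError("integrity validation requires archivist or proxy")

    expected = record["checksum"]
    primary = record["primary_site"]

    # Compute the checksum and compare with the archived value.
    if checksum(replicas[primary]) != expected:
        # Find another replica whose checksum is correct.
        good_site = next(
            (site for site, data in replicas.items()
             if site != primary and checksum(data) == expected),
            None,
        )
        if good_site is None:
            raise RuntimeError("no valid replica found")
        # Replace the bad replica and track the replacement in the audit list.
        replicas[primary] = replicas[good_site]
        audit_log.append(
            {"record": record["id"], "replaced": primary, "source": good_site}
        )

    # Record the date of checksum verification in persistent state.
    state[record["id"]] = today


data = b"archival record"
record = {"id": "rec-1", "checksum": checksum(data), "primary_site": "site-a"}
replicas = {"site-a": b"corrupted", "site-b": data}
audit_log, state = [], {}
validate_record_integrity(
    "archivist", record, replicas, audit_log, state, date(2006, 11, 1)
)
```

After the call, the corrupted copy at `site-a` has been overwritten from `site-b`, the audit list holds one replacement entry, and the state records the verification date.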

Additional Implied Assessment Criteria
  • Are there any orphaned records present in the archive with no preservation metadata?
  • Are the replicas distributed across independent administrative domains on different types of storage systems?
  • Is the observed error rate a factor of four lower than the validation rate?
  • Have all records been validated within the required time period?
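
Two of these criteria, orphaned records and overdue validations, lend themselves to simple queries over persistent state. The sketch below assumes that state is available as plain Python collections and approximates the 6-month period as 183 days; both function names and the sample data are hypothetical.

```python
from datetime import date, timedelta


def orphaned_records(record_ids, metadata):
    """Records present in the archive that have no preservation metadata."""
    return sorted(rid for rid in record_ids if rid not in metadata)


def overdue_validations(last_validated, today, period=timedelta(days=183)):
    """Records whose last validation is older than the required period.

    183 days is an approximation of "every 6 months", chosen for this sketch.
    """
    return sorted(
        rid for rid, when in last_validated.items() if today - when > period
    )


record_ids = {"rec-1", "rec-2", "rec-3"}
metadata = {"rec-1": {"checksum": "..."}, "rec-2": {"checksum": "..."}}
last_validated = {"rec-1": date(2006, 1, 1), "rec-2": date(2006, 10, 1)}

orphans = orphaned_records(record_ids, metadata)
overdue = overdue_validations(last_validated, date(2006, 11, 1))
```

A record that has never been validated does not appear in `last_validated` at all, so a fuller check would also compare that mapping against the full set of record identifiers.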

Self-consistency and Closure
  • For every required preservation attribute (authenticity and integrity), are there assessment criteria?
  • For every assessment criterion, does there exist preservation metadata?
  • Are the properties of the preservation environment also preserved?