SCAPE Planning and Watch Review presentation Christoph Becker
SCAPE Planning and Watch Review presentation Christoph Becker Vienna University of Technology www.ifs.tuwien.ac.at/~becker SCAPE First year project review, Luxembourg March 20-21, 2012
Outline • Objectives and overall progress • Key results • Watch design (D12.1) • Decision factors analysis (D14.1) Ø Sneak preview: The knowledge browser • Integration and outlook • Time for questions
Preservation Planning: Key concepts § Repeatable, standardized planning workflow § A weighted hierarchy of objectives § Measurable criteria on the leaf level of the tree § Utility functions make criteria comparable § Controlled experimentation on sample content § Evidence-based decision making § Standardized structure for plan specification § Transparency and documentation § Comparability across scenarios § Planning tool Plato guides, validates, documents
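The key concepts above (a weighted hierarchy of objectives, measurable criteria at the leaves, utility functions that make criteria comparable) can be illustrated with a minimal sketch. It assumes weighted-sum aggregation and a utility scale of 0 to 5; the class `Node`, the function `score`, the criterion names and all numbers are hypothetical and are not taken from Plato.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Node:
    """One node of an objective tree; leaves carry a measurable criterion."""
    name: str
    weight: float  # relative weight among siblings
    children: List["Node"] = field(default_factory=list)
    # Leaves only: maps a raw measurement to a utility (0..5 scale is an assumption).
    utility: Optional[Callable[[float], float]] = None


def score(node: Node, measurements: Dict[str, float]) -> float:
    """Aggregate leaf utilities up the tree as a weighted sum (assumed aggregation)."""
    if not node.children:
        return node.utility(measurements[node.name])
    total_weight = sum(child.weight for child in node.children)
    return sum(child.weight / total_weight * score(child, measurements)
               for child in node.children)


# Hypothetical tree with two criteria measured in different units.
root = Node("root", 1.0, [
    Node("image width preserved", 0.6,
         utility=lambda v: 5.0 if v == 1 else 0.0),   # boolean criterion
    Node("seconds per object", 0.4,
         utility=lambda s: max(0.0, 5.0 - s)),        # lower is better
])

# Made-up measurements from an experiment on sample content.
measurements = {"image width preserved": 1, "seconds per object": 2.0}
plan_score = score(root, measurements)
print(round(plan_score, 2))
```

The utility functions are what allow a boolean outcome and a runtime in seconds to be combined into a single comparable score per alternative.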
Scalability Challenges § Creating a plan is effort-intensive § Sharing experience is difficult § Monitoring changes is manual § Integrating context, strategies and operations is difficult
Scalability Challenges § Creating a plan is effort-intensive Ø Increase efficiency of planning § Sharing experience is difficult Ø Increase standardisation and reusability § Monitoring changes is manual Ø Introduce automation § Integrating context, strategies and operations is difficult Ø Manage policies Ø Integrate systems
Work packages and major goals • PW.WP.1 (WP12): Automated Watch • Watch component for monitoring aspects of interest • Simulation component for prediction • PW.WP.2 (WP13): Policies Representation • Catalogue of high-level policy statements • Machine-understandable model of low-level policy statements • Structural and procedural relations between these • PW.WP.3 (WP14): Automated Planning • Refinement of the planning method • Analysis of decision factors and criteria • Planning component (integrated with repositories)
Overall progress in year 1 • Startup phase • Conceptual advances • Development started with a slight delay • No major impact on delivery schedule • Parallel interacting streams • Analysis of methods: planning, policies, monitoring • Prototype development: Plato 4, analysis module, watch services • Integration experiments: Components and Taverna workflows • Milestones and deliverables ü ü MS58 Policy elements (m6); D14.1 Decision factors analysis (m10); MS59 Policy catalogue (m12); D12.1 Watch design (m12)
Status WP12: Watch • Watch service definition completed • Clarification of goals, scope and key concepts • Watch component design finalised: D12.1 • Analysis of drivers and constraints • Analysis of events and triggers Ø Architecture design • Development started • First milestone release in autumn 2012 • Simulation environment: Preliminary work started
D12.1: Key goals for Automated Watch 1. Enable the planning component to automatically monitor entities and properties of interest 2. Enable human users and software components to pose questions about entities and properties of interest 3. Act as a central place for collecting relevant knowledge that can be used to preserve an object or a collection 4. Collect information from different sources through adaptors 5. Enable human users to add specific knowledge 6. Notify interested agents when an important event occurs 7. Act as an extensible component
Watch: Key concepts • Knowledge base • Entities and their properties • Measures of properties over time • Triggers define conditions and events • Flexibility and extensibility • A well-defined, flexible data model • Adaptors for different information sources • Monitoring Capabilities • Internal Monitoring • External Monitoring • Monitor compliance, risks and opportunities
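To make the Watch concepts concrete, here is a minimal sketch of a knowledge base that stores measurements of entity properties over time, a trigger that fires when a condition holds, and a source adaptor that feeds it. All names (`KnowledgeBase`, `Trigger`, `registry_adaptor`), the property `supporting_tools` and the data are hypothetical; this is not the D12.1 data model.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# (entity, property, day, value): one measurement of a property at a point in time
Measurement = Tuple[str, str, date, float]


@dataclass
class Trigger:
    """A condition on one property of one entity, plus the notification text."""
    entity: str
    prop: str
    condition: Callable[[float], bool]
    message: str


class KnowledgeBase:
    """Keeps the history of every (entity, property) pair and evaluates triggers."""

    def __init__(self) -> None:
        self._history: Dict[Tuple[str, str], List[Tuple[date, float]]] = {}
        self._triggers: List[Trigger] = []

    def add_trigger(self, trigger: Trigger) -> None:
        self._triggers.append(trigger)

    def record(self, measurement: Measurement) -> List[str]:
        """Store a measurement and return the notifications it causes."""
        entity, prop, day, value = measurement
        self._history.setdefault((entity, prop), []).append((day, value))
        return [t.message for t in self._triggers
                if t.entity == entity and t.prop == prop and t.condition(value)]

    def history(self, entity: str, prop: str) -> List[Tuple[date, float]]:
        return list(self._history.get((entity, prop), []))


def registry_adaptor(rows: Iterable[Tuple[str, date, int]]) -> Iterator[Measurement]:
    """Hypothetical adaptor: turns (format, day, tool count) rows into measurements."""
    for fmt, day, tools in rows:
        yield (fmt, "supporting_tools", day, float(tools))


kb = KnowledgeBase()
kb.add_trigger(Trigger("TIFF 5", "supporting_tools", lambda v: v < 3,
                       "Tool support for TIFF 5 dropped below 3"))

# Made-up registry snapshots.
rows = [("TIFF 5", date(2012, 1, 1), 4), ("TIFF 5", date(2012, 6, 1), 2)]
notifications = [n for m in registry_adaptor(rows) for n in kb.record(m)]
print(notifications)
```

Keeping adaptors as plain generators of measurements means a new information source only has to map its records onto the shared data model; the knowledge base and triggers stay unchanged.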
Information sources and clients [Diagram labels: Format registries, Content profiles, Component catalogue, Workflows, Experiments, Policies, Browser snapshots, Source Adaptors, Watch core, Knowledge base, Conditions, Notifications, Planning, Operations, Watch Frontend]
Example conditions and events • Policies specify object properties, content profiles describe object properties Ø Policy violation (e.g. objects that are not well-formed) • Plan specification includes tolerance levels for operations Ø QA measures on migration results outside specified boundaries Ø Migration performance below specified threshold • Plan specification includes format properties Ø Number of tools supporting a certain format drops below threshold • Plans specify criteria to be measured Ø New components developed/tested on platform that support desired QA measures Ø Experiments show risks related to tools in use
Current status in Watch • Proof-of-concept (May/June) • Full-circle architecture validation • Mockup data sources • First iteration of Watch focuses on web content • Watch central service • Content profile adaptor • Focused vs. dispersed web crawls over time • Incremental addition of information sources • New adaptors may reveal new requirements
Content profile • Global view of content • Distribution of file formats • Distribution of characteristics • Representative data sets • Stages • Collect metadata • Combine and filter • Reason on the result
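The three stages (collect metadata, combine and filter, reason on the result) might look roughly like the following sketch. The record layout (`path`, `format`, `well_formed`) and the sample data are assumptions made for illustration; real characterisation output will differ.

```python
from collections import Counter
from typing import Dict, Iterable, List

Record = Dict[str, object]


def combine(sources: Iterable[Iterable[Record]]) -> List[Record]:
    """Combine and filter: merge per-source records, drop those without a format."""
    return [r for source in sources for r in source if r.get("format")]


def format_distribution(records: List[Record]) -> Dict[str, float]:
    """Reason on the result: share of each file format in the collection."""
    counts = Counter(str(r["format"]) for r in records)
    total = sum(counts.values())
    return {fmt: n / total for fmt, n in counts.items()}


def not_well_formed(records: List[Record]) -> List[str]:
    """Reason on the result: objects a policy on well-formedness would flag."""
    return [str(r["path"]) for r in records if r.get("well_formed") is False]


# Collect metadata: made-up output of characterisation runs on two crawls.
crawl_a = [
    {"path": "a.tif", "format": "image/tiff", "well_formed": True},
    {"path": "b.tif", "format": "image/tiff", "well_formed": False},
]
crawl_b = [
    {"path": "c.gif", "format": "image/gif", "well_formed": True},
    {"path": "d.bin", "format": None},  # unidentified, filtered out
    {"path": "e.tif", "format": "image/tiff", "well_formed": True},
]

records = combine([crawl_a, crawl_b])
profile = format_distribution(records)
flagged = not_well_formed(records)
print(profile, flagged)
```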
Status WP13: Policies • Policies are governance statements, not executable rules • 3 levels of policy statements • High-level guidance: A Policy catalogue • Mid-level procedures and structures • Low-level control policies: A machine-understandable Policy model • Milestone 59: Policy catalogue closed in March • 1st semantic model of control policies in m15 • Further refinement in second iteration
Status WP14: Planning • Development baseline based on Plato 3 • Removed: PLANETS and other legacy dependencies • Refactored: modularisation, decoupling, testing, ... • Upgraded: JBoss 7, JSF 2, Richfaces 4, ... • Moving: maven, github, continuous builds, ... • First milestone release in July (Policy model, repository integration, Taverna integration) • Define interfaces and integration • Taverna experimentation • Requirements for components catalogue • Repository and platform interface • Collect decision points to automate Ø Analyse decision factors and criteria
D14.1: Analysis of decision criteria • PLATO, the Planning Tool • Evidence-based, well-documented plans • Hierarchy of objectives leading to quantified decision criteria • Traceability from decision factors to decisions • Case studies in and after Planets • Challenges: Effort, sharing, automation, scalability ü Analysis of the measurability and automation of criteria ü Standardisation and alignment of criteria ü Systematic assessment of the impact of certain criteria
A method and tool for decision criteria analysis • Collect Ø Preservation plans Ø Decision criteria • Align Ø Significant properties models Ø ISO SQuaRE software quality attributes Ø Format properties • Categorise Ø Specify uniquely identified criteria Ø Categorise all case study decision criteria • Develop Ø Define and implement impact factors Ø Visual analysis tools • Analyse Ø Impact factors for criteria Ø Impact factors for sets of criteria
Collect: Some case study data from Plato • No.: 1 to 14, ... • Object type: Databases, Documents, Documents, Images, Images, Video Games, ... • (Original) object format: MS Access, WordPerfect, PDF (versions), TIFF-6, TIFF-5, NEF raw image files, different raw image file formats, GIF (versions), ROMs of SNES video games, media images of floppies and CD-ROMs, … • Organization type: Archive, Library, Research, Archive, Library, Archive, Research, Library, Research
Collect: Decision Criteria • Objective tree, utility function, semantics • Taxonomy of criteria measurements [Diagram: Criterion, divided into Action (Runtime, Static, Judgment) and Outcome (Object, Format, Effect)]
Decision criteria: What to measure and how • 13 case studies with 617 criteria • Frequency distribution of criteria across taxonomy • Taxonomy is complete • Preservation of scanned images: distribution over four case studies • But: no analysis of impact [Diagram: Criterion, divided into Action (Runtime, Static, Judgment) and Outcome (Object, Format, Effect)]
Align models for decision factors • Format Properties • Library of Congress format evaluation • PRONOM format evaluation • Actual decision criteria • Software Quality • ISO SQuaRE: Standardised software quality model • Object properties • Formats • Representation Instances • Significant Properties
Develop • Impact factors for criteria and sets • Frequency • Weighting • Utility function • Impact • Selectivity • Measures • Analysis tool • Criteria browser, set builder and analyser Ø Integrated in upcoming release of the SCAPE planning component
Analyse a criterion (set) C • Goal: Understand key decision factors • Question: How often does C occur in scenario S? Metric: Coverage • Question: How important is C? Metric: Range • Question: How critical is C? Metric: Criticality
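As a rough illustration of the first question ("How often does C occur in scenario S?"), the sketch below computes coverage as the fraction of plans in a scenario that use a criterion. This definition is an assumption made for the example; the slides do not give the formula, and range and criticality are left out because their definitions are not stated here. Plan names and criteria are made up.

```python
from typing import Dict, Set


def coverage(criterion: str, plans: Dict[str, Set[str]]) -> float:
    """Fraction of plans in the scenario that use the criterion (assumed definition)."""
    if not plans:
        return 0.0
    hits = sum(1 for criteria in plans.values() if criterion in criteria)
    return hits / len(plans)


# Hypothetical scenario: each plan is reduced to the set of criteria it uses.
scenario = {
    "plan-a": {"format is standardised", "seconds per object"},
    "plan-b": {"format is standardised"},
    "plan-c": {"image width preserved"},
    "plan-d": {"format is standardised", "image width preserved"},
}

print(coverage("format is standardised", scenario))  # 0.75
```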
Sneak preview: The Knowledge browser • Analysis module for decision criteria • Part of the planning component • First milestone release: July 2012
Conclusions • Systematic approach for analysis of decision criteria in preservation planning • Standardisation, cross-referencing, reusability • Method and tool for quantitative impact assessment • Enables SCAPE Planning and Watch to Ø Facilitate experience sharing and knowledge creation Ø Reduce complexity Ø Optimize decision making Ø Guide automation Ø Integrated in upcoming planning component Ø Enable sharing and alignment Ø Real-time analysis over time (Watch) Ø Guidance and QA of planning activities
Year 2 work plan for Planning and Watch (1) • Watch • Proof of concept prototype • Content profile adaptor and monitor • Additional adaptors • Simulation environment prototype in September • Watch core services (version 1) in November • Policies • Control policy model • Catalogue elaboration • Model refinement and validation
Year 2 work plan for Planning and Watch (2) • Planning • Automated planning component in July (Plato 4) • Scalability roadmap • Integration • Content profiling • Repositories • Workflow discovery and execution • Evaluation • Case studies in Testbeds • Key Performance Indicators
Most critical technical dependencies • Preservation components • Planning evaluates action components • Watch uses (the output of) characterisation components to create content profiles • Quality Assurance measures quality of preservation actions for evaluation (including as part of planning) • Web browser watch service uses QA components • Platform • Planning and Watch query components and workflows • Planning runs experiments as Taverna workflows (directly in real-time) • Planning and Watch components interface with repositories • Plans specify workflows to be run on the platform • Watch monitors REF
Other results and publications • Lessons learned in Preservation Planning (JCDL) • Automated planning experiments • Actions, characterisation, QA and results reporting (ICADL) Ø Workflow construction in Taverna, components discovery and invocation • Automation and crowdsourcing (CIKM) • Decision making and governance • Relationship of preservation planning and IT Governance (ASIST, IPRES) • Maturity model for preservation planning and operations (ASIST) • Repository simulation • Evolution of a repository over time, given starting point and rules (IPRES)
It’s 2014. You have content, a mandate, no action plans defined. What do you do? 1. Deploy the content profiler (uses characterization components for identification and property extraction) 2. Sign up with SCAPE Planning and Watch 3. Connect your repository to SCAPE Planning and Watch 4. Specify your policy model Ø Watch component starts monitoring content and policies and detects policy violations 5. You quickly create preservation plans • by evaluating action components using characterisation and QA components in Taverna workflows, all integrated in planning • The finished plans contain workflow specifications including SLAs 6. Deploy plans to repository (running e.g. on SCAPE platform)
In 2015... Ø Watch monitors compliance of operations to plans and risks and opportunities connected to plans and policies Ø Monitoring conditions are automatically generated ü New content? Monitored ü Changed policies? Monitored ü Changed environment, format risks? Monitored ü New, better tools? Monitored ü New QA tools that measure critical features you had to check manually? Monitored ü Need an outlook on the status in 2017? Run a simulation ü Is there something else you want to have monitored? Write a watch adaptor and plug it in. Ø Upon changes, you can swiftly adapt plans and redeploy
Thank you! • Questions? “SCAPE is set to move forward the control of digital preservation operations from ad-hoc decision making to proactive, continuous preservation management, through a context-aware planning and monitoring cycle integrated with operational systems.”
What is a policy? § Goals and constraints are often not defined explicitly § Policy definitions... § “an official expression of principles that direct an organization’s operations” § “Formal statement of direction or guidance as to how an organization will carry out its mandate, functions or activities” § But: “Policies” are encountered on a variety of levels in DP § From TRAC statements to enforceable processing rules § From the perspective of planning: Preservation Policies are governance statements (about constraints, goals, preferences, directives) that constrain or drive operational planning, but may also have other effects outside of operational planning. § They are not directly enforceable (they are business policies) § Preservation planning translates them into concrete actions.
Domain model for the Knowledge Base
Compliance, risk and opportunities [Diagram: a plan comparing Alternatives 1 to 4 against criteria C1 to C4, with an "Automated?" row (Yes, No, No). Annotations: Compliance of operations to deployed plan (SLAs); Opportunities for operations (new action tool); Risks to operations (errors uncovered in QA tool); Opportunities for operations (new QA tool)] • Planning will generate SLAs and monitoring conditions automatically
Compliance, risk and opportunities [Diagram: a plan comparing Alternatives 1 to 4 against criteria C1 to C4, with an "Automated?" row. Annotations: Compliance of operations to deployed plan; Opportunities for operations (new action tool); Risks to operations (errors uncovered in QA tool); Opportunities for operations (new QA tool); Monitor criteria: change in objectives (caused by driver or constraint)] • Add the policy context Ø Governance, Risk and Compliance
Content-related triggers
Environment-related triggers
Community-related triggers
Organisation-related triggers
High-level design of the Watch component
Four cases, three solutions: Scanned images § Bavarian State Library, 72 TB TIFF 6: Leave and monitor § British Library, 80 TB TIFF 5: Migrate to JP2 (ImageMagick) § Royal Library of Denmark, ~10,000 aerial photographs in TIFF 6: Leave and monitor § State and University Library Denmark, scanned yearbooks in GIF: Migrate to TIFF 6
Scenario | Chosen action | Main reasons
72 TB scanned book pages in TIFF 6 | Leave unchanged and monitor | Color profile complications, lack of JP2 browser support, process costs
80 TB scanned newspapers in TIFF 5 | Migrate to JP2 | Storage costs, standardization
Aerial photographs in TIFF 6 | Leave unchanged and monitor | Lack of JP2 browser support, process costs
Scanned books requirements
Scanned books results
Scanned books requirements
Results summary (columns: Factor group, Coverage, Impact, Criticality, Variance) • Format: High, Low/Medium • Action: Performance: High, Medium, High • Action: other: High, Low/Medium • Representation Instance Criteria: Medium, Low, Medium, High • Transformation Information Criteria: High
Upcoming milestones • Now: Catalogue of high-level policy elements • M15: Machine-understandable model of control policies • M18: First prototype of the Planning component • M20: First prototype of the simulation environment • M22: First prototype of the Watch core services
Watch: Work done in year 1 • User group survey on watch current practice • Testbed scenarios and watch relationships • RODA repository measures • Watch Component definition checkpoint (m6) • Watch deliverable D12.1 (m12) • Concepts, design, architecture, usage scenarios, triggers, data model, technology discussion • Preliminary work on Simulator • “Simulating the Effect of Preservation Actions on Repository Evolution” by Christian Weihs and Andreas Rauber, published at iPres’11
Reconcile: Software Quality • ISO 25010 SQuaRE: Systems and Software Quality Requirements and Evaluation • Functional suitability (completeness, correctness, appropriateness) • Performance efficiency • Compatibility • Usability • Reliability • Maintainability • Portability
Software quality and preservation decisions • Business factors not part of SQuaRE • The relevance of some aspects varies widely • Portability • Maintainability • Usability • Functional correctness = authenticity Ø Unified property model
Format vs. Object properties • Format (or Representation) Properties • Representation Instance Properties • Information Properties • Significant Properties • aka Transformation Information Properties • Functional correctness of preservation actions
Dissemination of results from PW • D12.1: Watch design • D14.1: Decision factors analysis • Blog entries on OPF • Published articles • Preservation Decisions: Terms and Conditions apply. Challenges, Misperceptions and Lessons Learned in Preservation Planning. JCDL 2011 • Decision criteria in digital preservation: What to measure and how. JASIST 62/6, 2011 • Impact Assessment of Decision Criteria in Preservation Planning. IPRES 2011 • Automated Preservation: The Case of Digital Raw Photographs. ICADL 2011 • Control Objectives for DP: Digital Preservation as an Integrated Part of IT Governance. ASIST-AM 2011 • Simulating the Effect of Preservation Actions on Repository Evolution. IPRES 2011 • Quality assurance in Document Conversion: A HIT? (BooksOnline@CIKM 2011)
Analysis tools • Criteria browser • Accesses knowledge base of PLATO • Quantitative impact factors of criteria • Browse, sort, filter • Criteria set builder • Flexible configuration of criteria sets • Quantitative impact factors of sets
Visualise • Format Standardization: consistent preferences
Visualise • Format compression: differing preferences
Core Preservation Capabilities • Preservation Planning: Monitor, steer and control the preservation operation of content Ø Influencers and Decision making Ø Options diagnosis Ø Specification and delivery Ø Monitoring • Preservation Operation: Control the deployment and execution of preservation plans Ø Analyze content Ø Execute preservation actions Ø Ensure adequate provenance trail Ø Handle preservation metadata Ø Conduct Quality Assurance Ø Provide reports and statistics • Example: “Migrate this set of images (in TIFF-5) to JP2 using ImageMagick 6.3 with parameters a, b, c” Ø Analyse original Ø Migrate, analyse output Ø Conduct quality assurance Ø Provenance, metadata, reporting
COBIT processes... • Driven by specific goals and controls • Organized into activities with assigned responsibilities • Related to other processes • Measured on all levels: Internal vs. external goals and metrics [Diagram labels: IT goals, Key Performance Indicators, Process goals, Process metrics, Activity goals, Activity metrics]
Preservation Planning example • Ensure understandability … Ø Number of objects with breach of understandability during time horizon … • Manage obsolescence threats at logical level … Ø Number of obsolescence issues successfully responded to … • Diagnose all options against requirements … Ø Options diagnosis: Efficiency, completeness, correctness and timeliness …
Preservation Planning Process
A Capability Maturity Model for Preservation Planning • Coming from Software Engineering, the CMM has been shown to be a powerful instrument for assessment and improvement • Dimensions: Awareness and Communication; Policies, Plans and Procedures; Tools and Automation; Skills and Expertise; Responsibility and Accountability; Goal Setting and Measurement • Levels: 1 Initial / ad-hoc; 2 Repeatable, but Intuitive; 3 Defined; 4 Managed and Measurable; 5 Optimized
Maturity levels by dimension (Awareness and Communication; Policies, Plans and Procedures; Tools and Automation; Skills and Expertise; Responsibility and Accountability; Goal Setting and Measurement) • Level 1: Some recognition of the need for control; disorganised ad-hoc decisions; …; not defined; unclear goals, no measurement … • Level 2: Management recognizes the need for controlling and communicates issues; planning process emerges, but informal and incident-driven; sporadic tool usage without systematic integration; some awareness of required skills, hands-on experience; people take ownership of issues based on their own initiative on a reactive basis • Level 3: Importance of a planning approach is understood, accepted and communicated; formal planning process in place, some strategy takes place; automated tools, but processes defined by available services; …; responsibilities … assigned, documented and clearly communicated • Level 4: Systematic planning is part of the organization’s culture; planning fully supported by well-specified methods, internal best practice; automated planning system + operational monitoring; …; …; … • Level 5: Continuous improvement; industry best practice; …; …
Capability maturity increments