Building Scalable Environments: Technologies and SCAPE Platform
Rainer Schmidt, SCAPE Training Event, September 16th–17th, 2013, The British Library
SCAPE (SCAlable Preservation Environments)

Motivation
• Increasing amounts of data in data centers and memory institutions.
• This data cannot be handled using traditional environments such as databases or server facilities.
• Institutions require the ability to process large and complex data sets in preservation scenarios.
• Examples are data migration, information extraction, and quality assurance.
• The goal is to take advantage of data-intensive computing technologies for digital preservation.
What we will show you
• Example scenarios from the SCAPE Testbed and how they are formalized using workflow technology.
• An introduction to, and hands-on exercise with, the preservation tools involved.
• An overview of the SCAPE Platform, its underlying technologies, its preservation services, and how to set it up.
• Creating scalable workflows and deploying them on the platform.
• Executing SCAPE workflows in a virtual machine environment as well as on a demonstration cluster.
Workflows in this Context
• Formalized (and repeatable) processes or experiments consisting of one or more activities interpreted by a workflow engine.
• Usually modeled as directed acyclic graphs (DAGs) based on control-flow and/or data-flow logic.
• The workflow engine acts as a coordinator/scheduler that triggers the execution of the involved activities.
• Execution may take place on a desktop, on a server-side component, or both.
• Example workflow engines are Taverna Workbench, Taverna Server, and Apache Oozie.
• Used for experimentation and research, SOA support, and Hadoop integration.
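The DAG-based coordination described above can be illustrated with a toy, in-process engine. This is a minimal sketch that assumes Python 3.9+ for the standard-library `graphlib` module. It is not how Taverna or Oozie are implemented. The activities `identify`, `migrate` and `validate` are hypothetical placeholders for real preservation tools. Each activity receives the workflow input plus the outputs of its upstream activities, so the edges carry data as well as ordering.

```python
from graphlib import TopologicalSorter

# Hypothetical activities standing in for real preservation tools.
def identify(inputs):
    # Pretend format identification based on the file name only.
    return "image/tiff" if inputs["input"].endswith(".tif") else "unknown"

def migrate(inputs):
    # Pretend TIFF -> JPEG 2000 migration: only renames the target file.
    if inputs["identify"] == "image/tiff":
        return inputs["input"][: -len(".tif")] + ".jp2"
    return inputs["input"]

def validate(inputs):
    # Pretend quality assurance: check that migration produced a .jp2 name.
    return inputs["identify"] != "image/tiff" or inputs["migrate"].endswith(".jp2")

ACTIVITIES = {"identify": identify, "migrate": migrate, "validate": validate}
# Each activity maps to the set of activities it depends on (its predecessors).
DEPENDENCIES = {"migrate": {"identify"}, "validate": {"identify", "migrate"}}

def run_workflow(activities, dependencies, initial):
    """Run activities in dependency order, passing upstream outputs along edges."""
    sorter = TopologicalSorter(dependencies)
    for name in activities:
        sorter.add(name)  # also registers activities that have no predecessors
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    results, order = {}, []
    while sorter.is_active():
        # get_ready() returns every activity whose predecessors are all done.
        for name in sorter.get_ready():
            upstream = {dep: results[dep] for dep in dependencies.get(name, ())}
            results[name] = activities[name]({"input": initial, **upstream})
            order.append(name)
            sorter.done(name)
    return results, order

if __name__ == "__main__":
    results, order = run_workflow(ACTIVITIES, DEPENDENCIES, "page1.tif")
    print(order)    # ['identify', 'migrate', 'validate']
    print(results)  # {'identify': 'image/tiff', 'migrate': 'page1.jp2', 'validate': True}
```

Each call to `get_ready()` yields a batch of activities that no longer wait on anything. A real coordinator could dispatch such a batch concurrently, locally or on a cluster. The sketch runs each batch sequentially to keep the scheduling logic visible.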
Challenges in SCAPE
• Providing means that help workflow developers parallelize different scenarios.
• This depends heavily on the nature of the data and the workflow.
• Handling the interaction between external tools and MapReduce programs.
• Interaction of the execution environment with data sources and sinks, in particular with repositories.
• Interfacing with preservation planning and watch tools, including semantic search and reporting.
• Maintaining a central infrastructure and providing guidance for deploying local instances in different institutional settings.
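The interaction between an external tool and a MapReduce-style job can be sketched on a single machine. This is a minimal local simulation, not Hadoop. The map phase shells out to an external program once per record via `subprocess`. A shuffle step then groups the intermediate pairs by key, and the reduce phase sums them. The "tool" here is a hypothetical stand-in: a child Python process that reports a file's extension. A real scenario would invoke an actual identification or migration tool. On a cluster, the same mapper logic would typically be packaged for Hadoop Streaming or as a Java `Mapper`.

```python
import subprocess
import sys
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for an external preservation tool: a child Python
# process that prints the lower-cased file extension, or "unknown".
TOOL_SNIPPET = (
    "import os, sys; "
    "ext = os.path.splitext(sys.argv[1])[1].lower().lstrip('.'); "
    "print(ext or 'unknown')"
)

def run_tool(path):
    """Invoke the external tool on one record and return its trimmed stdout."""
    result = subprocess.run(
        [sys.executable, "-c", TOOL_SNIPPET, path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

def map_record(path):
    """Map phase: emit a (format, 1) pair for one input record."""
    return [(run_tool(path), 1)]

def reduce_counts(key, values):
    """Reduce phase: sum the counts emitted for one format."""
    return key, sum(values)

def run_job(paths, workers=4):
    # Threads are enough here: the real work happens in child processes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_record, paths))
    # Shuffle: group intermediate values by key.
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return dict(reduce_counts(k, v) for k, v in sorted(groups.items()))

if __name__ == "__main__":
    files = ["scans/p001.TIF", "scans/p002.tif", "docs/report.pdf", "README"]
    print(run_job(files))  # {'pdf': 1, 'tif': 2, 'unknown': 1}
```

The sketch uses a thread pool rather than a process pool because each record's work already runs in its own child process. The Python threads mostly wait on that I/O, which mirrors how a MapReduce task wrapping a command-line tool spends its time.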