Digital Preservation in e-Research
Adam Butler, Account Executive, Systems Division
Russell Noble, Systems Sales Consultant
Agenda • Definitions • Challenges • Preservation: Now and Vision • Oracle Technologies to Help • Case Studies
Digital Preservation - Definition
Digital Preservation is the series of managed activities necessary to ensure continued access to digital materials for as long as necessary. It combines policies, strategies and actions to ensure access to reformatted and born-digital data regardless of the challenges of media failure and technological change. The goal of digital preservation is the accurate rendering of authenticated content over time.
Preservation Challenges People. Process. Portfolio.
Challenges in Research Preservation
Additional Challenges for “Humanities”
- Digitisation of analogue artifacts – archaeological, manuscripts, births & deaths, books published prior to digital
- Access
- Rights Management
- Ingest of “Born Digital” or converted assets
For All e-Research
- Growing capacities
- Formats – data & media
- Hardware (& software) / media availability/readability
- “Bit Rot”
Data Protection
File Damage
• A printed photo with water marks and creases is still identifiable
• Flip one bit in a file and it can become unreadable
Data Protection
• Multiple copies
• Data Integrity Validation
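The two protections named on this slide can be sketched together: record a checksum when a file is archived, then use it to detect a flipped bit and fall back to another copy. This is a minimal illustration only. The choice of SHA-256 and the helper names `flip_bit` and `find_intact_copy` are assumptions made for the example, not something the slides specify.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of the data with exactly one bit inverted."""
    damaged = bytearray(data)
    damaged[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(damaged)


def find_intact_copy(copies, expected_digest):
    """Return the first copy whose digest matches the recorded one, else None."""
    for copy in copies:
        if sha256_hex(copy) == expected_digest:
            return copy
    return None


# Hypothetical payload; the digest is recorded at ingest time.
original = b"example archived payload"
recorded_digest = sha256_hex(original)

# Simulate bit rot on one copy while a second copy stays intact.
damaged = flip_bit(original, 3)
recovered = find_intact_copy([damaged, original], recorded_digest)

print("damaged copy detected:", sha256_hex(damaged) != recorded_digest)
print("recovered from second copy:", recovered == original)
```

With only one copy the check can still detect the damage, but it has nothing to recover from, which is why the slide lists multiple copies and integrity validation together.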
“Reliable retrieval is key: An archive that only stores content is indistinguishable from a landfill. We need technology that reliably delivers all the content whenever requested and tell us proactively if there are issues affecting the retrieval of archived content.”
Scott Rife, The Library of Congress
Preservation: Now & Futures
Preservation: Now
- Still very much in the research domain, i.e. preservation itself is a “Developing Discipline”
- Emerging standards, organisations and software supporting the practice
- Standards like OAIS
- Organisations like:
  - dpconline.org (Digital Preservation Coalition)
  - alliancepermanentaccess.org
  - JISC, LTDP
  - digitalpreservation.gov (US Library of Congress)
- Software like:
  - Tessella
  - Fedora Commons, DSpace, DuraCloud
http://wiki.esipfed.org/images/0/0b/OAIS_FunctionalEntities.jpg
Preservation: Future – Preservation as a Service (NEW)
[Diagram: Analog to Digital; Ingest & Convert to Preservation Format; Automated, Verified, Tiered Content Infrastructure; SAM Application; Tier 1 Storage, Tier 2 Storage, Tier 3 Storage]
Oracle Technologies Used in Preservation
StorageTek T10000C Tape Drive
Product Innovation
• 5 TB native – 10 TB compressed
• Performance: 252 MB/sec native
• Investment protection:
  – Legacy read head that reads StorageTek T10000A & B tapes
  – Media re-use at higher capacity with Generation 4 drive
• 2 GB buffer
© June 2011 Oracle Corporation
Archive End to End Data Integrity
StorageTek Data Integrity Validation
• User creates a CRC (T10 ANSI standard) for each record
• StorageTek T10000C checks CRC as each record is received
• The DIV CRC of each record is written to tape with that record
• When a record is read from tape the CRC is always checked
  – The SCSI Verify command can be used to check each record without transferring data to the application (i.e. internally verified by the StorageTek T10000C)
[Diagram labels: Records have DIV CRC added at host; Records sent to tape with DIV CRC; Record DIV CRC checked on write and read]
© August 2011 Oracle Corporation
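The per-record flow on this slide (CRC added at the host, checked on receipt, stored with the record, checked again on read) can be sketched in a few lines. This is an illustration under stated assumptions, not the drive's actual behaviour: `zlib.crc32` stands in for the T10 CRC named on the slide, the 4-byte trailer layout is invented for the example, and `add_crc`, `check_crc` and `verify_all` are hypothetical helper names.

```python
import struct
import zlib


def add_crc(record: bytes) -> bytes:
    """Host side: append a 4-byte CRC trailer to the record (assumed layout)."""
    return record + struct.pack(">I", zlib.crc32(record))


def check_crc(framed: bytes) -> bool:
    """Receiving side: recompute the CRC and compare it with the stored trailer."""
    if len(framed) < 4:
        return False
    payload = framed[:-4]
    (stored,) = struct.unpack(">I", framed[-4:])
    return zlib.crc32(payload) == stored


def verify_all(tape):
    """Return the indexes of records that fail the check.

    Only indexes are returned, loosely mirroring a verify pass that
    does not hand record data back to the application.
    """
    return [i for i, framed in enumerate(tape) if not check_crc(framed)]


# Hypothetical records written with their CRCs.
tape = [add_crc(r) for r in (b"record-0", b"record-1", b"record-2")]

# Simulate a single flipped bit in the payload of the second record.
damaged = bytearray(tape[1])
damaged[0] ^= 0x01
tape[1] = bytes(damaged)

print("records failing verification:", verify_all(tape))
```

Because the CRC travels with each record, a failure can be pinned to a specific record rather than to the tape as a whole.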
Oracle’s StorageTek Tape Analytics Software
Managing Tape Has Never Been So Simple
Simplify tape storage management by taking a proactive approach to eliminate library, drive, and media errors
✓ Simplify Tape Management: Tape Analytics monitors all your drives and media so you can focus your resources elsewhere
✓ Leverage Intelligent Analytics: Oracle’s proprietary algorithms provide proactive health indicators that can be trusted
✓ Worry Free Deployment: Tape Analytics gathers performance data through the library without ever entering your live data path
✓ Grow with Peace of Mind: A monitoring application that scales to meet your needs, Tape Analytics supports monitoring multiple globally dispersed libraries from a single interface
Tape Analytics is Exclusively Available for Oracle’s StorageTek Tape Libraries
A First Look at the StorageTek Tape Analytics Interface
• Simple User Interface
• Quickly Identify Suspect Devices
• Proactive Health Indicators
Introduction to the Tape Analytics Interface
Quickly Identify Suspect Devices
1. Unhealthy Media & Drives are Displayed on the Dashboard
2. Quickly Drill Down to Drives by Health State
3. Capture Suspect Media VOLSER Numbers to Take Action
Capacity Efficiency: Use More of What You Purchase AND Run Faster
Tiered Storage Infrastructure
• Multiple, optimized tiers
• Policy based migration
  – Watermarks, age, type
  – WORM for Compliance
• Multi-location data sets for DR
• Applications run faster from performance optimized tiers
• Growth primarily at capacity tiers

Tier               Cost           Single-Tier Storage   Multi-Tiered Storage
Flash Storage      $50/GB                               2%
Performance Disk   $5 - $10/GB    100%                  3%
Capacity Disk      $1 - $4/GB                           15%
Tape Storage       <$0.15/GB                            80%
Total                             ~$7.5M / PB           ~$1.7M / PB

Source: Horison Information Strategies, The Era of Colossal Content
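The two per-petabyte totals on this slide can be reproduced as a weighted average of the per-gigabyte prices. The sketch below rests on assumptions that the slide does not state outright: that 2%, 3%, 15% and 80% are the shares held on flash, performance disk, capacity disk and tape respectively; that the single-tier case is 100% performance disk; that price ranges are taken at their midpoints ($7.5 and $2.5 per GB); that tape is taken at its $0.15 upper bound; and that 1 PB is 1,000,000 GB.

```python
GB_PER_PB = 1_000_000  # decimal petabyte, an assumption of this sketch

# tier name -> (share of data, price in $/GB); shares and midpoints are assumed
multi_tier = {
    "flash": (0.02, 50.0),
    "performance_disk": (0.03, 7.5),
    "capacity_disk": (0.15, 2.5),
    "tape": (0.80, 0.15),
}
single_tier = {"performance_disk": (1.00, 7.5)}


def blended_cost_per_pb(tiers):
    """Return the share-weighted cost in dollars of storing one petabyte."""
    total_share = sum(share for share, _ in tiers.values())
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("tier shares must sum to 1.0")
    cost_per_gb = sum(share * price for share, price in tiers.values())
    return cost_per_gb * GB_PER_PB


multi = blended_cost_per_pb(multi_tier)
single = blended_cost_per_pb(single_tier)
print(f"single-tier:  ${single / 1e6:.2f}M per PB")
print(f"multi-tiered: ${multi / 1e6:.2f}M per PB")
```

Under these assumptions the multi-tiered figure works out to about $1.72M per PB and the single-tier figure to $7.5M per PB, in line with the slide's ~$1.7M and ~$7.5M.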
Oracle Tape: Industry Leading 5 Year Trajectory (Hardware)
[Chart, 2010 to 2015: Tape Capacity 20x, Archive Capacity 30x]
• T10000B: 1 TB capacity; SL8500: 100 PB capacity
• T10000C: 5 TB capacity; SL8500: PB capacity
• Gen 4: 7 - 9 TB capacity; SL8500: 1.05 – 1.35 Exabyte capacity
• Gen 5: 12 - 20 TB capacity; SL8500: 1.8 – 3.0 Exabyte capacity
Tape Cost Advantage Over Disk
Lifecycle Issues
Measuring Tape and Disk

Lifecycle Considerations                                 Disk                      Tape
Max shelf life (bit rot)                                 10 years                  30 years
Best practices for data migration to new technology      3-5 years                 8-12 years
Uncorrected Bit Error Rate, Probability                  10^-14 (~10’s of TB)      10^-19 (~1 million TB)
  (avg 1 error in x TB)
Power and cooling                                        238X                      X

“The cost of energy alone for the average disk-only (archive) solution exceeds the entire TCO of the average tape-based solution.”
The Clipper Group, “In Search of the Long-term Archiving Solution” - December 2010

Each technology refresh or migration has a cost associated with it.
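The "average 1 error in x TB" figures follow directly from the bit error rates. A short worked conversion, assuming decimal terabytes (1 TB = 10^12 bytes, so 8 x 10^12 bits); `tb_per_error` is a hypothetical helper name used only for this illustration:

```python
BITS_PER_TB = 8 * 10**12  # decimal terabyte, an assumption of this sketch


def tb_per_error(exponent: int) -> float:
    """Average terabytes read per uncorrected error at a rate of 1 in 10**exponent bits."""
    return 10**exponent / BITS_PER_TB


disk_tb = tb_per_error(14)  # bit error rate of 10^-14
tape_tb = tb_per_error(19)  # bit error rate of 10^-19

print(f"disk: one uncorrected error per {disk_tb} TB on average")
print(f"tape: one uncorrected error per {tape_tb:,.0f} TB on average")
print(f"ratio: {tape_tb / disk_tb:,.0f}x")
```

This gives 12.5 TB for the disk rate and 1.25 million TB for the tape rate, matching the slide's "~10's of TB" and "~1 million TB", a difference of five orders of magnitude.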
Case Studies
"Data availability and access 7 x 24 in our business is mandatory. Actually, it is often a matter of life or death. Oracle's SAM QFS solution allows us to easily store over 1 Billion files ranging from 512 bytes to 58 Gigabytes. This flexibility allows concurrent ingest and access from 79 different applications using the CIFS, NFS and FTP protocols on Solaris, Linux and AIX clients. We are able to continuously replicate between two active/active data centers 30 miles apart with 20 second fail-over. Lack of immediate data access is simply not an option."
Robert Dick, Novant Healthcare
"From the beginning, Oracle Technology has been critical to our mission to preserve the testimonies of Holocaust survivors and witnesses, and provide fast and easy access to these invaluable assets. Oracle’s StorageTek T10000C tape technology has enabled us to ensure data integrity, archive five times more data in our existing footprint and reduce the cost of maintaining these oral histories for generations to come."
Sam Gustman, CTO, USC SHOAH Foundation
Questions