An Architecture for Digital Archiving and Preservation Potential








































An Architecture for Digital Archiving and Preservation: Potential for National Synergy. GAELIC Summer Training Camp, Wartenweiler Library, University of the Witwatersrand, 26th November 2009. Prof Derek W. Keats, Deputy Vice Chancellor (Knowledge & Information Management), The University of the Witwatersrand, Johannesburg. http://kim.wits.ac.za/dvcblog derek.keats@wits.ac.za
Archiving and preservation: the usual. Historical papers; library collections; History Workshop; video and audio collections, e.g. Wits TV; donations of significant collections from industry; history of human evolution; fossils; institutional papers and documents
Components Born analogue Physical archive Digital archive Preservation Born digital
Physical archive: drivers High cost of campus space Need for space in libraries due to growth in student numbers and increased emphasis on research Opportunity of proximity to the William Cullen library Need to preserve historically significant materials
The archive dream
Components Born analogue Physical archive Digital archive Preservation Born digital
Value People Technology Process Modified after the work of Georges Por
Value Information Wisdom Knowledge People Technology Process Modified after the work of Georges Por
Value Information Wisdom Knowledge Competencies Skills Attitude People Myths Modified after the work of Georges Por
Value People Technology Process Modified after the work of Georges Por
Archiving email eLearning Web & portal Enterprise document management Hosting services Tutu project Other Private sector archives National archive initiatives Provincial and local project initiatives National research data archive ? ?
2009 Vice Chancellor DVC: Academic DVC: Finance & Ops DVC: Research eLearning Libraries Registrar Management information DVC: KIM Computer & networking services DVC: Advancement
Innovation driven Where are there opportunities to create synergy, leading to innovation? Process Innovation Support (people) Technology Innovation
Institutional context The cutting edge We help to lead technology trends Option 2 Option 3 Option 4 Option 5 Mature but relatively new technologies Option 1 We create and use new technologies before anyone else Safe, proven, mature old technologies Risk without ecosystem
Institutional context The cutting edge We help to lead technology trends Option 2 Option 3 Option 4 Option 5 Mature but relatively new technologies Option 1 We create and use new technologies before anyone else Safe, proven, mature old technologies Risk with ecosystem
Along with associated technologies and software capabilities
Massive clusters of low-cost commodity computers, capable of virtualization to run multiple systems and make better use of CPU cycles and storage - Compute cloud (e.g. Amazon Elastic Compute Cloud) - Data cloud (e.g. Amazon S3) - Public cloud - Private cloud
Infrastructure projects Wits web presence eLearning email
Private cloud infrastructure Simplified view Wits portals email eLearning Compute cloud Hierarchical storage Robotic tape library Spinning disks Flash memory
Private cloud infrastructure Simplified view Hosted services Wits portals email eLearning Digital archive Compute cloud Hierarchical storage Robotic tape library Spinning disks Flash memory
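The hierarchical storage on this slide (flash memory, spinning disks, robotic tape library) can be pictured as a placement policy. The following is a minimal sketch only: the age thresholds, the `ArchiveObject` type and the `choose_tier` function are illustrative assumptions, not part of the Wits design.

```python
from dataclasses import dataclass

# Tiers named on the slide, fastest first. The age thresholds are
# illustrative assumptions, not values from the presentation.
TIERS = [
    ("flash memory", 7),      # accessed within the last 7 days
    ("spinning disks", 365),  # accessed within the last year
]
COLDEST_TIER = "robotic tape library"


@dataclass
class ArchiveObject:
    """Hypothetical record for a stored item."""
    name: str
    days_since_access: int


def choose_tier(obj: ArchiveObject) -> str:
    """Return the tier an object should live on, based on access age."""
    for tier, max_age_days in TIERS:
        if obj.days_since_access <= max_age_days:
            return tier
    return COLDEST_TIER


if __name__ == "__main__":
    for item in [
        ArchiveObject("lecture.ogv", 2),
        ArchiveObject("minutes-2008.pdf", 120),
        ArchiveObject("field-notes-1975.tif", 4000),
    ]:
        print(item.name, "->", choose_tier(item))
```

A real hierarchical storage manager would also weigh object size, retrieval cost and preservation policy; access age is used here only because it is the simplest criterion to show.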
Private cloud infrastructure Hosted services Wits portals email Application eLearning Digital archive Application Data grid Object store Middleware Hardware layer Virtualization Operating System Compute cloud Hierarchical storage Robotic tape library Spinning disks Flash memory
Private cloud infrastructure Hosted services Wits portals email eLearning Digital archive Free and Open Source Software Application Data grid Object store Middleware Hardware layer Virtualization Operating System Compute cloud Hierarchical storage Robotic tape library Spinning disks Flash memory
Private cloud infrastructure Hosted services email Zimbra Wits portals eLearning Digital archive Chisimba DSpace Chisimba iRODS Fedora GlassFish Hardware layer Virtualization OS: OpenSolaris Compute cloud Hierarchical storage Robotic tape library Spinning disks Flash memory
Built on a stack of FOSS applications and libraries using a suite of FOSS development and collaboration tools
Creating semantic and socially connected archives repositories museums herbaria
Semantic and social archive University of the Western Cape Fedora Commons SWORD API Chisimba 'Portals' eLearning Chisimba API Chisimba SWORD API Fedora Commons XMPP University of the Witwatersrand
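The SWORD API on this slide is an HTTP deposit protocol, so a deposit from one institution's portal into another's repository amounts to a POST of a packaged object. The sketch below only builds such a request and does not send it. The endpoint URL and the `build_deposit` helper are hypothetical, and the packaging identifier is an assumption borrowed from common SWORD 1.3 usage rather than anything stated in the slides.

```python
import urllib.request

# Hypothetical collection URL; substitute a real SWORD deposit endpoint.
DEPOSIT_URL = "https://repository.example.org/sword/deposit/collection-1"

# Assumed packaging identifier (commonly used with SWORD 1.3 deposits).
PACKAGING = "http://purl.org/net/sword-types/METSDSpaceSIP"


def build_deposit(package: bytes, filename: str) -> urllib.request.Request:
    """Build (but do not send) a SWORD-style deposit request."""
    headers = {
        "Content-Type": "application/zip",
        "Content-Disposition": f"filename={filename}",
        "X-Packaging": PACKAGING,
    }
    return urllib.request.Request(
        DEPOSIT_URL, data=package, headers=headers, method="POST"
    )


if __name__ == "__main__":
    request = build_deposit(b"not a real zip", "item-001.zip")
    print(request.get_method(), request.full_url)
    for name, value in request.header_items():
        print(f"{name}: {value}")
```

Sending the request would use `urllib.request.urlopen(request)` with whatever authentication the target repository requires.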
Private cloud infrastructure Hosted services email Zimbra Wits portals eLearning Digital archive Chisimba DSpace Chisimba iRODS Fedora Remote site GlassFish Virtualization OS: OpenSolaris Compute cloud Hierarchical storage Robotic tape library Spinning disks Remote site Flash memory iRODS
Private cloud infrastructure Hosted services email Zimbra Wits portals eLearning Digital archive Chisimba DSpace Chisimba iRODS Fedora GlassFish Virtualization OS: OpenSolaris Compute cloud Hierarchical storage Robotic tape library Spinning disks Remote site Flash memory Remote site
iRODS: Integrated Rule-Oriented Data System. A grid-level middleware for sharing data and metadata distributed across heterogeneous resources. Adaptive middleware. Micro-services. Key: rule-oriented
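The rule-oriented idea can be illustrated as events that trigger chains of small micro-services. This is a plain Python analogy under stated assumptions, not the iRODS rule language or its API: the event name, the metadata fields and every function below are hypothetical.

```python
import hashlib
from typing import Callable, Dict, List

# A micro-service takes an object description and returns an updated copy.
MicroService = Callable[[dict], dict]


def add_checksum(obj: dict) -> dict:
    """Micro-service: record a SHA-256 fixity value for the object's bytes."""
    return {**obj, "sha256": hashlib.sha256(obj["data"]).hexdigest()}


def tag_collection(obj: dict) -> dict:
    """Micro-service: attach descriptive metadata (hypothetical field)."""
    return {**obj, "collection": "historical-papers"}


def mark_for_replication(obj: dict) -> dict:
    """Micro-service: flag the object for copying to a remote site."""
    return {**obj, "replicate_to": ["remote-site"]}


# A rule maps an event to the ordered micro-services it triggers.
RULES: Dict[str, List[MicroService]] = {
    "on_ingest": [add_checksum, tag_collection, mark_for_replication],
}


def fire(event: str, obj: dict) -> dict:
    """Apply every micro-service registered for the event, in order."""
    for service in RULES.get(event, []):
        obj = service(obj)
    return obj


if __name__ == "__main__":
    result = fire("on_ingest", {"name": "letter.tif", "data": b"scan bytes"})
    print({key: value for key, value in result.items() if key != "data"})
```

The point of the pattern is that policy lives in the rule table, so changing what happens on ingest means editing `RULES` rather than the applications that deposit data.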
Use in establishing digital archive Remote site iRODS rules Digital archive iRODS rules Ingest Born digital Digital conversion Source artifacts Video Audio Docs Chisimba DSpace Fedora Ingest OS: OpenSolaris First tier storage Robotic tape library Remote site iRODS GlassFish etc iRODS rules Private cloud infrastructure Compute cloud Storage cloud Digital conversion Source artifacts iRODS iRODS rules
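One preservation step implied by this ingest picture is copying each object to more than one site and checking that every copy is intact. A minimal sketch follows, assuming local directories stand in for the digital archive and the remote site; in the architecture shown, iRODS rules would drive this rather than a script, and the `ingest` and `sha256_of` helpers are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path
from typing import List


def sha256_of(path: Path) -> str:
    """Compute a SHA-256 checksum by streaming the file in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ingest(source: Path, stores: List[Path]) -> str:
    """Copy source into every store and verify each copy's checksum."""
    expected = sha256_of(source)
    for store in stores:
        store.mkdir(parents=True, exist_ok=True)
        copy = store / source.name
        shutil.copy2(source, copy)
        if sha256_of(copy) != expected:
            raise IOError(f"fixity check failed for {copy}")
    return expected


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        artifact = root / "interview.wav"
        artifact.write_bytes(b"digitised audio")
        # Directories standing in for the local archive and a remote site.
        checksum = ingest(artifact, [root / "archive", root / "remote-site"])
        print("stored with sha256", checksum)
```

Recording the returned checksum alongside the object allows later audits to detect silent corruption on any tier.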
Type of equipment Docs Specialized Proprietary OCRopus Tesseract SANE Standard Python GUI GStreamer Rules Python GUI Standard Audio This slide is a work in progress Capture software Digital conversion Source artifacts Video Python GUI Standard GStreamer Mencoder FOSS in the Ingest
Enterprise document management An approach using private cloud Workflow managed by iRODS layer WWW Site iFolder client iRODS Site Network iRODS Born digital iFolder client Network Site Ingest iFolder server iRODS Chisimba Private cloud infrastructure
Endnotes: Cloud with hierarchical storage can serve the needs of Wits for digital archiving and preservation. Can also be used to build semantic and socially connected archives. Can be used for other kinds of digital storage. Opportunities nationally and in other institutions. All software is Free Software (open source). Opportunities for innovation.
No secret science. Attribution file: http://www.dkeats.com/usrfiles/users/1563080430/attribution/attrib.txt