An Architecture for Digital Archiving and Preservation: Potential for National Synergy

An Architecture for Digital Archiving and Preservation: Potential for National Synergy
GAELIC Summer Training Camp
Wartenweiler Library, University of the Witwatersrand
26th November 2009

Prof Derek W. Keats
Deputy Vice Chancellor (Knowledge & Information Management)
The University of the Witwatersrand, Johannesburg
http://kim.wits.ac.za/dvcblog
derek.keats@wits.ac.za

Archiving and preservation
- Historical papers
- Library collections
- History Workshop
- Video and audio collections, e.g. Wits TV
- Donations of significant collections from industry
- History of human evolution
- Fossils
- Institutional papers and documents
The usual

Components
- Born analogue
- Physical archive
- Digital archive
- Preservation
- Born digital

Physical archive: drivers
- High cost of campus space
- Need for space in libraries due to growth in student numbers and increased emphasis on research
- Opportunity of proximity to the William Cullen Library
- Need to preserve historically significant materials

The archive dream

Components
- Born analogue
- Physical archive
- Digital archive
- Preservation
- Born digital

Value
- People
- Technology
- Process
Modified after the work of Georges Por

Value
- Information
- Wisdom
- Knowledge
- People
- Technology
- Process
Modified after the work of Georges Por

Value
- Information
- Wisdom
- Knowledge
- Competencies
- Skills
- Attitude
- People
- Myths
Modified after the work of Georges Por

Value
- People
- Technology
- Process
Modified after the work of Georges Por

Archiving
- Email
- eLearning
- Web & portal
- Enterprise document management
- Hosting services
- Tutu project
Other
- Private sector archives
- National archive initiatives
- Provincial and local project initiatives
- National research data archive
- ?
- ?

2009
- Vice Chancellor
- DVC: Academic
- DVC: Finance & Ops
- DVC: Research
- eLearning
- Libraries
- Registrar
- Management information
- DVC: KIM
- Computer & networking services
- DVC: Advancement

Innovation driven
Where are there opportunities to create synergy, leading to innovation?
- Process innovation
- Support (people)
- Technology innovation

Institutional context (Options 1 to 5)
- We create and use new technologies before anyone else
- The cutting edge: we help to lead technology trends
- Mature but relatively new technologies
- Safe, proven, mature old technologies
Risk without ecosystem

Institutional context (Options 1 to 5)
- We create and use new technologies before anyone else
- The cutting edge: we help to lead technology trends
- Mature but relatively new technologies
- Safe, proven, mature old technologies
Risk with ecosystem

Along with associated technologies and software capabilities

Massive clusters of low-cost commodity computers, capable of virtualization to run multiple systems and make better use of CPU cycles and storage
- Compute cloud (e.g. Amazon Elastic Compute Cloud)
- Data cloud (e.g. Amazon S3)
- Public cloud
- Private cloud

Infrastructure projects
- Wits web presence
- eLearning
- Email

Private cloud infrastructure (simplified view)
- Wits portals
- Email
- eLearning
- Compute cloud
- Hierarchical storage: robotic tape library, spinning disks, flash memory
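Hierarchical storage keeps frequently used material on fast media and moves idle material down to cheaper media. The sketch below illustrates one possible placement policy, assuming tiers are chosen purely by time since last access; the tier names, the 7-day and 90-day thresholds, and the `StoredObject`, `target_tier` and `migrate` names are all hypothetical, not part of the Wits design.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class StoredObject:
    name: str
    last_access: datetime
    tier: str = "disk"  # hypothetical tiers: "flash", "disk", "tape"


def target_tier(obj: StoredObject, now: datetime,
                hot: timedelta = timedelta(days=7),
                warm: timedelta = timedelta(days=90)) -> str:
    """Choose a tier from idle time; the thresholds are illustrative only."""
    idle = now - obj.last_access
    if idle <= hot:
        return "flash"   # recently used: fastest, most expensive media
    if idle <= warm:
        return "disk"    # occasionally used: spinning disks
    return "tape"        # rarely used: robotic tape library


def migrate(objects: list[StoredObject], now: datetime) -> list[tuple[str, str, str]]:
    """Move each object to its target tier and report (name, old_tier, new_tier)."""
    moves = []
    for obj in objects:
        new_tier = target_tier(obj, now)
        if new_tier != obj.tier:
            moves.append((obj.name, obj.tier, new_tier))
            obj.tier = new_tier
    return moves


now = datetime(2009, 11, 26)
objects = [
    StoredObject("lecture.ogv", now - timedelta(days=2)),
    StoredObject("scan-1923.tif", now - timedelta(days=400)),
    StoredObject("thesis.pdf", now - timedelta(days=30)),
]
print(migrate(objects, now))
```

Running it prints `[('lecture.ogv', 'disk', 'flash'), ('scan-1923.tif', 'disk', 'tape')]`, while the thesis stays on disk. A real hierarchical storage manager would likely also weigh access counts, file size and preservation policy, not idle time alone.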

Private cloud infrastructure (simplified view)
- Hosted services: Wits portals, email, eLearning, digital archive
- Compute cloud
- Hierarchical storage: robotic tape library, spinning disks, flash memory

Private cloud infrastructure
- Hosted services: Wits portals, email, eLearning, digital archive
- Application
- Middleware: data grid, object store
- Operating system
- Virtualization
- Hardware layer: compute cloud; hierarchical storage (robotic tape library, spinning disks, flash memory)

Private cloud infrastructure
- Hosted services: Wits portals, email, eLearning, digital archive
- Application
- Middleware: data grid, object store
- Operating system
- Virtualization
- Hardware layer: compute cloud; hierarchical storage (robotic tape library, spinning disks, flash memory)
Free and Open Source Software across the stack

Private cloud infrastructure
- Hosted services: email (Zimbra), Wits portals, eLearning, digital archive
- Applications and repositories: Chisimba, DSpace, Fedora
- Middleware: iRODS, GlassFish
- OS: OpenSolaris
- Virtualization
- Hardware layer: compute cloud; hierarchical storage (robotic tape library, spinning disks, flash memory)

Built on a stack of FOSS applications and libraries using a suite of FOSS development and collaboration tools

Creating semantic and socially connected archives, repositories, museums and herbaria

Semantic and social archive
- University of the Western Cape: Fedora Commons, SWORD API, Chisimba
- University of the Witwatersrand: Chisimba 'Portals', eLearning, Chisimba API, SWORD API, Fedora Commons
- XMPP
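The SWORD API is a deposit protocol that lets one system push content into a repository such as Fedora Commons. As a hedged illustration, the function below builds, without sending, a deposit request using SWORD 2.0-style headers (`Content-Disposition`, `Content-MD5`, and `Packaging` with the SimpleZip package URI). The collection URI and the `sword_deposit_request` helper are hypothetical, and servers of 2009 spoke SWORD 1.x, which uses some different header names.

```python
import hashlib
import urllib.request


def sword_deposit_request(collection_uri: str, package: bytes,
                          filename: str) -> urllib.request.Request:
    """Build (but do not send) a SWORD 2.0-style binary deposit of a zip package."""
    headers = {
        "Content-Type": "application/zip",
        "Content-Disposition": f"attachment; filename={filename}",
        # MD5 digest of the body (hex-encoded here) so the server can verify it.
        "Content-MD5": hashlib.md5(package).hexdigest(),
        "Packaging": "http://purl.org/net/sword/package/SimpleZip",
    }
    return urllib.request.Request(collection_uri, data=package,
                                  headers=headers, method="POST")


# Hypothetical collection URI; sending would be urllib.request.urlopen(req).
req = sword_deposit_request("https://archive.example.org/sword/collection/papers",
                            b"PK\x03\x04", "papers.zip")
print(req.get_method(), req.full_url)
```

Note that `urllib.request.Request` stores header names via `str.capitalize()`, so the digest is read back with `req.get_header("Content-md5")`.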

Private cloud infrastructure
- Hosted services: email (Zimbra), Wits portals, eLearning, digital archive
- Applications and repositories: Chisimba, DSpace, Fedora
- Middleware: iRODS, GlassFish
- OS: OpenSolaris
- Virtualization
- Compute cloud; hierarchical storage (robotic tape library, spinning disks, flash memory)
- Remote sites (two), with iRODS

Private cloud infrastructure
- Hosted services: email (Zimbra), Wits portals, eLearning, digital archive
- Applications and repositories: Chisimba, DSpace, Fedora
- Middleware: iRODS, GlassFish
- OS: OpenSolaris
- Virtualization
- Compute cloud; hierarchical storage (robotic tape library, spinning disks, flash memory)
- Remote sites (two)

iRODS: integrated Rule-Oriented Data System
- A grid-level middleware for sharing data and metadata distributed across heterogeneous resources
- Adaptive middleware
- Micro-services
Key: rule oriented
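iRODS expresses data-management policy as rules that chain small micro-services together when an event occurs. The Python sketch below imitates that idea only; it is not the iRODS rule language or API. The `Rule` and `RuleEngine` classes, the `on_ingest` event name and the `checksum` and `replicate` micro-services are hypothetical, and `replicate` merely records a replica rather than copying data.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable

# A "micro-service" here is any function that takes and returns a context dict.
MicroService = Callable[[dict], dict]


@dataclass
class Rule:
    event: str                          # hypothetical event name, e.g. "on_ingest"
    condition: Callable[[dict], bool]   # decides whether the rule applies
    actions: list[MicroService] = field(default_factory=list)


class RuleEngine:
    def __init__(self) -> None:
        self.rules: list[Rule] = []

    def add(self, rule: Rule) -> None:
        self.rules.append(rule)

    def fire(self, event: str, ctx: dict) -> dict:
        """Run the micro-service chain of every rule matching the event and context."""
        for rule in self.rules:
            if rule.event == event and rule.condition(ctx):
                for action in rule.actions:
                    ctx = action(ctx)
        return ctx


def checksum(ctx: dict) -> dict:
    """Micro-service: record a SHA-256 fixity value for the object's bytes."""
    return {**ctx, "sha256": hashlib.sha256(ctx["data"]).hexdigest()}


def replicate(ctx: dict) -> dict:
    """Micro-service: note a replica at a remote site (no data is actually copied)."""
    return {**ctx, "replicas": ctx.get("replicas", []) + ["remote-site"]}


engine = RuleEngine()
engine.add(Rule("on_ingest", lambda c: c["collection"] == "archive",
                [checksum, replicate]))
result = engine.fire("on_ingest", {"collection": "archive", "data": b"scan"})
print(result["replicas"])
```

Because the policy lives in the rules rather than in the applications, a collection's treatment could change (for example, adding a second remote replica) without modifying the archive software, which is one reading of "adaptive middleware".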

Use in establishing the digital archive
- Source artifacts (video, audio, docs) pass through digital conversion before ingest; born-digital material is ingested directly
- Digital archive: Chisimba, DSpace, Fedora, governed by iRODS rules
- Private cloud infrastructure: compute cloud and storage cloud; OS: OpenSolaris; GlassFish etc.; iRODS; first-tier storage; robotic tape library
- Remote site: digital conversion of source artifacts; iRODS, governed by iRODS rules
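Writing each object to first-tier storage and to a remote site is only useful if the copies can be shown to be intact. A common way to do that is a fixity check: record a checksum at ingest and recompute it later. The sketch assumes plain directories stand in for the first-tier store and the remote site; `ingest`, `audit` and the file names are hypothetical, and in the architecture above this work would presumably be driven by iRODS rules rather than a standalone script.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 so large video files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ingest(source: Path, *stores: Path) -> str:
    """Copy source into every store (e.g. first tier, remote site) and verify each copy."""
    expected = sha256_of(source)
    for store in stores:
        store.mkdir(parents=True, exist_ok=True)
        copy = Path(shutil.copy2(source, store / source.name))
        if sha256_of(copy) != expected:
            raise OSError(f"fixity check failed for {copy}")
    return expected


def audit(name: str, expected: str, *stores: Path) -> list[Path]:
    """Return the stores whose copy is missing or no longer matches the recorded checksum."""
    return [s for s in stores
            if not (s / name).exists() or sha256_of(s / name) != expected]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    source = root / "scan-0001.tif"
    source.write_bytes(b"digitised page")
    first_tier, remote = root / "first-tier", root / "remote-site"
    checksum = ingest(source, first_tier, remote)
    (remote / source.name).write_bytes(b"bit rot")   # simulate damage at the remote site
    damaged = audit(source.name, checksum, first_tier, remote)
    print([p.name for p in damaged])
```

The audit reports only the damaged remote copy, so it could be restored from the intact first-tier copy.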

FOSS in the ingest (this slide is a work in progress)
Columns: source artifacts, type of equipment, capture software, digital conversion, rules
- Docs: specialized, proprietary or standard equipment; SANE, OCRopus, Tesseract, Python GUI
- Audio: standard equipment; Python GUI, GStreamer
- Video: standard equipment; Python GUI, GStreamer, Mencoder

Enterprise document management: an approach using the private cloud
Workflow managed by the iRODS layer
- Sites: born-digital documents, iFolder client, iRODS, connected over the network or the WWW
- Ingest: iFolder server, iRODS, Chisimba
- Private cloud infrastructure
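Born-digital documents move from iFolder clients at each site, through the iRODS layer, into Chisimba. One way to make such a managed workflow explicit is a small state machine that rejects out-of-order steps. The states below are a hypothetical reading of the diagram, and the `Document` class is illustrative only; it does not use the iFolder, iRODS or Chisimba APIs.

```python
from enum import Enum


class DocState(Enum):
    BORN_DIGITAL = "born digital"   # created on a user's machine at a site
    SYNCED = "synced"               # uploaded by the iFolder client to the server
    REGISTERED = "registered"       # catalogued by the iRODS layer
    PUBLISHED = "published"         # exposed through Chisimba


# Allowed transitions; anything else is rejected by the workflow.
TRANSITIONS = {
    DocState.BORN_DIGITAL: {DocState.SYNCED},
    DocState.SYNCED: {DocState.REGISTERED},
    DocState.REGISTERED: {DocState.PUBLISHED},
    DocState.PUBLISHED: set(),
}


class Document:
    def __init__(self, name: str) -> None:
        self.name = name
        self.state = DocState.BORN_DIGITAL
        self.history = [self.state]

    def advance(self, new_state: DocState) -> None:
        """Move to new_state, refusing steps the workflow does not allow."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"{self.name}: cannot go from "
                             f"{self.state.value} to {new_state.value}")
        self.state = new_state
        self.history.append(new_state)


doc = Document("council-minutes.odt")
doc.advance(DocState.SYNCED)
doc.advance(DocState.REGISTERED)
try:
    doc.advance(DocState.SYNCED)       # going backwards is not allowed
except ValueError as err:
    print(err)
doc.advance(DocState.PUBLISHED)
print([s.value for s in doc.history])
```

Keeping the transitions in a table rather than in scattered `if` statements means a new step (for example, a review stage before publication) could be added in one place.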

Endnotes
- A cloud with hierarchical storage can serve the needs of Wits for digital archiving and preservation
- It can also be used to build semantic and socially connected archives
- It can be used for other kinds of digital storage
- Opportunities nationally and in other institutions
- All software is Free Software (open source)
- Opportunities for innovation

No secret science
Attribution file: http://www.dkeats.com/usrfiles/users/1563080430/attribution/attrib.txt