Web archiving at the NLA
‘Archiving the music web’
Music Council of Australia Annual Assembly, 28 September 2009
Paul Koerbin, Manager Digital Archiving, National Library of Australia
1. Background – the what, why and how
2. What makes a valuable resource for archiving?
3. What can you do to help?
What is web archiving about and why do it?
• Archiving = long-term preservation and access
• Building collections
• Building ‘documentary’ historical record
• Creating artefacts from the web experience
• Discovering what is produced online
• An act of consciousness
What’s involved in web archiving? At the NLA it’s:
• Identifying, selecting, scoping
• Seeking permission to collect and make accessible
• Creating and recording metadata – administrative, descriptive, preservation
• Crawling/harvesting (including scheduling)
• Processing for quality assurance (best effort)
• Storing and maintaining the data
• Planning and implementing preservation strategies
• Preparing and rendering for public display
• Providing access and discovery mechanisms
What is the NLA doing?
• PANDORA Archive 1996→
  – PANDORA participants
    • NLA, state libraries (not Tas), NFSA, AWM, AIATSIS (and soon the NGA)
  – Highly selective, small scale, ‘quality’ collection, open access
  – PANDAS workflow management system, 2001→
• Australian (.au) domain harvests
  – Annual since 2005
  – Internet Archive
  – No access (yet)
Comparative statistics of NLA web collections

           PANDORA (selective)   .au Domain Harvests
Files      73 million            2.3 billion
Size       3.26 TB               78.75 TB

Domain Harvest   2005          2006          2007          2008
Unique files     185 million   596 million   516 million   1 billion
Hosts crawled    811,523       1,046,038     1,247,614     3,038,658
Size             6.69 TB       19.04 TB      18.47 TB      34.55 TB
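As a quick cross-check, the headline totals for the .au domain harvests appear to be the sums of the four annual figures. The short Python sketch below adds them up; the numbers are copied from the slide, while the dictionary layout and variable names are illustrative only. The 2008 file count is given on the slide as a rounded "1 billion", so the file total is approximate.

```python
# Annual .au domain harvest figures as shown on the slide.
# The data structure and names here are illustrative, not an NLA format.
harvests = {
    2005: {"unique_files_millions": 185, "size_tb": 6.69},
    2006: {"unique_files_millions": 596, "size_tb": 19.04},
    2007: {"unique_files_millions": 516, "size_tb": 18.47},
    2008: {"unique_files_millions": 1000, "size_tb": 34.55},  # "1 billion", rounded
}

# Sum the per-year values to compare against the headline totals.
total_files_millions = sum(h["unique_files_millions"] for h in harvests.values())
total_size_tb = round(sum(h["size_tb"] for h in harvests.values()), 2)

print(f"Total unique files: about {total_files_millions / 1000:.1f} billion")
print(f"Total size: {total_size_tb} TB")
```

The sums agree with the 2.3 billion files and 78.75 TB quoted for the domain harvests as a whole.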
Music in the PANDORA Archive
• 500+ titles available from the PANDORA public listing of music
  – NFSA 33%
  – NLA 30%
  – Others 37%
• Musicians, bands, orchestras, composers, organisations, festivals, blogs, instrument makers, magazines …
• Plus 280 considered but not available – 35% (no permission, rejected, yet to be selected)
What makes a valuable resource for archiving?
• Content – substantial, original
• Provenance
• ‘Long-term research value’
• Cultural or social significance and interest – including events
• Curatorial/expert suggestion (e.g. Music Australia)
• Different collecting approaches based on ‘value’
• Priorities, but never say never
How can you help? 10 tips:
1. Think about the issue of long-term access
   – what is your intention?
2. Communicate interest and intentions
   – with collecting institutions; let us know about your site
   – respond to requests for permission
3. Organise and structure sites simply
   – it’s all about links
4. Comply with standards
   – limit use of proprietary technology if possible
5. Make it robot friendly
   – indexing, discovery, capture
How can you help? 10 tips:
6. Keep contributors informed and involved
   – make sure contributors understand and agree to long-term preservation and access from the beginning
7. Clear copyright, rights and contact information
   – it helps to know what and who (oh, and trust us too)
8. Maintain content online as much as possible
   – increases chance of it being collected
9. Learn to love and live with your past
   – archives are not the same as the ‘live’ web
   – archived versions cannot be altered
10. Do your own backup, of course
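One practical way to act on tip 5 ("Make it robot friendly") is to check that a site's robots.txt does not shut out the pages you would like collected. The sketch below uses Python's standard `urllib.robotparser` module. The robots.txt content, the `example.org` URLs and the crawler name `example-archive-bot` are all made up for illustration; they are not the NLA's actual harvester settings.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt: everything is open to crawlers except /admin/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    agent = "example-archive-bot"  # hypothetical crawler name
    for url in (
        "https://example.org/music/album.html",
        "https://example.org/admin/login",
    ):
        status = "allowed" if can_crawl(ROBOTS_TXT, agent, url) else "blocked"
        print(f"{url}: {status}")
```

A blanket `Disallow: /` rule would block every page in this check, which is the kind of setting that can keep a site out of a harvest without the owner intending it.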
PANDORA: Australia’s Web Archive
http://pandora.nla.gov.au/