Greenstone Digital Library Software GSDL Open Source Software
Greenstone Digital Library Software (GSDL) Open Source Software to Build Digital Libraries
What is open-source software? • “The basic idea behind open source is very simple: When programmers can read, redistribute, and modify the source code for a piece of software, the software evolves. People improve it, people adapt it, people fix bugs. And this can happen at a speed that, if one is used to the slow pace of conventional software development, seems astonishing.” (from www.opensource.org) • Anyone can redistribute the software • Source code must always be available
What is a Library? A trinity: books, users, staff
What is a Digital Library? • A digital library is an organized collection of information – A focused collection of digital objects – Methods for finding, access and retrieval – Methods for selection, organization, and maintenance of the collection – Methods for preservation
GSDL - Introduction Greenstone is a suite of software for building and distributing digital library collections. It provides a new way of organizing information and publishing it on the Internet or on CD-ROM. Greenstone is produced by the New Zealand Digital Library Project at the University of Waikato, and developed and distributed in cooperation with UNESCO and the Human Info NGO. It is open-source, multilingual software, issued under the terms of the GNU General Public License.
Features • Builds and distributes digital library collections • Full-text document search and display • Multi-platform support • Web-based user interface • Highly customizable • Document collections can be exported to CD-ROMs • Can be used for archiving
Features of Greenstone Software • Access through Web browser • Windows or Unix • Searching • Browsing • Easy to maintain • Various metadata • Plug-ins for new document types • Multiple languages • Text, pictures, audio, video • Open Source Software • Hierarchical phrase and key-phrase indexes • Multi-gigabyte • Compression • Password Protection • User logs • Administrative functions • Updates dynamically without bringing system down • Publish to CD-ROM • Uniform presentation across different computers
Overview of Greenstone • Collections: A typical digital library built with Greenstone will contain many collections, individually organized, though they bear a strong family resemblance. Collections are easily maintained and can be augmented and rebuilt automatically.
Overview of Greenstone • Document formats: Source documents come in a variety of formats and are converted into a standard XML form for indexing by “plugins.” Plugins distributed with Greenstone process plain text, HTML, Word and PDF documents, and Usenet and e-mail messages.
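The plugin idea on this slide can be illustrated with a minimal sketch. It is not Greenstone's actual plugin API or its internal XML format: the function names, the `PLUGINS` table, and the `Document`/`Content` element names are all assumptions made for illustration. The sketch only shows the general pattern of picking a converter by file type and emitting one common XML form.

```python
import re
import xml.etree.ElementTree as ET
from pathlib import Path


def text_plugin(raw):
    """Plain text needs no conversion."""
    return raw


def html_plugin(raw):
    """Crude tag stripping; a real converter would use an HTML parser."""
    return re.sub(r"<[^>]+>", "", raw)


# Hypothetical registry: file extension -> converter function.
PLUGINS = {".txt": text_plugin, ".html": html_plugin, ".htm": html_plugin}


def to_standard_xml(filename, raw):
    """Convert one source document to a common XML form (illustrative schema)."""
    suffix = Path(filename).suffix.lower()
    plugin = PLUGINS.get(suffix)
    if plugin is None:
        raise ValueError(f"no plugin for {suffix!r}")
    doc = ET.Element("Document", {"source": filename})
    ET.SubElement(doc, "Content").text = plugin(raw).strip()
    return ET.tostring(doc, encoding="unicode")


if __name__ == "__main__":
    print(to_standard_xml("intro.html", "<p>Hello, library</p>"))
```

Supporting a new document type in this sketch means adding one entry to the registry, which mirrors the "plug-ins for new document types" feature listed earlier.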
Overview of Greenstone • Multimedia documents: Collections can contain text, pictures, audio and video. Non-textual material is either linked into the textual documents or accompanied by textual descriptions (such as figure captions) to allow full-text searching and browsing.
Using Greenstone Collections The figure shows a screenshot of the “Demo” collection supplied with the Greenstone software. Almost all icons are clickable. Several icons appear at the top of almost every page.
What we wanted • “Collections” of digital material • Individualized, depending on metadata etc. • Up to several GB of text … • … + associated images, movies, whatever • Fully searchable • Served on WWW, or published on removable media • Run anywhere, on any computer • Fully internationalized • Non-exclusive: documents and metadata in any format • Non-prescriptive: standard and non-standard metadata
UNESCO: Distributing Greenstone DL software Sustainable development “Give a man a fish, feed him for a day. Teach a man to fish, feed him for life.” Greenstone software • GNU licensed • Fully documented … in English/French/Spanish/Russian • Language interfaces … Arabic Chinese Czech … Thai Turkish • Unix/Windows/Mac OS X • Trivial to install • GUI interface for gathering, enriching, building … • Serve collections on Web or write them to CD-ROM • Document formats: HTML, Word, PDF, PS, plain text, e-mail • Metadata formats: XML, DC, OAI, MARC, … • Download from http://greenstone.org
Distribution Greenstone facts • Open source: GNU GPL • Distributed via SourceForge since: Nov 2000 • Average downloads: 5000/month since then • Humanitarian CD-ROMs produced: 30-35 • Distribution for each one: 5000/year • Languages for interface: 38 • Languages for full software + manuals: 4 • Countries represented on email lists: 60 • UNESCO training courses in: Bangalore, Almaty, Dakar, Suva, … UN Agencies • UNESCO, Paris (“Information for All” programme) • FAO, Rome (Info Management Resource Kit) • UNU, Japan (CD-ROM collections of UNU material) International Technical centers • University of Waikato, New Zealand • Indian Institute of Sciences, Bangalore • University College, London • University of Cape Town, South Africa • University of Lethbridge, Canada
Sample collections at greenstone.org International • Argentina Human Rights Commission (Argentina) • Tasmania State Library (Australia) • Peking University Digital Library (China) • Gresham College, London (England) • University of Applied Sciences, Stuttgart (Germany) • Association of Indian Labour Historians; Indian Institute of Management, Kozhikode; Indian Institute of Science, Bangalore (India) • Vimercate Public Library, Milan (Italy) • Netherlands Institute for Scientific Information Services (Netherlands) • Philippine Government Information Network (Philippines) • Mari El Republic (Russia) • Slavonski Brod Public Library (Slovenia) • Vietnam National University (Vietnam) • Welsh Books Council (Wales)
Sample collections at greenstone.org U.S. • Auburn University, Alabama • Detroit Public Library • Hawaiian Electronic Library • ibiblio project, University of North Carolina • Illinois Wesleyan University • Lehigh University, Pennsylvania • New York Botanical Garden • University of California at Riverside • University of Chicago Library • University of Illinois • Texas A&M University • Washington Research Library Consortium
Standards Metadata • Can use any metadata set, Dublin Core supplied • Plugins for XML, MARC, CDS/ISIS, ProCite, BibTeX, Refer, OAI, METS (subset), DSpace • METS can be used as Greenstone’s internal representation Serving • Web • Can publish Greenstone collections on CD-ROM • Can publish Greenstone collections on OAI • Export collections to METS • Export collections to DSpace (ready for DSpace’s batch import program) Documents • Plugins for PDF, PostScript, Word, RTF, HTML, plain text, LaTeX, ZIP, Excel, PPT, email, source code, images (any format: GIF, JPEG, TIFF …), MP3, Ogg Vorbis, UnknownPlug (e.g. for audio, MPEG, MIDI)
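To make the Dublin Core bullet concrete, here is a minimal sketch of a Dublin Core description serialized as XML. The `dc` namespace URI is the standard Dublin Core element namespace; the `record` wrapper element, the `dublin_core_record` helper, and the sample values are assumptions for illustration and do not reflect Greenstone's own metadata files.

```python
import xml.etree.ElementTree as ET

# Standard namespace for the 15 Dublin Core elements.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Serialize a dict of Dublin Core element names to values as XML.

    The <record> wrapper is a hypothetical container, not a standard element.
    """
    record = ET.Element("record")
    for name, value in fields.items():
        ET.SubElement(record, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(record, encoding="unicode")


if __name__ == "__main__":
    print(dublin_core_record({
        "title": "Sample document",   # illustrative values only
        "creator": "A. Librarian",
        "language": "en",
    }))
```

Because the element names are passed in as data, the same helper would serve for non-standard metadata sets by swapping the namespace, which is in the spirit of the "any metadata set" bullet.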
The power of open source: Greenstone uses … • Ghostscript: interpreter for Adobe PostScript documents (PostScript plugin) • Kea: keyphrase extraction program (to generate metadata) • pdftohtml: converter for PDF documents (PDF plugin) • rtftohtml: converter for RTF documents (RTF plugin) • TextCat: detects languages and document encodings • wvWare: converter for Word documents (Word plugin) • xlhtml: converter for Excel/PowerPoint documents (plugins) • XML::Parser: parses XML documents, used to read and write Greenstone’s internal XML document format
and … • MG: creates compressed full-text indexes and performs searches • GDBM: database used for metadata etc. • wget: downloading pages from the Web when creating collections • YAZ: client and server implementation of Z39.50 • Stemmer: English language stemmer • GCC: C/C++ compiler • CVS: version control system • Perl: used for plugins etc. • Apache: Web server used by many Greenstone installations
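As a rough illustration of what a full-text index does, the sketch below builds a tiny inverted index and answers AND queries against it. It is not MG's implementation or file format: the function names are hypothetical, and storing document-id gaps instead of absolute ids is shown only as one common first step toward compressing postings lists.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split into alphanumeric terms (no stemming here)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to its sorted document ids, stored as gaps."""
    postings = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for term in set(tokenize(text)):
            postings[term].append(doc_id)
    # Gap encoding: keep the first id, then differences between neighbours.
    return {
        term: [ids[0]] + [b - a for a, b in zip(ids, ids[1:])]
        for term, ids in postings.items()
    }


def decode(gaps):
    """Turn a gap list back into absolute document ids."""
    ids, total = [], 0
    for gap in gaps:
        total += gap
        ids.append(total)
    return ids


def search(index, query):
    """Return ids of documents containing every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    result = None
    for term in terms:
        ids = set(decode(index.get(term, [])))
        result = ids if result is None else result & ids
    return sorted(result)


if __name__ == "__main__":
    docs = [
        "Greenstone builds digital libraries",
        "Digital library software",
        "Open source software",
    ]
    index = build_index(docs)
    print(search(index, "digital software"))
```

Small gaps can be written in fewer bits than full document ids, which is why gap lists are a natural starting point for index compression.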
Example Humanity Development Library for sustainable development and basic human needs • 160,000 pages • 30,000 images • 800 books • 430 magazines • 340 kg • US$20,000 • CD-ROM • US$1 • Win 3.1x upward • Stand-alone and intranet server • Web browser user interface Global Help Project, Antwerp (+ UN agencies)
Peking University Library Chinese documents (pictures of text) + Chinese interface
Chinese (Chinese & English interfaces) Classic Chinese literature
French UNESCO, Paris
Spanish PAHO, WHO
Russian Mari El Republic http://gov.mari.ru/gsdl
The Greenstone Librarian Interface (GLI) • Building collections • Interactive Java program • Runs on anything • Build a collection on the computer you are on • … plus new applet version • Includes metadata editor • Caveat: cannot deal with collections as huge as Greenstone itself can (particularly of metadata)
Create a new collection
Gather: Gather the files together
Enrich: Add the Metadata
Design: Add plugins and configure them
Design: Search Indexes, etc
Create: Building the collection
Preview: admire the result
Create: It’s built – preview it?
Format: For Features Display, etc.
Export the collection to CD-ROM?
Previewing the collection
Full-text search
Search Results
Full Text Display
Form-based search
Browsing titles
Browsing by Keywords