Greenstone in Building Digital Library Collections
[Figure: multimedia library information system with data capture and a gateway out to the Internet / Intranet, giving the user access to information from anywhere]
Organizational Transformation in Libraries • Traditional / Automated – Organization is physical – Shelving of documents based on subject classification – Key: indexes / catalogues / cards / digital catalogues – Cards (real or virtual): author, title, descriptions • Digital – Organization in terms of digital files / objects – Contains material in digitized form – Contains born-digital material – Architecture – Key: metadata
Shift in Technologies / Approaches • Traditional (limited / rigid): AACR 2, CCC, CC / LCCS, DDC / UDC, Thesauri / LCSH • Automated (improved): AACR 2, ISO 2709, CCF, MARC, Thesauri • Digital library (efficient / flexible): Metadata, DCMI, W3C, EAD, TEI, DTD, METS, MODS, Z39.50, MARC 21
Features of Digital Libraries • Dynamic electronic information systems • Seamless aggregation and integration of scholarly content • Creation and maintenance of local content • Strengthened mechanisms and capacity for information systems / services • Increased portability • Efficiency of access • Flexibility • Availability • Long-term preservation
DL Software: Alternatives • Develop a local web-based application? • Use a commercial DL solution? • Adopt open-source software? – Greenstone – EPrints – DSpace – (CDS/ISIS, Koha)
Principles for Building DLs • Expect change • Know your content • Involve the right people • Design usable systems • Ensure open access • Be (a)ware of data rights • Automate whenever possible • Adopt and adhere to standards • Ensure quality as well as reliability • Be concerned about persistence
Greenstone DL Software Overview of Features, Capabilities & Applications
What is the Greenstone software? • A software suite for building, maintaining, and distributing digital library collections • Comprehensive and open source • Developed by the New Zealand Digital Library Project at the University of Waikato • Distribution and promotion partners: – UNESCO
Features of Greenstone • Open-source philosophy • Interfacing and content delivery via the Web • Multi-platform software • Multilingual support • Multiple formats • Structured metadata in XML using Dublin Core (DC) • Metadata extraction • Searching and browsing • Plugins for documents • Full-text mirroring • Text-level penetration • Data compression • Password protection • Administrative functions • Concurrent and dynamic content development • Uniform presentation • Publishing on CD-ROMs • International presence
Greenstone Features (contd.) • Easy installation • Easy maintenance • Content development (three alternative ways) • Interface customization – front-page design, header for the digital library, collection icon, cover images • Collection configuration file (collect.cfg) • Scalability, flexibility • Interoperability, OAI compliance • Lifeline: listserv / e-group / archives
Greenstone DL Software • Access – accessible via any Web browser – server runs on Windows and Unix – collections can be published on CD-ROM • Searching / browsing – full-text and fielded search – flexible browsing facilities – metadata-based (Dublin Core) – collection-specific – hierarchical phrase browsing supported – creates all access structures automatically • Extensible – plugins: new document and metadata formats – classifiers: new metadata browsers • Multilingual – documents and interfaces – Chinese, Arabic, Maori, Russian, etc. (plus European languages) • Multimedia – video and audio collections exist
The power of open source: Greenstone uses … • Ghostscript: interpreter for Adobe PostScript documents (PostScript plugin) • Kea: keyphrase extraction program (to generate metadata) • pdftohtml: converter for PDF documents (PDF plugin) • rtftohtml: converter for RTF documents (RTF plugin) • TextCat: detects languages and document encodings • wvWare: converter for Word documents (Word plugin) • xlHtml: converter for Excel / PowerPoint documents (plugins) • XML::Parser: parses XML documents; used to read and write Greenstone’s internal XML document format
and … • MG: creates compressed full-text indexes and performs searches • GDBM: database used for metadata, etc. • wget: downloads pages from the Web when creating collections • YAZ: client and server implementation of Z39.50 • Stemmer: English-language stemmer • GCC: C/C++ compiler • CVS: version control system • Perl: used for plugins, etc. • Apache: web server used by many Greenstone installations • OAI-PMH: Open Archives Initiative Protocol for Metadata Harvesting
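Many of the converters above are reached through document plugins, and, as we understand it, the plugin lines of a collection are offered each file in order until one accepts it. The Python sketch below imitates only that dispatch step. The plugin names appear in these slides or in Greenstone 2, but the file-name patterns and the PIPELINE / choose_plugin names are illustrative assumptions, not the plugins' real default expressions.

```python
from __future__ import annotations

import re

# Hypothetical plugin pipeline, in the order plugin lines might appear in
# collect.cfg. The patterns are illustrative assumptions, not real defaults.
PIPELINE = [
    ("WordPlug", re.compile(r"\.doc$", re.IGNORECASE)),
    ("PDFPlug", re.compile(r"\.pdf$", re.IGNORECASE)),
    ("RTFPlug", re.compile(r"\.rtf$", re.IGNORECASE)),
    ("PSPlug", re.compile(r"\.ps$", re.IGNORECASE)),
    ("HTMLPlug", re.compile(r"\.html?$", re.IGNORECASE)),
    ("TEXTPlug", re.compile(r"\.txt$", re.IGNORECASE)),
]


def choose_plugin(filename: str, pipeline=PIPELINE) -> str | None:
    """Offer the file to each plugin in turn; the first whose pattern matches
    claims it. Returns None when no plugin accepts the file."""
    for name, pattern in pipeline:
        if pattern.search(filename):
            return name
    return None


if __name__ == "__main__":
    for f in ["report.PDF", "notes.doc", "index.htm", "song.mid"]:
        print(f, "->", choose_plugin(f))
```

In this sketch song.mid is accepted by nothing, which is the situation the multimedia example in these slides addresses by configuring UnknownPlug to accept MIDI files.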
Collection Building • Input: a set of source documents, possibly in many different formats • Greenstone “imports” these documents and converts them to its own internal Greenstone Archive (GA) format – Extracts as much metadata as possible • Greenstone “builds” indexes and browsing structures from the GA files • Start with a few documents, get the design right, then add the bulk of the documents
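The import/build split can be illustrated with a toy pipeline. This is a Python sketch of the idea only, not Greenstone's implementation (which uses Perl plugins and MG indexes): import_document turns each source into a uniform record with some extracted metadata, and build derives an inverted full-text index and a title browse list purely from those records. The function names and the "first non-empty line is the title" rule are assumptions made for illustration.

```python
from __future__ import annotations

import re
from collections import defaultdict


def import_document(name: str, text: str) -> dict:
    """'Import' phase: convert a source into a uniform internal record,
    extracting cheap metadata (assumed: first non-empty line is the title)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    title = lines[0] if lines else name
    return {"id": name, "Title": title, "WordCount": len(text.split()), "text": text}


def build(records: list[dict]) -> tuple[dict, list[str]]:
    """'Build' phase: derive access structures from the records only:
    an inverted full-text index and an alphabetical title list."""
    index: dict[str, set[str]] = defaultdict(set)
    for rec in records:
        for word in re.findall(r"[a-z0-9]+", rec["text"].lower()):
            index[word].add(rec["id"])
    titles = sorted(rec["Title"] for rec in records)
    return dict(index), titles


def search(index: dict, query: str) -> set[str]:
    """AND search: IDs of documents containing every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    sources = {
        "doc1": "Water supply\nClean water for rural villages.",
        "doc2": "Soil care\nCompost improves soil and saves water.",
    }
    records = [import_document(n, t) for n, t in sources.items()]
    index, titles = build(records)
    print(titles)                                # ['Soil care', 'Water supply']
    print(sorted(search(index, "water")))        # ['doc1', 'doc2']
    print(sorted(search(index, "soil water")))   # ['doc2']
```

Because build reads only the imported records in this sketch, changing the index or browse design means re-running build, not re-converting the sources, which is what makes "start with a few documents" cheap to iterate on.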
Collection Building (contd.) • Greenstone used to offer three modes of collection building: the command line, the web interface, and the GLI (Greenstone Librarian Interface) • From version 2.4x onward, the GLI was strengthened and became the most popular mode; the web-interface mode has been withdrawn for the time being • Building a collection with the GLI is simple: collection developers launch the GLI and use its ‘Gather’, ‘Enrich’, ‘Design’, and ‘Create’ panels to make the collection
GLI Functions • Establish a new collection (or work on an existing one) • Select files to include in the collection (Gather) • Enrich files with metadata (Enrich) • Select plugins, indexes, classifiers (Design) • Build the collection (Create) • Customize appearance • Preview the collection
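Metadata assigned in the Enrich step ends up in metadata.xml files alongside the source documents. The Python sketch below writes such a file with the standard library, assuming the Greenstone 2 layout as we understand it (DirectoryMetadata containing FileSet elements, each with a FileName and a Description of Metadata name="..." entries); check it against the DTD shipped with your Greenstone version. The function name and the example file names and values are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def make_metadata_xml(assignments: dict[str, dict[str, str]]) -> str:
    """Build a metadata.xml-style document (assumed Greenstone 2 layout).

    assignments maps a file name (or file-name pattern) to a dict of
    {metadata element: value}, e.g. {"dc.Title": "..."}.
    """
    root = ET.Element("DirectoryMetadata")
    for filename, fields in assignments.items():
        fileset = ET.SubElement(root, "FileSet")
        ET.SubElement(fileset, "FileName").text = filename
        description = ET.SubElement(fileset, "Description")
        for element, value in fields.items():
            # Each value becomes <Metadata name="element">value</Metadata>.
            ET.SubElement(description, "Metadata", name=element).text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(make_metadata_xml({
        r"cover\.jpg": {"dc.Title": "Album cover"},
        r"track01\.mid": {"dc.Title": "Opening theme", "dc.Subject": "Music"},
    }))
```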
The Greenstone Librarian Interface (GLI) • Building collections – interactive Java program – runs on anything: build a collection on the computer you are on … plus a new applet version – includes a metadata editor • Caveat: cannot handle collections as huge as Greenstone itself can (particularly in the amount of metadata) • Invoke the GLI and build a small collection of HTML files: – Gather – Create – Look at the extracted metadata – Set up a shortcut in the Librarian Interface
Create a new collection
Gather: Gather the files together
Create: Build the collection
Preview: admire the result
A (slightly) enhanced collection: Multimedia • Add plugin – UnknownPlug, set to accept MIDI files • Add metadata – for the “browse” button (8 items) – for image titles (14 titles) – to correct a misspelling (“mistery”) (1 item) • Add / modify classifiers – modify to display dc.Title or ex.Title – add one for the “browse” button – remove the one for filename – add one for a phrase index – add regular expressions to clean up titles • Modify format statements – show title only for cover images – suppress the text document icon for MP3/MIDI items – make bookshelves show how many documents they contain • General – assign collection icons – assign icons for non-standard media types: lyrics, discography, etc.
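Title clean-up expressions can be prototyped outside Greenstone before they are written into a classifier. A small Python sketch follows; the rules (strip a media extension, then a leading track number, then turn underscores into spaces) and the TITLE_RULES / clean_title names are hypothetical examples, not the expressions used in this collection.

```python
import re

# Hypothetical clean-up rules, applied in order. The extension is removed
# first so that a title like "1999.mid" is not mistaken for a track number.
TITLE_RULES = [
    (re.compile(r"\.(mp3|mid|midi)$", re.IGNORECASE), ""),  # media extension
    (re.compile(r"^\s*\d+\s*[-_.]\s*"), ""),                 # "07 - " track prefix
    (re.compile(r"_+"), " "),                                # underscores -> spaces
    (re.compile(r"\s{2,}"), " "),                            # collapse whitespace
]


def clean_title(raw: str) -> str:
    """Apply each (pattern, replacement) rule in turn and trim the result."""
    title = raw
    for pattern, replacement in TITLE_RULES:
        title = pattern.sub(replacement, title)
    return title.strip()


if __name__ == "__main__":
    for raw in ["07 - Blue_Moon.mp3", "Night_Music.MID", "1999.mid"]:
        print(repr(raw), "->", repr(clean_title(raw)))
```

Rule order matters in this sketch: running the track-number rule before the extension rule would turn "1999.mid" into "mid".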
Under the hood: Collection configuration file • name, icon, etc. • description • email of creator • search indexes • plugins • classifiers • how to format: – documents – query results – classifiers
creator sjboddie@cs.waikato.ac.nz
maintainer sjboddie@cs.waikato.ac.nz
public true
beta true
indexes section:text section:Title document:text
defaultindex section:text
plugin GAPlug
plugin ArcPlug
plugin RecPlug
classify Hierarchy -hfile sub.txt -metadata Subject -sort Title
classify HDLList -metadata Title
classify Hierarchy -hfile org.txt -metadata Organization -sort Title
classify List -metadata Howto
format SearchVList "<td valign=top>[link][icon][/link]</td> <td>{If}{[parent(All': '):Title],[parent(All': '):Title]:}[link][Title][/link]</td>"
format CL4VList "[link][Howto][/link]"
format DocumentImages true
format DocumentText "<h3>[Title]</h3>\n\n<p>[Text]"
collectionmeta collectionname "greenstone demo"
collectionmeta collectionextra "This is a demonstration collection for the Greenstone digital library software. It contains a small subset (11 books) of the Humanity Development Library"
collectionmeta iconcollectionsmall "/gsdl/collect/demo/images/demosm.gif"
collectionmeta iconcollection "/gsdl/collect/demo/images/demo.gif"
collectionmeta .section:Title "section titles"
collectionmeta .document:text "entire books"
collectionmeta .section:text "chapters"
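Each line of collect.cfg starts with a directive word (creator, indexes, plugin, classify, format, collectionmeta, …) followed by arguments, with double quotes protecting strings that contain spaces. A minimal Python reader for that line structure is sketched below; it assumes one directive per line (real collect.cfg files may continue a quoted string over several lines, which this sketch does not handle), and the parse_collect_cfg name is hypothetical.

```python
from __future__ import annotations

import shlex
from collections import defaultdict


def parse_collect_cfg(text: str) -> dict[str, list[list[str]]]:
    """Group collect.cfg-style lines by their first word (the directive).

    Lines are split shell-style, so double-quoted arguments such as format
    strings stay in one piece. Blank lines and '#' comment lines are skipped.
    """
    directives: dict[str, list[list[str]]] = defaultdict(list)
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        tokens = shlex.split(stripped, posix=True)
        directives[tokens[0]].append(tokens[1:])
    return dict(directives)


if __name__ == "__main__":
    demo = '''indexes section:text section:Title document:text
plugin GAPlug
plugin ArcPlug
classify List -metadata Howto
collectionmeta collectionname "greenstone demo"
'''
    cfg = parse_collect_cfg(demo)
    print([args[0] for args in cfg["plugin"]])
    print(cfg["collectionmeta"])
```

shlex in POSIX mode keeps a backslash inside double quotes unless it precedes a quote or another backslash, so a format string containing \n is returned unchanged.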
Alter configuration • Add a full-text index of titles: indexes document:Title (add to the indexes line) • … or of authors: indexes document:Creator (needs author metadata) • Add an alphabetic author browser: classify AZList -metadata Creator (add a classifier line) • Include Word documents: plugin WordPlug (add a plugin line) • Include PDF documents: plugin PDFPlug • Separate index for each language: languages en fr es (add a languages line) • Extract acronyms and add a list: plugin PDFPlug -extract_acronyms (add an option to the plugin line) • Import OAI metadata: plugin OAIPlug (add a plugin line) • Extract a phrase hierarchy and add a browser: classify Phind (add a classifier line) • Alter the format of any of the above: format … • Restrict the collection’s interface languages: format PreferenceLangs en|fr|es (add a format string) • Change the default interface language: cgiarg shortname=l argdefault=fr (edit the site configuration file)
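The alterations above are all either an extra value on the indexes line or a whole new directive line, so they can also be scripted. A Python sketch under stated assumptions: add_index and add_line are hypothetical helper names, and they work on collect.cfg as plain text rather than through any Greenstone tool. After editing collect.cfg the collection still has to be rebuilt for the change to take effect.

```python
def add_index(cfg_text: str, index: str) -> str:
    """Append index (e.g. 'document:Title') to the existing 'indexes' line,
    or add a new 'indexes' line if there is none. No-op if already present."""
    lines = cfg_text.splitlines()
    for i, line in enumerate(lines):
        parts = line.split()
        if parts and parts[0] == "indexes":
            if index not in parts[1:]:
                lines[i] = line.rstrip() + " " + index
            break
    else:  # loop finished without finding an 'indexes' line
        lines.append(f"indexes {index}")
    return "\n".join(lines) + "\n"


def add_line(cfg_text: str, directive_line: str) -> str:
    """Append a whole directive line (plugin, classify, format, ...) unless an
    identical line is already present."""
    lines = cfg_text.splitlines()
    wanted = directive_line.strip()
    if wanted not in (ln.strip() for ln in lines):
        lines.append(wanted)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    cfg = "indexes section:text document:text\nplugin GAPlug\n"
    cfg = add_index(cfg, "document:Title")
    cfg = add_line(cfg, "plugin WordPlug")
    cfg = add_line(cfg, "classify AZList -metadata Creator")
    print(cfg)
```

Making both helpers idempotent means the script can be re-run safely while the collection design is still being adjusted.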
Customization • Greenstone is specifically designed to be highly extensible and customizable. • New document and metadata formats are accommodated by writing “plugins” (in Perl). • Analogously, new metadata browsing structures can be implemented by writing “classifiers”. • The look and feel of the user interface can be altered using “macros” written in a simple macro language. • A CORBA protocol allows agents (e.g., in Java) to use all the facilities associated with document collections. • Finally, the source code, in C++ and Perl, is available for modification.
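A classifier turns one metadata element into a browsing structure. Greenstone's real classifiers are Perl modules (AZList, used in the configuration examples, is one); the Python sketch below only illustrates the idea of an A–Z list by grouping documents into lettered bookshelves. The az_classify name and the "#" shelf for values that do not start with a letter are assumptions.

```python
from __future__ import annotations

from collections import defaultdict


def az_classify(docs: dict[str, str]) -> dict[str, list[str]]:
    """Group document IDs into A-Z 'bookshelves' by the first letter of a
    metadata value (e.g. Title); other values go on a '#' shelf.

    docs maps document ID -> metadata value. Shelves are returned in label
    order, and each shelf is sorted case-insensitively by the value.
    """
    shelves: dict[str, list[str]] = defaultdict(list)
    for doc_id, value in docs.items():
        first = value.strip()[:1].upper()
        shelves[first if first.isalpha() else "#"].append(doc_id)
    return {
        label: sorted(ids, key=lambda d: docs[d].lower())
        for label, ids in sorted(shelves.items())
    }


if __name__ == "__main__":
    titles = {"d1": "banana", "d2": "Apple", "d3": "avocado", "d4": "42 things"}
    for label, ids in az_classify(titles).items():
        print(label, ids)
```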
Customizing with macros • Macros – let you customize presentation – present pages in different languages – print variables into the page text (e.g., the number of search hits) • Macro files – stored in the gsdl/macros folder – each file defines one or more “packages” (a “package” is a group of macros) – loaded on startup (note the difference between the Local Library and the Web Library) – listed in etc/main.cfg • Collection-specific macros – stored in gsdl/collect/mycol/macros/extra.dm – or include the argument [c=collectionname] for each macro
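The core of the macro idea is named text fragments, written _name_, that are expanded recursively into page text. The Python sketch below shows only that substitution step, under simplifying assumptions: one-line definitions of the form _name_ {body}, no packages, and no parameters such as [l=en] or [c=collectionname]. It is not Greenstone's macro engine, and load_macros / expand are hypothetical names.

```python
from __future__ import annotations

import re

# Simplified syntax (assumption): one definition per line, "_name_ {body}".
DEF_RE = re.compile(r"^\s*_([A-Za-z0-9]+)_\s*\{(.*)\}\s*$")
REF_RE = re.compile(r"_([A-Za-z0-9]+)_")


def load_macros(text: str) -> dict[str, str]:
    """Read one-line macro definitions; later definitions override earlier ones."""
    macros: dict[str, str] = {}
    for line in text.splitlines():
        m = DEF_RE.match(line)
        if m:
            macros[m.group(1)] = m.group(2)
    return macros


def expand(template: str, macros: dict[str, str], depth: int = 0) -> str:
    """Replace _name_ references recursively; unknown names are left as-is."""
    if depth > 20:
        raise RecursionError("macro expansion too deep (cyclic definition?)")

    def substitute(m: re.Match) -> str:
        name = m.group(1)
        if name not in macros:
            return m.group(0)
        return expand(macros[name], macros, depth + 1)

    return REF_RE.sub(substitute, template)


if __name__ == "__main__":
    defs = "_sitename_ {My Library}\n_header_ {<h1>_sitename_</h1>}\n"
    print(expand("_header_", load_macros(defs)))
```

Leaving unknown names untouched, rather than raising an error, keeps the sketch forgiving while macro files are being edited; the depth limit guards against cyclic definitions.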
Personalizing your home page • In C:\Program Files\gsdl\etc\main.cfg, change home.dm to yourhome.dm
Hierarchy Structure
Collection configuration • Collection configuration file determines content conversion, extraction and building of indexes and browsing structures – indexes, classifiers, plugins • Presentation of search/browse results and collection interface is determined by “format” strings and “macros”
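Format strings such as "[link][Title][/link]" (seen in the configuration file earlier) are templates filled in for each document. The Python sketch below handles only the simplest cases: [link] / [/link] become an HTML anchor and any other [Name] is replaced by that document's metadata value. Conditionals like {If}{…} and references like [parent(All': '):Title] are deliberately not handled, and the render name and URL are hypothetical.

```python
from __future__ import annotations

import html
import re

PLACEHOLDER = re.compile(r"\[([A-Za-z.]+)\]")


def render(fmt: str, doc: dict[str, str], url: str) -> str:
    """Fill a format string for one document (simplified).

    [link] and [/link] become an HTML anchor to url; any other [Name] is
    replaced by the HTML-escaped metadata value, or '' if the document lacks it.
    """
    out = fmt.replace("[link]", f'<a href="{html.escape(url)}">').replace("[/link]", "</a>")
    return PLACEHOLDER.sub(lambda m: html.escape(doc.get(m.group(1), "")), out)


if __name__ == "__main__":
    doc = {"Title": "Soil care", "Creator": "Anon"}
    print(render("[link][Title][/link] by [Creator]", doc, "/library?d=D1"))
```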
Documentation and help • Available at www.greenstone.org – Software – Demo collections – FAQ – Tutorial materials • Documentation – Installer’s Guide, User’s Guide, Developer’s Guide, From Paper to Collection • Mailing lists – Greenstone Users List – Greenstone Developers List
Documentation and help • Manuals on the CD-ROM (docs) – Installer’s Guide (install.pdf, 36 pp): versions of Greenstone, installation procedure, Greenstone collections, setting up the web server, configuring your site, personalizing your installation – User’s Guide (user.pdf, 90 pp): overview of Greenstone, using Greenstone collections, the Collector, administration, software features, glossary of terms – Developer’s Guide (develop.pdf, 113 pp): understanding the collection building process, getting the most out of your collections, the Greenstone runtime system, configuring your Greenstone site – From Paper to Collection (paper.pdf, 30 pp): scanners and scanning, OCR, three examples (from 1,000 to 100,000 pages), creating an electronic collection
Documentation and help • greenstone.org – Download: software and tutorials – Example collections – Documentation – FAQ: general info section – Support (and joining the mailing list) – Configuration files for nzdl.org collections • nzdl.org – Documentation collections – Documented example collections
Documentation and help • Mailing lists – Greenstone Users List: for people installing and using standard Greenstone. Join at: https://list.scms.waikato.ac.nz/mailman/listinfo/greenstoneusers Mail to: greenstone-users@list.scms.waikato.ac.nz – Greenstone Developers List: for people customizing their version of Greenstone. Join at: https://list.scms.waikato.ac.nz/mailman/listinfo/greenstonedevel Mail to: greenstone-devel@list.scms.waikato.ac.nz • Mailing list archives – a Greenstone collection of mail from both mailing lists: http://www.nzdl.org/gsarchives
DL Collection / E-Books
DIGITAL LIBRARY ARCHITECTURE [Figure: components shown: network, OS, Z39.50 / OAI-PMH, DL software, METS/MODS, EAD, DCMI, TEI, data / objects]
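At the protocol layer of this architecture, OAI-PMH is plain HTTP: a harvester sends a verb (Identify, ListRecords, …) plus arguments such as metadataPrefix=oai_dc, and receives XML in which each record's Dublin Core fields use the http://purl.org/dc/elements/1.1/ namespace. The Python sketch below builds such a request and pulls dc:title values out of a response. The base URL is a placeholder assumption (substitute your own repository's OAI endpoint), the function names are hypothetical, and error handling and resumption tokens are omitted.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from urllib.parse import urlencode

DC_NS = "http://purl.org/dc/elements/1.1/"


def oai_request_url(base_url: str, verb: str, **params: str) -> str:
    """Build an OAI-PMH request URL, e.g. ?verb=ListRecords&metadataPrefix=oai_dc."""
    return f"{base_url}?{urlencode({'verb': verb, **params})}"


def dc_titles(response_xml: str) -> list[str]:
    """Collect the text of every dc:title element in an OAI-PMH response."""
    root = ET.fromstring(response_xml)
    return [el.text or "" for el in root.iter(f"{{{DC_NS}}}title")]


if __name__ == "__main__":
    # Placeholder endpoint (assumption); fetching it is left to the caller.
    print(oai_request_url("http://example.org/oai", "ListRecords", metadataPrefix="oai_dc"))
```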