BASE: Institutional Repositories
Bielefeld Academic Search Engine (BASE): an End-user Oriented Institutional Repository Search Service
Dirk Pieper / Friedrich Summann, Bielefeld UL
Overview
Part 1:
1. Institutional Repository Servers
2. BASE: concept and content
3. Creating a special view on institutional repository server collections
4. Demo: BASE user-interface and further visions
Part 2:
5. OAI dataflow, BASE dataflow
6. Repository information in registries
7. OAI harvesting problems
8. Further developments of BASE
Institutional Repository Servers
1. Definition: “A digital collection capturing and preserving the intellectual output of a single or multi-university community.” (Raym Crow, http://www.arl.org/sparc/IR/ir.html)
2. IR servers also exist outside the university community
3. IR servers appear as simple web sites, database systems with an OAI interface, …
BASE: concept and content
1. BASE uses Fast Data Search
2. BASE contains intellectually selected resources, with a focus on OAI servers but also web-crawled content
3. BASE displays result lists as bibliographic data and full-text hits
4. The BASE frontend is written in PHP using the search API from Fast Data Search
5. BASE offers sorting, search refinement and search history
BASE: concept and content
[Architecture diagram. Labels: file traverser; web crawler; connectors; document processing (pipeline); index files; search; query & result processing (pipeline); filter; search API; tuning, administration and debugging]
BASE: concept and content
At present: 2.7 million documents in 189 collections, 15 of which contain web-crawled data
BASE: concept and content
[Diagram of collections. Labels: Projekt Gutenberg-DE; Internet Library of Early Journals Oxford; Various Institutional Repositories; Springer Link Metadata; Cornell Hist. Math Fulltext Crawl; University Michigan Historical Math; CiteSeer; Zentralblatt Mathematik; arXiv; OPAC UL Bielefeld Univ: Math. Preprints; Ifo Institute Munich; Zeitschriften der Aufklärung (Bielefeld UL)]
Special view on IR server collections
Collections are listed in a configuration file:
[ftubirmingham]
url = "http://eprints.bham.ac.uk/"
desc_de = "The Univ. of Birmingham: Eprints Archive"
desc_en = "The Univ. of Birmingham: Eprints Archive"
descdd_de = "Birmingham Univ."
descdd_en = "Birmingham Univ."
- Collections can be clustered for the user interface, e.g. “Institutional Repositories Europe” consists of [ftubarcelona], [ftubath], [ftubristol], [ftuhelsinki], …
- Parametric search is possible
- The frontend is ready for multiple views (independent views with their own configuration and layouts on the same backend)
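The collection entry on this slide looks like INI syntax, so the clustering idea can be sketched in a few lines. This is an illustration only, not the BASE frontend (which the slides describe as PHP): the `CLUSTERS` table, the function names, and the placement of `ftubirmingham` in the European cluster are assumptions made for the example.

```python
import configparser

# Collection entry copied from the slide, quoting style kept as shown.
CONFIG_TEXT = """
[ftubirmingham]
url = "http://eprints.bham.ac.uk/"
desc_de = "The Univ. of Birmingham: Eprints Archive"
desc_en = "The Univ. of Birmingham: Eprints Archive"
descdd_de = "Birmingham Univ."
descdd_en = "Birmingham Univ."
"""

# Hypothetical cluster table. The cluster name and the first two members come
# from the slide; adding ftubirmingham here is an assumption for the example.
CLUSTERS = {
    "Institutional Repositories Europe": ["ftubarcelona", "ftubath", "ftubirmingham"],
}


def load_collections(text):
    """Parse INI-style collection sections into plain dictionaries."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return {
        name: {key: value.strip('"') for key, value in parser[name].items()}
        for name in parser.sections()
    }


def cluster_labels(cluster, collections, lang="en"):
    """Return the short labels (descdd_*) of the cluster members that are configured."""
    key = f"descdd_{lang}"
    return [collections[n][key] for n in CLUSTERS.get(cluster, []) if n in collections]


if __name__ == "__main__":
    collections = load_collections(CONFIG_TEXT)
    print(cluster_labels("Institutional Repositories Europe", collections))
```

Cluster members without a configuration section are skipped silently here; a real frontend might prefer to report them.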
Vision: search in Google Scholar
Try your search on Google Scholar...
Vision: check citations in Google Scholar
Check citations (citing articles) in Google Scholar...
OAI dataflow at Bielefeld UL
[Diagram labels: OAI data harvesting; dissertations, monographs (fulltext); OPAC; articles (fulltext); PubMed, Euclid, arXiv, CiteSeer, Citebase, DOAJ articles; article database; all resources (texts, images, video, references, ...); BASE; internal index (FAST)]
BASE dataflow
[Diagram labels: OAI-Database Records; Web Pages; Harvesting; Pre-Processing; Internal Index (FAST); User interface (PHP)]
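The harvesting step in this dataflow relies on OAI-PMH, where records are requested with the `ListRecords` verb. A minimal sketch of how such a request URL is formed follows; it is not the BASE harvester (the slides describe a Perl tool), and the base URL and function name are hypothetical.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None,
                     resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    In OAI-PMH a resumptionToken is an exclusive argument: when it is given,
    no other argument besides the verb may be sent.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date  # selective harvesting by datestamp
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical repository endpoint, used only for illustration.
    print(list_records_url("http://repository.example.org/oai"))
    print(list_records_url("http://repository.example.org/oai", from_date="2005-01-01"))
```

A harvester would fetch the first URL, then keep requesting with the returned resumption token until the repository sends none.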
Repository information in registries
- Openarchives.org (383)
- Eprints Registry (607)
- Univ. of Illinois Registry (1000)
- DSpace Registry (28)
- Directory of Open Archive Repositories (324)
OAI-compliant univ. repositories in BASE
[Map. Labelled counts: USA 76; Canada 13; South America 2; Africa 2; India 3; Australia 11; New Zealand 1. The remaining counts on the map have lost their labels.]
Tools for the Harvesting Environment
- Open Source Harvester (FS Consulting, Perl with modifications)
- XML Validator and Repairer (Bielefeld UL, based on Perl XML modules)
- OAI Harvest Watcher (Bielefeld UL, Perl)
- OAI Resource Updater (Bielefeld UL, Perl)
- OAI Registry Watcher (Bielefeld UL, Perl)
OAI harvesting challenges
- Repositories do not respond or return error messages
- Links to the document do not work
- The XML file is not well-formed
- Data contain only references without any fulltext
- Access to the fulltext is restricted
- Field content varies
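One of the challenges listed, XML that is not well-formed, can be detected before a response enters pre-processing. The sketch below uses Python's standard XML parser purely as an illustration; it is not the Perl-based validator and repairer mentioned in the slides, and the sample records and function name are invented for the example.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (True, "") if xml_text parses, else (False, parser message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return False, str(exc)
    return True, ""


# Invented sample records: the second one has a mismatched closing tag.
GOOD = ('<record xmlns:dc="http://purl.org/dc/elements/1.1/">'
        '<dc:title>Talk</dc:title></record>')
BAD = '<record><title>Talk</record>'

if __name__ == "__main__":
    for sample in (GOOD, BAD):
        ok, message = check_well_formed(sample)
        print("well-formed" if ok else f"rejected: {message}")
```

This only detects the problem. Repairing a broken response, as the Bielefeld tool is described as doing, needs additional rules that the slides do not spell out.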
OAI Harvesting: Problems in Practice 1
<source>http://xxx.uni-xxxxx.de/publications/ELibD905_diplom_allnoch.pdf</source>
<dc:creator>Barry Wellman, Jeffrey Boase, Kakuko Miyata</dc:creator>
<dc:subject>Barry Wellman, Jeffrey Boase, Kakuko Miyata The Mobile-izing..</dc:subject>
<dc:title>Talk P. Bruzzone</dc:title>
<dc:creator>Bruzzone </dc:creator>
<dc:creator>Pierluigi</dc:creator>
<dc:date>2004-07-05</dc:date>
<dc:type>Review </dc:type>
<dc:identifier>http://www.rbej.com/content/2/1/52 </dc:identifier>
Reproductive Biology and Endocrinology 2004, 2:52 doi:10.1186/1477-7827-2-52
OAI Harvesting: Problems in Practice 2
Variations of <dc:language> (value: count):
EN: 9910
ENG: 771
En: 566
Eng: 1
English: 24084
English (United States): 63
English and Greek: 1
English and Russian: 1
English/Japanese: 1
English; Russian: 1
English=en: 1
Translation into English: 2
en: 1279115
en-CA: 865
en-US: 3
en-es: 5
en-us: 8
en; : 2
en_UK: 618
en_US: 18456
eng: 186787
eng : 92
eng + dut: 2
eng; : 17
eng; fre; ger; : 141
...
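Variations like these are usually collapsed onto one code per language during pre-processing. The following is a rough sketch of such a normalization, assuming a small hand-made alias table that maps to two-letter codes. The table, the tokenizing rule, and the function names are assumptions for illustration and do not describe how BASE handles the field.

```python
import re
from collections import Counter

# Hand-made alias table (an assumption); a real one would cover far more values.
ALIASES = {
    "en": "en", "eng": "en", "english": "en",
    "fre": "fr", "ger": "de", "dut": "nl",
    "greek": "el", "russian": "ru", "japanese": "ja",
}


def normalize_language(raw):
    """Map one raw dc:language value to a list of two-letter codes."""
    codes = []
    # Split on whitespace and the separators seen in the harvested values.
    for token in re.split(r"[\s;,/+=()]+", raw):
        if not token:
            continue
        # Drop region subtags such as en-US or en_UK.
        primary = re.split(r"[-_]", token.lower())[0]
        code = ALIASES.get(primary)
        if code and code not in codes:
            codes.append(code)
    return codes


def aggregate(counts):
    """Sum raw value counts per normalized language code."""
    totals = Counter()
    for raw, count in counts.items():
        for code in normalize_language(raw):
            totals[code] += count
    return totals


if __name__ == "__main__":
    # A few values and counts taken from the slide.
    sample = {"EN": 9910, "ENG": 771, "en_US": 18456, "eng; fre; ger;": 141}
    print(aggregate(sample))
```

Words that are not in the alias table (for example "and" or "Translation") are ignored, so a value with no known token yields an empty list and can be flagged for manual review.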
Some Rules from Harvesting Practice
- Standard repository software is great - for OAI harvesting as well
- Small collections – small problems
- Getting the related fulltext is complicated
- Libraries produce better metadata
- Writing e-mails helps - sometimes
- Data aggregation may produce problems
Further Developments: BASE Interfaces
- Search form (working)
- HTTP calls (working)
- Web Service (in development)
- Federated Search (Vascoda) (in discussion)
Local Integration: Search Form
<form action="http://www.base-search.net/index.php" method="post" accept-charset="UTF-8">
  <input maxlength="512" name="q" type="text" size="50" />
  <input value="Search!" type="submit" />
  <input value="all" name="s" type="hidden" />
</form>
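The same request that this form submits can be prepared from a script. The sketch below only builds the POST request and does not send it; the field names `q` and `s` and the target URL are taken from the form, while the function name is hypothetical and there is no guarantee that the endpoint still accepts these parameters.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Target URL as given in the form on the slide.
BASE_SEARCH_URL = "http://www.base-search.net/index.php"


def build_search_request(query):
    """Prepare the POST request the search form would submit (not sent here)."""
    # q is limited to 512 characters by the form; s=all is its hidden field.
    body = urlencode({"q": query[:512], "s": "all"}).encode("utf-8")
    headers = {"Content-Type": "application/x-www-form-urlencoded; charset=UTF-8"}
    return Request(BASE_SEARCH_URL, data=body, headers=headers, method="POST")


if __name__ == "__main__":
    request = build_search_request("institutional repositories")
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Passing the prepared object to `urllib.request.urlopen` would actually submit the search.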
Thank you!