OAI and Metadata Harvesting Mukesh Pund Principal Scientist
- Slides: 48
OAI and Metadata Harvesting Mukesh Pund Principal Scientist, NISCAIR New Delhi
Acknowledgements
- While preparing this presentation, I have used material from several sources on OAI-PMH by other authors.
- I gratefully acknowledge these sources.
Digital Repositories: Current Situation
- A mushrooming number and variety of distributed digital repositories (archives, digital libraries)
- Use of a variety of hardware, software, and database solutions
- Use of different search and retrieval interfaces
- Most of the content is not indexed by web search engines
- Content resides in backend databases and is not picked up by web search engines
Problems Faced by Users
- How to identify and retrieve relevant information from different repositories?
- Visiting and searching each repository individually is very expensive
- Key requirement: how do we support cross-searching?
Current Solutions
- Federated/distributed searching: Z39.50 information retrieval protocol
- Metadata harvesting: OAI-PMH protocol
Federated/Distributed Searching
- Protocol: "Information Retrieval (Z39.50): Application Service Definition and Protocol Specification" (ISO/ANSI standard; v1 1991, v2 1992, v3 1995)
- Client-server model (TCP/IP service)
- Process:
  - The client ("Origin") sends queries, formatted according to Z39.50, to the repository server ("Target").
  - The server translates each query into its local query format, searches the database, and sends the results to the client, formatted according to Z39.50.
  - The client translates the results and presents them to the user.
- A client can send queries to as many related Z39.50-compliant servers as possible
Z39.50 Protocol ...
- Example implementation: distributed searching of library catalogues and bibliographic databases
- Problem: performance
  - Implementation is not easy
  - Does not scale well (if nodes > 100)
  - Network bandwidth
  - Requires a Z39.50 implementation at the client ("Origin") end
- Z39.50 resources: http://lcweb.loc.gov/z3950/agency/ (Z39.50 International Maintenance Agency, Library of Congress)
OAI-PMH vs. Z39.50
- OAI-PMH: indexed search, much like general search engines; requires both service providers and data providers
- Z39.50: concurrent search; no service providers, only data providers
OAI-PMH
- Open Archives Initiative Protocol for Metadata Harvesting
- Protocol version 2.0 of 2002-06-14
- http://www.openarchives.org
Open Archives Initiative (OAI)
- The protocol is openly documented, and metadata is "exposed" to at least some peer group (note: rights management can still apply!)
- "Archive" is defined as a dynamic "collection of stuff", not the archivist's definition of "archive". "Repository" is the term used in most OAI documents.
- OAI is happening at break-neck speed...
Metadata Harvesting
- Move away from distributed searching (e.g., Z39.50)
- Extract metadata from various sources
- Build services on local copies of metadata
  - Resources remain at remote repositories
[Diagram: the user searches a local copy of the metadata, which is harvested offline; all searching, browsing, etc. is performed on that local copy. Each node is independently maintained and can still support direct user interaction.]
Data and Service Providers
- Data provider
  - Creators and keepers of the metadata as well as repositories of resources
  - Give free access to metadata (not necessarily free access to full texts/resources)
- Service provider
  - Harvests and stores metadata (no live requests!)
  - May select certain subsets from data providers (set hierarchy, datestamp) for selective harvesting
  - May enrich metadata
  - Offers (value-added) services on the basis of the metadata
- One "service" can play both roles (aggregators)
Multiple Data and Service Providers
[Diagram: service providers harvest from multiple data providers using OAI-PMH]
Aggregators
[Diagram: an aggregator sits between data providers and service providers]
OAI-PMH v2.0 [06/2002]
- Low-barrier interoperability specification
- Metadata harvesting model: data provider / service provider
- Metadata about resources
- Autonomous protocol
- Not a search protocol!
- HTTP based
- XML responses
- Unqualified Dublin Core
- Stable: backward compatible
OAI Data Model: Resources / Items / Records
- Resource: the object itself (example: the Mona Lisa)
- Item = identifier (example: all available metadata about Mona Lisa)
- Records of one item (example: Dublin Core metadata, MARC metadata, SPECTRUM metadata)
- Record = identifier + metadata format + datestamp
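The item/record relationship above can be sketched in a few lines of Python. This is only an illustration of the data model, not code from any OAI toolkit: the class names, the `add` helper, the identifier `oai:example.org:mona-lisa`, and the sample field values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    """One record = item identifier + metadata format + datestamp."""
    identifier: str
    metadata_prefix: str
    datestamp: str
    metadata: dict


@dataclass
class Item:
    """An item is known by its identifier and can yield several records."""
    identifier: str
    records: dict = field(default_factory=dict)  # metadata_prefix -> Record

    def add(self, metadata_prefix, datestamp, metadata):
        # Each metadata format of the item becomes its own record.
        record = Record(self.identifier, metadata_prefix, datestamp, metadata)
        self.records[metadata_prefix] = record
        return record


# Hypothetical item describing the Mona Lisa in two metadata formats.
mona_lisa = Item("oai:example.org:mona-lisa")
mona_lisa.add("oai_dc", "2002-12-01", {"title": "Mona Lisa"})
mona_lisa.add("marc", "2002-12-01", {"245": "Mona Lisa"})
```

Both records share the item's identifier; only the metadata format (and possibly the datestamp) differs.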
Harvesting: How It Works
- Six OAI "verbs": Identify, ListMetadataFormats, ListSets, ListIdentifiers, ListRecords, GetRecord
[Diagram: the service provider's harvester sends an HTTP request (OAI verb) to the metadata provider's repository, which returns an HTTP response (valid XML)]
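Since every request is just an HTTP URL carrying one of the six verbs, a harvester can build its requests with the standard library. The sketch below is a minimal illustration; the helper name `build_request` is an assumption, and the base URL is the example address used elsewhere in these slides.

```python
from urllib.parse import urlencode

# The six verbs defined by OAI-PMH.
OAI_VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}


def build_request(base_url, verb, **params):
    """Return the request URL for one OAI verb (hypothetical helper)."""
    if verb not in OAI_VERBS:
        raise ValueError(f"not an OAI-PMH verb: {verb!r}")
    # urlencode percent-encodes reserved characters such as ':' and '/'.
    return f"{base_url}?{urlencode({'verb': verb, **params})}"


BASE_URL = "http://192.168.0.12/dspace-oai/request"
identify_url = build_request(BASE_URL, "Identify")
records_url = build_request(BASE_URL, "ListRecords", metadataPrefix="oai_dc")
```

The resulting URL can be fetched with any HTTP client; the response body is XML.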
Harvester
- A harvester is a client application that issues OAI-PMH requests.
- A harvester is operated by a service provider as a means of collecting metadata from repositories.
Repository
- A repository is a network-accessible server that can process OAI-PMH requests.
- A repository is managed by a data provider to expose metadata to harvesters.
Resource
- A resource is the object or "stuff" that metadata is "about". The nature of a resource, whether it is physical or digital, or whether it is stored in the repository or is a constituent of another database, is outside the scope of the OAI-PMH.
Item
- An item is a constituent of a repository from which metadata about a resource can be disseminated.
- That metadata may be disseminated on-the-fly from the associated resource, cross-walked from some canonical form, actually stored in the repository, etc.
Record
- A record is metadata in a specific metadata format.
- A record is returned as an XML-encoded byte stream in response to a protocol request to disseminate a specific metadata format from a constituent item.
Unique Identifier
- A unique identifier unambiguously identifies an item within a repository.
- The unique identifier is used in OAI-PMH requests for extracting metadata from the item.
- The format of the unique identifier must correspond to the URI (Uniform Resource Identifier) syntax.
- Repositories may implement the oai-identifier scheme.
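As an illustration, an identifier in the optional oai-identifier scheme has three colon-separated parts: the literal `oai`, a namespace identifier (typically the repository's domain or address), and a local identifier. The sketch below splits such an identifier; the function name is an assumption, and the check is deliberately looser than the full oai-identifier grammar.

```python
def parse_oai_identifier(identifier):
    """Split 'oai:<namespace>:<local-id>' into (namespace, local_id).

    Hypothetical helper: it checks only the three-part shape, not the
    character rules of the oai-identifier scheme.
    """
    # Split on the first two colons only; the local part may contain more.
    parts = identifier.split(":", 2)
    if len(parts) != 3 or parts[0] != "oai" or not parts[1] or not parts[2]:
        raise ValueError(f"not an oai-identifier: {identifier!r}")
    return parts[1], parts[2]


namespace, local_id = parse_oai_identifier("oai:192.168.0.12:123456789/3")
```

For a DSpace repository the local part is usually the item's handle, as in the example identifier used in these slides.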
Role of Identifier
- Unique identifiers play two roles in the protocol:
  - Response: identifiers are returned by both the ListIdentifiers and ListRecords requests.
  - Request: an identifier, in combination with a metadataPrefix, is used in the GetRecord request as a means of requesting a record in a specific metadata format from an item.
OAI-PMH Verbs
- Identify
- ListSets
- ListMetadataFormats
- ListIdentifiers
- GetRecord
- ListRecords
Identify
- Returns general information about:
  - The archive and its policies
  - Datestamp
  - Granularity
- Example: http://192.168.0.12/dspace-oai/request?verb=Identify
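A harvester typically reads the Identify response to learn the repository's name, earliest datestamp, and datestamp granularity. The sketch below parses a hand-written sample response with the standard library; the element names follow the OAI-PMH 2.0 schema, but the sample values (repository name, e-mail address, dates) are invented for illustration.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hand-written sample response; all values are hypothetical.
SAMPLE_IDENTIFY = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2007-08-24T10:00:00Z</responseDate>
  <request verb="Identify">http://192.168.0.12/dspace-oai/request</request>
  <Identify>
    <repositoryName>Example Repository</repositoryName>
    <baseURL>http://192.168.0.12/dspace-oai/request</baseURL>
    <protocolVersion>2.0</protocolVersion>
    <adminEmail>admin@example.org</adminEmail>
    <earliestDatestamp>2002-12-01</earliestDatestamp>
    <deletedRecord>no</deletedRecord>
    <granularity>YYYY-MM-DD</granularity>
  </Identify>
</OAI-PMH>"""


def parse_identify(xml_text):
    """Return the children of <Identify> as a {name: text} dictionary."""
    root = ET.fromstring(xml_text)
    node = root.find("oai:Identify", OAI_NS)
    if node is None:
        raise ValueError("response has no Identify element")
    # Drop the '{namespace}' prefix that ElementTree puts on each tag.
    return {child.tag.split("}", 1)[1]: child.text for child in node}


info = parse_identify(SAMPLE_IDENTIFY)
```

The `granularity` value tells the harvester which date format it may use in `from` and `until` arguments.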
ListSets
- Provides a listing of sets in which records may be organized (may be hierarchical, overlapping, or flat)
- Example: http://192.168.0.12/dspace-oai/request?verb=ListSets
ListMetadataFormats
- Lists the metadata formats supported by the archive, as well as their schema locations and namespaces
- Example: http://192.168.0.12/dspace-oai/request?verb=ListMetadataFormats
ListIdentifiers
- Lists headers for all items corresponding to the specified parameters
- Example: http://192.168.0.12/dspace-oai/request?verb=ListIdentifiers&metadataPrefix=oai_dc
GetRecord
- Returns the metadata for a single item in the form of an OAI record
- Example: http://192.168.0.12/oai/request?verb=GetRecord&identifier=oai:192.168.0.12:123456789/3&metadataPrefix=oai_dc
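The record returned by GetRecord has a header (identifier and datestamp) and a metadata part, which for `oai_dc` holds unqualified Dublin Core elements. The sketch below extracts a few fields from a hand-written sample response; the namespaces are the standard OAI-PMH and Dublin Core ones, while the title, creator, and datestamp values are invented for illustration.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hand-written sample response; the metadata values are hypothetical.
SAMPLE_GET_RECORD = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <GetRecord>
    <record>
      <header>
        <identifier>oai:192.168.0.12:123456789/3</identifier>
        <datestamp>2002-12-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample title</dc:title>
          <dc:creator>Sample, Author</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>"""


def parse_get_record(xml_text):
    """Return identifier, datestamp, titles and creators of one record."""
    root = ET.fromstring(xml_text)
    record = root.find("oai:GetRecord/oai:record", NS)
    if record is None:
        raise ValueError("response has no record")
    header = record.find("oai:header", NS)
    return {
        "identifier": header.findtext("oai:identifier", namespaces=NS),
        "datestamp": header.findtext("oai:datestamp", namespaces=NS),
        # Dublin Core elements are repeatable, so collect them as lists.
        "titles": [e.text for e in record.iterfind(".//dc:title", NS)],
        "creators": [e.text for e in record.iterfind(".//dc:creator", NS)],
    }


parsed = parse_get_record(SAMPLE_GET_RECORD)
```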
08/24/07
ListRecords
- Retrieves metadata records for multiple items
- Example: http://192.168.0.12/dspace-oai/request?verb=ListRecords&metadataPrefix=oai_dc
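Large ListRecords responses are usually delivered in pages: OAI-PMH 2.0 lets a repository end a partial list with a `resumptionToken`, which the harvester sends back (as the only argument besides the verb) to get the next page. The slides do not cover this, so treat the sketch below as a supplementary illustration. The `fetch` callable, the fake two-page repository, and the `example.org` identifiers are all assumptions made so that the loop can run without a network.

```python
import xml.etree.ElementTree as ET

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def harvest_identifiers(fetch, metadata_prefix="oai_dc"):
    """Yield record identifiers, following resumption tokens page by page.

    `fetch` is any callable that takes a dict of request arguments and
    returns the XML response text (for example, a thin HTTP wrapper).
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(params))
        for header in root.iterfind(".//oai:record/oai:header", NS):
            yield header.findtext("oai:identifier", namespaces=NS)
        token = root.findtext(".//oai:resumptionToken", namespaces=NS)
        if not token:  # missing or empty token: the list is complete
            break
        params = {"verb": "ListRecords", "resumptionToken": token}


def _page(ids, token):
    # Build a minimal fake ListRecords response (headers only).
    records = "".join(
        f"<record><header><identifier>{i}</identifier></header></record>"
        for i in ids
    )
    return (
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>'
        f"{records}<resumptionToken>{token}</resumptionToken>"
        "</ListRecords></OAI-PMH>"
    )


# Fake repository with two pages, keyed by the incoming resumption token.
PAGES = {
    None: _page(["oai:example.org:1", "oai:example.org:2"], "page2"),
    "page2": _page(["oai:example.org:3"], ""),
}


def fake_fetch(params):
    return PAGES[params.get("resumptionToken")]


harvested = list(harvest_identifiers(fake_fetch))
```

In a real harvester, `fetch` would issue the HTTP request against the repository's base URL and return the response body.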
ListIdentifiers
- To get a list of identifiers (here, from 2002-12-01 onwards):
- http://192.168.0.12/oai/request?verb=ListIdentifiers&metadataPrefix=oai_dc&from=2002-12-01
Selective Harvesting
- By date: &from=2002-12-01 or &from=2002-12-01&until=2003-12-01
- By set (collection in DSpace): &set=hdl_1849_2
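The date and set arguments above can be combined in one request. The sketch below builds the query string and rejects malformed or reversed date ranges before anything is sent; the function name and its parameter names are assumptions, while `from`, `until`, `set`, and `metadataPrefix` are the protocol's own argument names.

```python
from datetime import date
from urllib.parse import urlencode


def selective_query(verb, metadata_prefix, from_date=None,
                    until_date=None, set_spec=None):
    """Build the query string for a selective harvest (hypothetical helper)."""
    params = {"verb": verb, "metadataPrefix": metadata_prefix}
    # fromisoformat raises ValueError for dates that are not valid ISO dates.
    start = date.fromisoformat(from_date) if from_date else None
    end = date.fromisoformat(until_date) if until_date else None
    if start and end and start > end:
        raise ValueError("'from' must not be later than 'until'")
    if from_date:
        params["from"] = from_date
    if until_date:
        params["until"] = until_date
    if set_spec:
        params["set"] = set_spec
    return urlencode(params)


by_date = selective_query("ListRecords", "oai_dc",
                          from_date="2002-12-01", until_date="2003-12-01")
by_set = selective_query("ListIdentifiers", "oai_dc", set_spec="hdl_1849_2")
```

Appending the result to the repository's base URL after a `?` gives the full request.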
OAI-PMH user interface for DSpace 5.0
Useful Sites
- OAI-PMH official site: http://www.openarchives.org/
- Testing your OAI-PMH compatibility: http://oai.dlib.vt.edu/cgi-bin/Explorer/2.01.45/testoai
- Registering your digital repository: http://www.openarchives.org/data/registerasprovider.html
OAI Service Provider Software (Harvesters)
- PKP Harvester
  - University of British Columbia, Canada
  - http://www.pkp.ubc.ca/pkp-harvester/
- DLESE
  - Digital Library for Earth System Education
  - http://sourceforge.net/projects/dlese-oai/
- ARC
  - Old Dominion University, Virginia
  - http://arc.cs.odu.edu/
OAI Data Provider Software
- OAICat
  - OCLC
  - http://www.oclc.org/research/software/oai/cat.htm
- DLESE
  - Digital Library for Earth System Education
  - http://sourceforge.net/projects/dlese-oai
What Do baseURLs Look Like?
- DSpace repositories
  - NSDL: 202.54.99.9/dspace
  - http://202.54.99.9/dspace-oai/request
OAI Tools
- http://www.openarchives.org/tools.html
Thank You