www.gbif.org
Global Biodiversity Information Facility
Information architecture of the GBIF
Hannu Saarenmaa
IABIN/CHM, Cancún, Mexico, 12-14 August 2003
Outline
1. Data
2. Software
3. Hardware
4. Peopleware (Nodes)
5. Status of network and conclusion
1. Data
[Figure: pyramid with "Policy, decisions" at the top, then "Knowledge", "Information" and "Data" at the base; each level depends on the one below, with the labels "analysis, synthesis" and "refinement" on the transitions]
GBIF is concerned with “primary biodiversity data” only
• Specimens
• Observations
• Names
• Species
• Literature
• Metadata on the above
How will the data be organised?
• By having a common information model and shared data standards
[Figure: common information model. Institutions and data sources (Source: URL; Protocol: SOAP, DiGIR; Format: XML Schema; Rights; Services) offer datasets (Description; Rights; Format) made up of units/records: specimen data, observation data, taxonomies in ECAT and the Catalogue of Life, checklists, red lists, species knowledge bases and unstructured knowledge objects. Components are marked as central or distributed.]
Data exchange standards are the key
• Data description in XML
  • Specimen/observation
  • Name/taxon
  • Providers/collections/persons in various roles
• Leading standards
  • DiGIR
  • Darwin Core
  • ABCD/BioCASE
  • Dublin Core
  • SOAP
  • Grid OGSA
• Standards process
  • GBIF-DADI works with TDWG
  • Discussion, documentation
  • Open source: digir.sourceforge.net
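To make "data description in XML" concrete, here is a minimal Python sketch that serialises one specimen record as Darwin Core-style XML and parses it back. The namespace URI `urn:example:darwincore`, the flat record layout and the sample values are assumptions for illustration only; the element names echo Darwin Core concepts, but the output is not a validated Darwin Core document.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace chosen for this sketch; it is NOT the official
# Darwin Core namespace URI.
DWC_NS = "urn:example:darwincore"
ET.register_namespace("dwc", DWC_NS)


def record_to_xml(record: dict[str, str]) -> str:
    """Serialise a flat specimen record as Darwin Core-style XML.

    Each key is assumed to be a Darwin Core concept name and becomes one
    child element of <dwc:record>.
    """
    root = ET.Element(f"{{{DWC_NS}}}record")
    for concept, value in record.items():
        ET.SubElement(root, f"{{{DWC_NS}}}{concept}").text = value
    return ET.tostring(root, encoding="unicode")


def xml_to_record(xml_text: str) -> dict[str, str]:
    """Parse the XML back into a flat dict keyed by concept name."""
    prefix = f"{{{DWC_NS}}}"
    root = ET.fromstring(xml_text)
    return {child.tag[len(prefix):]: child.text or "" for child in root}


# Hypothetical example record; all values are invented for illustration.
specimen = {
    "InstitutionCode": "MUS",
    "CollectionCode": "Birds",
    "CatalogNumber": "123456",
    "ScientificName": "Parus major",
}
xml_text = record_to_xml(specimen)
print(xml_text)
```

Because every value is tagged with the concept it represents, a receiver can interpret the record without knowing anything about the provider's local database.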
2. Software
GBIF is building a distributed network of databases using a web services approach.
Web Services: Definitions
“A Web Service is a software application or component identified by a URI, whose interfaces and bindings are capable of being described by standard XML vocabularies and that supports direct interactions with other software applications or components through the exchange of information that is expressed in terms of an XML infoset via Internet-based protocols.” (Chris Ferris, Sun Microsystems, W3C)
The Web Services Stack
[Figure: the web services stack, including DiGIR]
2.1. The DiGIR protocol
• Used for communication between data providers and users
• More light-weight and specialised than SOAP
• Enables a single point of access (portal/search) to distributed information resources
  • Resource: a collection of data objects that conform to a common schema (DB records, XML documents)
  • Distributed resources conform to a federation schema
• Enables search and retrieval of structured data
  • Search for data values in context (semantics)
  • Results returned as a structured data set
• Makes the location and technical characteristics of the native resource transparent to the user
The Distributed Generic Information Retrieval (DiGIR) protocol was invented by David Vieglais (University of Kansas) and Stan Blum (California Academy of Sciences).
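The following Python sketch illustrates only the federation idea behind DiGIR, not its actual XML message format: a portal sends one concept-qualified query to several providers whose resources share a federation schema, and merges the structured results. The `Provider` class, the provider names and the records are hypothetical.

```python
from dataclasses import dataclass, field

Record = dict[str, str]


@dataclass
class Provider:
    """Hypothetical stand-in for one DiGIR resource whose records already
    conform to the shared federation schema."""
    name: str
    records: list[Record] = field(default_factory=list)

    def search(self, concept: str, value: str) -> list[Record]:
        # Match a value in context: the concept name says what the value means,
        # so "Parus major" is only matched as a ScientificName, not as free text.
        return [r for r in self.records if r.get(concept) == value]


def federated_search(providers: list[Provider], concept: str, value: str) -> list[Record]:
    """Portal side: send the same query to every provider and merge the
    structured results, tagging each record with the provider it came from."""
    merged: list[Record] = []
    for provider in providers:
        for record in provider.search(concept, value):
            merged.append({**record, "source": provider.name})
    return merged


# Invented providers and records for illustration.
providers = [
    Provider("museum-a", [{"ScientificName": "Parus major", "Country": "FI"}]),
    Provider("museum-b", [{"ScientificName": "Parus major", "Country": "MX"},
                          {"ScientificName": "Pica pica", "Country": "SE"}]),
]
hits = federated_search(providers, "ScientificName", "Parus major")
print([h["source"] for h in hits])
```

The user sees one merged result set; which provider holds which record, and how it stores it, stays hidden behind the shared schema.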
A simple DiGIR architecture
[Figure: a portal/search engine querying several data providers]
GBIF DiGIR architecture (UDDI)
[Figure: GBIF DiGIR architecture. Components: a UDDI registry of institutions, providers and services; an index holding available providers, metadata and logs, and synonyms and GUIDs from a name provider; a portal with request marshaller, cache, accounting and query engine; data providers exposing provider services and resource metadata via DiGIR and SOAP. Flows: providers publish availability to the registry; the index sends provider queries and metadata queries; the user's portal sends metadata and name queries to the index and full data queries to the data providers, receiving metadata and full data responses.]
2.2. The registry
“You don’t get very far with web services unless you have a registry...” (Tom Gaskins, uddi.org)
• Global marketplace of shared biodiversity data
• Technically available now, awaiting population
• Multiple UDDI servers possible in 2004 (v3)
• Based on UDDI (Systinet WASP) and web services
  • Directory of Participants and data providers
  • Services of the providers, i.e., data sources and datasets offered
  • tModels of the standards that must be adhered to
• Open interfaces for portals and specialised search engines
  • Anybody can write their own portal/search tool that uses the registry
  • Use of the index is optional
How does the GBIF UDDI registry work?
1) The GBIF Secretariat and other developers create and populate the registry with descriptions of standards (tModels).
2) Museums and other data providers install data provider packages, which are automatically registered (provider and service registrations).
3) The GBIF Participant is notified of a new provider in its domain, for possible endorsement.
4) A global index queries the registry and caches metadata and usage statistics, creating a unique identifier for each record (and name).
5) Portals and search engines query the registry and the index to build targeted user interfaces.
6) Scientists and policy users use portals to build data sets for analysis and synthesis.
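The six steps above can be modelled with a small in-memory registry. This Python sketch is a toy: it is not the UDDI API, and the class names, the tModel key `digir-darwincore`, the endpoint URL and the participant name are all hypothetical. It shows how registering services against named standards lets the index and portals discover every service that follows a given standard.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    provider: str
    access_point: str  # endpoint URL of the provider's service (hypothetical)
    tmodel: str        # key of the standard the service declares it follows


@dataclass
class Registry:
    """Toy in-memory model of the registry workflow; this is NOT the UDDI API."""
    tmodels: dict[str, str] = field(default_factory=dict)
    services: list[Service] = field(default_factory=list)
    notifications: list[tuple[str, str]] = field(default_factory=list)

    def publish_tmodel(self, key: str, description: str) -> None:
        # Step 1: the Secretariat describes a standard.
        self.tmodels[key] = description

    def register(self, provider: str, access_point: str, tmodel: str,
                 participant: str) -> None:
        # Step 2: a provider package registers itself against a known standard.
        if tmodel not in self.tmodels:
            raise ValueError(f"unknown standard: {tmodel}")
        self.services.append(Service(provider, access_point, tmodel))
        # Step 3: the Participant for that domain is notified for endorsement.
        self.notifications.append((participant, provider))

    def find_services(self, tmodel: str) -> list[Service]:
        # Steps 4-5: the index and portals discover services by standard.
        return [s for s in self.services if s.tmodel == tmodel]


registry = Registry()
registry.publish_tmodel("digir-darwincore", "DiGIR protocol with Darwin Core schema")
registry.register("museum-a", "http://provider.example.org/digir",
                  "digir-darwincore", participant="Finland")
print([s.provider for s in registry.find_services("digir-darwincore")])
```

Rejecting registrations against unknown standards is one way to read "tModels of the standards that must be adhered to": only services that declare a known standard become discoverable.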
2.3. Metadata and names index
• Closely paired with the services registry will be a global index of the available data
• Retrieves metadata of the datasets/resources available from the registered providers
• Indexes the scope and coverage of datasets/resources (Dublin Core registry)
  • Taxonomic, spatial, temporal, ...
• Maintains a cache of key data in case a provider goes off-line
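A minimal Python sketch of such an index, under assumed data structures: each dataset is summarised by a hypothetical `Coverage` record (taxa, bounding box, year range), queries match on all three dimensions, and a failed harvest leaves the previously cached entry in place. The dataset name, coordinates and years are invented.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Coverage:
    """Hypothetical scope/coverage summary of one dataset."""
    taxa: frozenset[str]
    bbox: tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)
    years: tuple[int, int]                   # (first_year, last_year)


class MetadataIndex:
    def __init__(self) -> None:
        self.cache: dict[str, Coverage] = {}
        self.online: dict[str, bool] = {}

    def harvest(self, dataset_id: str, fetch: Callable[[], Coverage]) -> None:
        """Refresh one dataset's metadata; if the provider is off-line,
        keep whatever was cached on the previous harvest."""
        try:
            self.cache[dataset_id] = fetch()
            self.online[dataset_id] = True
        except ConnectionError:
            self.online[dataset_id] = False

    def find(self, taxon: str, lon: float, lat: float, year: int) -> list[str]:
        """Datasets whose taxonomic, spatial and temporal coverage all match."""
        hits = []
        for dataset_id, cov in self.cache.items():
            min_lon, min_lat, max_lon, max_lat = cov.bbox
            if (taxon in cov.taxa
                    and min_lon <= lon <= max_lon and min_lat <= lat <= max_lat
                    and cov.years[0] <= year <= cov.years[1]):
                hits.append(dataset_id)
        return sorted(hits)


def provider_down() -> Coverage:
    raise ConnectionError("provider off-line")  # simulated outage


index = MetadataIndex()
index.harvest("birds-fi", lambda: Coverage(frozenset({"Parus major"}),
                                           (19.0, 59.0, 32.0, 70.0), (1950, 2003)))
index.harvest("birds-fi", provider_down)  # second harvest fails
print(index.find("Parus major", 25.0, 60.2, 1999), index.online["birds-fi"])
```

Even after the simulated outage the dataset remains findable, and the `online` flag records that its provider is currently unreachable.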
Name Service (ECAT) is a major component of the global index
[Figure: the GBIF portal offers XML and HTML data access to biodiversity data held by GBIF data nodes (specimen data, observation data, name lists, URLs to unstructured data). An index manager indexes name usage into a name usage index linked to the Taxonomic Name Service (ECAT) and the Catalogue of Life. ECAT elements are coloured orange.]
“Name lists” are lists of names for a specific purpose (e.g. a Red List or a regional checklist).
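One task of a name service is to expand a query so that records filed under synonyms are found too. The Python sketch below assumes a hypothetical `NameService` class in which every known name points to one accepted name; the name pair used is an illustrative example, not an authoritative taxonomic statement from ECAT or the Catalogue of Life.

```python
class NameService:
    """Toy name usage index: every known name points to its accepted name."""

    def __init__(self) -> None:
        self.accepted_of: dict[str, str] = {}

    def add_accepted(self, name: str) -> None:
        self.accepted_of[name] = name

    def add_synonym(self, synonym: str, accepted: str) -> None:
        if accepted not in self.accepted_of:
            raise KeyError(f"accepted name not in index: {accepted}")
        # Chain through to the accepted name in case `accepted` is itself a synonym.
        self.accepted_of[synonym] = self.accepted_of[accepted]

    def expand(self, name: str) -> list[str]:
        """All names to search for when a user asks for `name`."""
        target = self.accepted_of.get(name, name)
        names = sorted(n for n, a in self.accepted_of.items() if a == target)
        return names or [name]


# Illustrative checklist entry.
names = NameService()
names.add_accepted("Cyanistes caeruleus")
names.add_synonym("Parus caeruleus", "Cyanistes caeruleus")
print(names.expand("Parus caeruleus"))
```

A portal could pass each expanded name to the providers, so a search under either name retrieves the records held under both.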
2.4.a. Data provider software
• Each system entails
  • Provider software
  • Communication with the DiGIR protocol
  • Data standards: Darwin Core, Dublin Core
  • Installation for each provider
  • Configuration for each resource (local existing database)
  • Registration with the GBIF UDDI registry
• Turn-key package for Linux and Windows
  • Based on PHP and digir.sourceforge.net code
  • Available in August 2003
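The PHP provider package is not reproduced here. As a rough illustration of "configuration for each resource", this Python sketch assumes a hypothetical per-resource mapping from federation-schema concepts to columns of an existing local table, with sqlite3 standing in for the provider's own database; the mapping, table and column names are invented.

```python
import sqlite3

# Hypothetical per-resource configuration: federation-schema concept -> local column.
# Column names come from this trusted config; only the search value is user input.
CONCEPT_MAP = {
    "CatalogNumber": "cat_no",
    "ScientificName": "taxon",
    "Country": "country",
}


def search_local(conn: sqlite3.Connection, concept: str, value: str) -> list[dict]:
    """Answer a concept-based query against the provider's existing table and
    return rows renamed to federation-schema concepts."""
    column = CONCEPT_MAP.get(concept)
    if column is None:
        raise KeyError(f"concept not mapped for this resource: {concept}")
    select = ", ".join(f"{col} AS {c}" for c, col in CONCEPT_MAP.items())
    sql = f"SELECT {select} FROM specimens WHERE {column} = ?"
    conn.row_factory = sqlite3.Row
    return [dict(row) for row in conn.execute(sql, (value,))]


def demo_db() -> sqlite3.Connection:
    """Stand-in for a museum's existing database (invented sample rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE specimens (cat_no TEXT, taxon TEXT, country TEXT)")
    conn.executemany("INSERT INTO specimens VALUES (?, ?, ?)",
                     [("1", "Parus major", "FI"), ("2", "Pica pica", "SE")])
    return conn


conn = demo_db()
print(search_local(conn, "ScientificName", "Parus major"))
```

The local schema stays untouched; only the mapping has to be written once per resource during installation.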
2.4.b. Data repository tool
• A data warehouse tool to manage and share data without having one's own database
• Upload and manage datasets in document format, either as a) a spreadsheet, b) embedded Darwin Core, or c) ABCD
• Release dataset to the public
  • Data is parsed into an embedded MySQL database and becomes available as a DiGIR resource
• Revoke release
  • Data is deleted from the database
• Stand-alone package or module of the GBIF PTK
  • For Linux and Windows
  • Based on Python and Zope, available Q3/2003
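The upload, release and revoke cycle can be sketched in a few lines of Python. This is not the Zope-based tool: sqlite3 stands in for the embedded MySQL database, and the assumption that a spreadsheet arrives as CSV with `CatalogNumber` and `ScientificName` columns is hypothetical, as are the dataset name and rows.

```python
import csv
import io
import sqlite3


class Repository:
    """Sketch of the upload/release/revoke cycle. sqlite3 stands in for the
    embedded MySQL database; the spreadsheet is assumed to be CSV with
    CatalogNumber and ScientificName columns."""

    def __init__(self) -> None:
        self.documents: dict[str, str] = {}  # uploaded, not yet public
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE records (dataset TEXT, catalog_number TEXT, scientific_name TEXT)")

    def upload(self, dataset: str, csv_text: str) -> None:
        self.documents[dataset] = csv_text

    def release(self, dataset: str) -> None:
        # Parse the stored document into the database, where a provider could serve it.
        self.revoke(dataset)  # avoid duplicate rows if released twice
        rows = [(dataset, r["CatalogNumber"], r["ScientificName"])
                for r in csv.DictReader(io.StringIO(self.documents[dataset]))]
        self.db.executemany("INSERT INTO records VALUES (?, ?, ?)", rows)

    def revoke(self, dataset: str) -> None:
        # Delete the public copy; the uploaded document itself is kept.
        self.db.execute("DELETE FROM records WHERE dataset = ?", (dataset,))

    def public_count(self, dataset: str) -> int:
        (count,) = self.db.execute(
            "SELECT COUNT(*) FROM records WHERE dataset = ?", (dataset,)).fetchone()
        return count


repo = Repository()
repo.upload("birds", "CatalogNumber,ScientificName\n1,Parus major\n2,Pica pica\n")
repo.release("birds")
print(repo.public_count("birds"))
```

Keeping the uploaded document separate from the database copy means revoking a release removes the public data without losing the owner's dataset.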
2.5. Logging and accounting
• Track the usage of the network and document the data provided by the nodes
• Why?
  • Recognise the efforts of the data providers
  • Help users acknowledge the sources of the data they are using
  • Report back to the Participants whether the GBIF network is really used
  • Optimise network performance and services
• How?
  • Willing data providers log their transactions
  • A central accounting service downloads the logs, providing usage statistics and a citation service on the web site and via email
  • Part of the Index
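The accounting step can be pictured as aggregating downloaded provider logs. This Python sketch assumes a hypothetical tab-separated log format (timestamp, provider, client, records returned); the real log layout is not specified here, and the providers, clients and counts are invented.

```python
from collections import Counter
from typing import NamedTuple


class LogEntry(NamedTuple):
    provider: str
    client: str
    records: int


def parse_log(text: str) -> list[LogEntry]:
    """Parse a hypothetical tab-separated transaction log:
    timestamp <TAB> provider <TAB> client <TAB> records returned."""
    entries = []
    for line in text.strip().splitlines():
        _timestamp, provider, client, records = line.split("\t")
        entries.append(LogEntry(provider, client, int(records)))
    return entries


def usage_statistics(entries: list[LogEntry]) -> dict[str, tuple[int, int]]:
    """Per provider: (number of requests, total records served)."""
    requests = Counter(e.provider for e in entries)
    served: Counter[str] = Counter()
    for e in entries:
        served[e.provider] += e.records
    return {p: (requests[p], served[p]) for p in requests}


def users_of(entries: list[LogEntry], provider: str) -> list[str]:
    """Who used a provider's data, for reporting back to the provider."""
    return sorted({e.client for e in entries if e.provider == provider})


LOG = (
    "2003-08-12T10:00\tmuseum-a\tportal.example.org\t25\n"
    "2003-08-12T10:05\tmuseum-b\tportal.example.org\t3\n"
    "2003-08-12T11:00\tmuseum-a\tgarp.example.org\t40\n"
)
entries = parse_log(LOG)
print(usage_statistics(entries))
```

The same parsed entries feed both goals on the slide: statistics for the Participants and a per-provider list of users for acknowledgement.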
2.6. Portals
• Portals are gateways to distributed information resources
• You do not need your own portal in order to become a data provider
  • Access to one that talks to a registry is enough
• Anybody can write their own specialised portal/search tool that uses the registry and the index through their open interfaces (DiGIR, SOAP)
  • The MANIS portal is available now (Java)
  • GBIF Portal Toolkit v2, which can be used to access data, is planned for availability in Q1/2004
Two roles of portals
• Communication/coordination needs
  • Portals are integrative tools and gateways to information that go beyond single websites
  • Portals and related directory services can be used to coordinate network activities
• Data access needs
  • Much of the content on the portals can be built automatically from the contents of the central Index
  • The GBIF central portal is only one of many portals and search engines making use of the central metadata registry and related index through their open interfaces
  • Participant nodes need portals to the data in their domain
GBIF Portal Toolkit
• Communications portal (version 1), released at the end of 2002, also as a portal toolkit (PTK) for use by nodes
  • News syndication with RSS/RDF
  • Events, calendar of calendars, projects
  • Articles, documents, images, audio and video content
  • Search within the site and across the GBIF network
  • Download area
  • Getting started service and how to become a node
  • About GBIF
  • CIRCA-based group collaboration services
  • Directory services (CIRCA-based OpenLDAP)
  • Suggestions and feedback from users
  • Prototype data repository
• Data access portal (version 2), Q1/2004
  • Registry
  • Access to primary biodiversity data derived from the central index
  • Accounting service for data use
  • Links to Participant nodes and their content
Test version of the central GBIF communications portal
3. Hardware
• Each Participant should have on the Internet either or both of:
  • A network of distributed data providers
  • A central data warehouse
• At least one server and an Internet connection, both stable
  • Can be hosted elsewhere if stability is a problem
4. Peopleware
How do you become a GBIF data provider? Data is provided by the nodes.
GBIF node responsibilities
[Figure: division of responsibilities. GBIF (registry, index and portal): network, registry, standards, tools. Participant node: coordination, network registry, standards, tools, consolidated data, identifying data nodes, endorsing and quality-assuring data nodes, national language interfaces, portal, encouraging participation, managing registration of data nodes. Data node: register metadata, allow indexing.]
NODES coordinate their Participant networks
• The NODES Committee
  • Comprises the managers of the Participant nodes
  • Works with the Information and Communications Technology (ICT) staff of the Secretariat to develop the network of nodes
• NODES are in a key position to promote and help the inclusion of new data providers and data sets
  • Building a data network requires building a human network
  • Maintains a global directory of people, roles and data providers
  • Shares best practices, experiences, ideas and software tools
What tools a Participant node needs
• Registry tools to endorse institutions and data providers
• Directory of people, collections and institutions, and related communication tools
• Portal server for a domain-specific website
• Access to the central UDDI registry
• Local directory server or UDDI server
• National language support as needed
• Data warehouse to host data from willing/unable data nodes
• Tools for quality assurance
Training
• A training programme is being shaped
• 7 regional workshops in 2003 on “Becoming a GBIF data provider”
  • Stockholm, Ottawa, Tsukuba, Lisbon, San Jose, Africa, “francophonie”
• The Secretariat only works with the Participant nodes, therefore:
  • “Train the trainer” concept
  • Certification of a cadre of trainers
  • Standardised tools and materials
Helpdesk
• For all operational services
• Ticket handling, follow-up
• Will be geographically distributed
• For “GBIF-approved packages”
Why would I share my data?
• The identity of each record will be maintained
  • Globally unique identifier (LSID/URN)
    • Network:Provider:Namespace:Key:Version, e.g. GBIF-LSID:mysite.org:Specimen.ID:123456:1
  • Comparable to authorship of names
• Usage will be logged and statistics provided
  • The efforts of the data providers will be recognised
  • Users will be required to acknowledge the sources of the data they are using
  • Providers will be informed who is using their data (difficult without authentication)
• Could be required for publication (cf. GenBank)
• “GBIF Public Licence”
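The identifier layout on this slide (Network:Provider:Namespace:Key:Version) can be built and parsed mechanically. This Python sketch follows the slide's example only; it is not an implementation of the LSID specification, whose URNs take the form URN:LSID:authority:namespace:object with an optional revision. The `RecordId` type and `next_version` helper are hypothetical.

```python
from typing import NamedTuple


class RecordId(NamedTuple):
    """Identifier in the slide's Network:Provider:Namespace:Key:Version layout.
    Note: this follows the example on the slide, not the formal LSID URN syntax."""
    network: str
    provider: str
    namespace: str
    key: str
    version: int

    def __str__(self) -> str:
        return ":".join([self.network, self.provider, self.namespace,
                         self.key, str(self.version)])


def parse_record_id(text: str) -> RecordId:
    parts = text.split(":")
    if len(parts) != 5 or "" in parts:
        raise ValueError(f"expected Network:Provider:Namespace:Key:Version, got {text!r}")
    network, provider, namespace, key, version = parts
    return RecordId(network, provider, namespace, key, int(version))


def next_version(rid: RecordId) -> RecordId:
    """A corrected record keeps its identity (same key) but gets a new version."""
    return rid._replace(version=rid.version + 1)


rid = parse_record_id("GBIF-LSID:mysite.org:Specimen.ID:123456:1")
print(rid.provider, rid.key, str(next_version(rid)))
```

Because the provider is part of every identifier, credit for a record travels with it wherever the record is cited.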
GBIF IPR principles
• GBIF will seek to ensure that data in GBIF-affiliated databases is in the public domain
  • In particular, data enabling linking with other data
• GBIF will seek to ensure that the source of data is acknowledged by all users
  • Cf. open source licences, commons
• Maintenance and control of data remain in the hands of database owners
  • There will be no central data banks (except caches)
  • Database owners can block access to sensitive data
  • Countries have sovereignty over their biological resources
• It follows that GBIF services will mainly be integrative metadata services and standards
Conclusion
GBIF as a global integrator
GBIF network status
• The NODES committee set the goal of having a DiGIR network up and running by the end of 2003
• Seven regional workshops and training events
• Two DiGIR provider implementations available in August 2003
• UDDI registry up and running in July 2003
• Global index Q4/2003
• Portal to browse and search data Q4/2003, toolkit Q1/2004
• Specialised services such as the BIODI GARP service are emerging
SUMMARY
• Central registry and marketplace of distributed data
  • Anyone can build their own vertical portals or specialised search engines on top of it
• Participant nodes: major role in coordination, dissemination and quality assurance
• Data nodes: register your datasets, provide online access to your database or repository
• Data remains under the control of providers
• Data standards and web services make it work