EUDAT & CKAN
Heinrich Widmann, widmann@dkrz.de


EUDAT: The project
European Data Infrastructure (EUDAT, http://eudat.eu)
• Motivation:
  – Manage the rising tide of research data
  – Improve interoperability in a wide cross-disciplinary scope
• Objective: Build up a Collaborative Data Infrastructure based on common data services (https://eudat.eu/services), driven by the requirements of the research communities

B2FIND: the metadata service of EUDAT
(info + doc: https://eudat.eu/services/b2find)
• Based on a comprehensive joint metadata catalogue of research data collections stored in EUDAT data centres and other (external) repositories
• Provides a powerful and user-friendly discovery portal (http://b2find.eudat.eu) on metadata covering a wide range of research communities across disciplines

Used Technologies
• CentOS 6 (production instance)
• Modular ingestion workflow
  1. Harvesting: OAI-PMH (other protocols such as JSON API are supported as well)
  2. Own mapping module (+ community-specific metadata schemas and ontologies, closed vocabularies, …)
  3. Upload to CKAN: common B2FIND metadata schema plus many additional facets (extra fields)
• Apache + Varnish 3 cache + CKAN version 2.2.3 with extensions
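The three ingestion steps can be illustrated with a minimal Python sketch. It is not the B2FIND code: the endpoint URL, the sample record, the `Community` extra field, and the function names are assumptions made for illustration. Only the OAI-PMH `ListRecords` request parameters, the Dublin Core namespace, and the general shape of a CKAN dataset dictionary (as accepted by the `package_create` action) are standard. The HTTP calls themselves are left out so the sketch runs offline.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Step 1 (harvesting): build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def map_record(record_xml, community):
    """Step 2 (mapping): turn one oai_dc record into a CKAN-style dataset dict."""
    root = ET.fromstring(record_xml)
    identifier = root.findtext(f".//{{{OAI_NS}}}header/{{{OAI_NS}}}identifier")

    def dc(name):
        # Collect all non-empty Dublin Core elements with the given local name.
        return [e.text.strip() for e in root.iter(f"{{{DC_NS}}}{name}") if e.text]

    dates = dc("date")
    return {
        # CKAN dataset names are restricted to lowercase letters, digits, "-" and "_".
        "name": re.sub(r"[^a-z0-9_-]", "-", identifier.lower()),
        "title": (dc("title") or [identifier])[0],
        "notes": " ".join(dc("description")),
        "tags": [{"name": s} for s in dc("subject")],
        # Extra fields carry the additional facets; "Community" is a hypothetical key.
        "extras": [
            {"key": "Community", "value": community},
            {"key": "PublicationYear", "value": dates[0][:4] if dates else ""},
        ],
    }


# Hypothetical harvested record, inlined instead of fetched.
SAMPLE = """<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header><identifier>oai:example.org:123</identifier></header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>Sea surface temperature</dc:title>
      <dc:description>Monthly means.</dc:description>
      <dc:subject>Oceanography</dc:subject>
      <dc:date>2014-05-01</dc:date>
    </oai_dc:dc>
  </metadata>
</record>"""

dataset = map_record(SAMPLE, community="demo")
```

Step 3 (upload) would then POST `dataset` as JSON to the portal's `/api/3/action/package_create` endpoint with an API key. Keeping the mapping a pure function makes it easy to test each community's schema separately from harvesting and upload.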

CKAN extensions
• ckanext-b2find (B2FIND facets, legal pages, etc.)
• ckanext-spatial (supported by CKAN, but had compatibility issues, now fixed)
• ckanext-timeline (own development for "Temporal coverage" on different time scales, which makes usability quite complex)
  – Can it be added to the supported CKAN extensions, and how?
  – Is there a commitment by CKAN to support and maintain it?
  – Are others interested in further development of this extension?
• ckanext-datesearch (PublicationYear)
• Planned: support for more extensions, e.g.
  – Use the potential of the semantic web / LOD (+ DCAT, SPARQL, RDF)
  – Recombinant?, Kettle?, …
  – Improve web appearance (+ Elasticsearch, …)
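To make the facet and date-search ideas concrete, here is a hedged sketch of a query against CKAN's standard `package_search` action. It does not reproduce how ckanext-datesearch works internally. The Solr field name `extras_PublicationYear` is an assumption (CKAN indexes extra fields with an `extras_` prefix, but whether this field supports range queries depends on the Solr schema in use), and `search_url` is a hypothetical helper.

```python
import json
from urllib.parse import urlencode


def search_url(portal, text, year_from=None, year_to=None, facets=(), rows=10):
    """Build a package_search URL with an optional year range and facet fields."""
    params = {"q": text, "rows": rows}
    if year_from is not None or year_to is not None:
        low = "*" if year_from is None else year_from
        high = "*" if year_to is None else year_to
        # Solr range filter; the field name is an assumption, see the note above.
        params["fq"] = f"extras_PublicationYear:[{low} TO {high}]"
    if facets:
        # package_search expects facet.field as a JSON list of field names.
        params["facet.field"] = json.dumps(list(facets))
    return portal.rstrip("/") + "/api/3/action/package_search?" + urlencode(params)


url = search_url("http://b2find.eudat.eu", "climate", 2000, 2010, facets=["tags"])
```

The response of such a request contains a `search_facets` section with value counts per requested facet field, which is what a portal would render as clickable facets.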

Issues
• Scalability / performance (mostly Postgres-related)
  – Status: > 450,000 records harvested
  – Upload / indices (a re-index takes > 3 days!)
  – Download / search (especially when accessing the Postgres DB)
  – Delete (purge!) datasets (often not removed completely from DB + SOLR)
• Upgrade to newer CKAN versions
  – Compatibility of CKAN extensions (spatial, temporal)
  – Compatibility with own schema
• Decouple upload and search
  – Two SOLR indices (one "read only", one "write and update")?

Issues (cont.)
• History: how to get rid of it (in Postgres)?
  – Something like "paster clean history"?
• Support of taxonomies/hierarchies for facets (hierarchical tree of (sub-)disciplines)
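One possible way to approach the hierarchical facet wish is to store each discipline as a path and fold the flat facet counts into a tree on the client side. The sketch below is only an illustration under that assumption: the "/" separator, the discipline names, the counts, and `build_facet_tree` are all hypothetical, and it assumes each dataset carries exactly one discipline path, so that parent counts are plain sums.

```python
def build_facet_tree(counts, sep="/"):
    """Fold flat 'Parent/Child' facet counts into a nested tree with summed counts."""
    tree = {}
    for path, n in counts.items():
        level = tree
        for part in path.split(sep):
            node = level.setdefault(part, {"count": 0, "children": {}})
            node["count"] += n  # every ancestor accumulates the leaf's count
            level = node["children"]
    return tree


# Hypothetical flat facet counts as a search backend might return them.
flat_counts = {
    "Humanities/Linguistics": 3,
    "Humanities/History": 2,
    "Earth Sciences/Oceanography": 5,
}
tree = build_facet_tree(flat_counts)
```

Here `tree["Humanities"]["count"]` is the sum over both sub-disciplines, so a portal could show a collapsed parent facet and expand it on demand.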

Outlook
• More records from more communities (will this scale to > 1 or 10 million records?)
• Use tools such as Kibana and Elasticsearch to provide on-the-fly statistics in a dashboard
• Community customisation (switch between different SOLR cores and adapted search facets)
• Further search/dissemination functionality: annotations, SRU interface

Links
• Docs: https://eudat.eu/services/userdoc/b2find
• Portal: http://b2find.eudat.eu
• Source code: https://github.com/EUDAT-B2FIND
• Support: https://eudat.eu/support-request
• Contact: widmann@dkrz.de