Programmatic Interaction with Open Access Repositories
Roberto Barbera, University of Catania, Italy (roberto.barbera@ct.infn.it)
WACREN e-Research Hackfest, Lagos (Nigeria)
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement n° 654237
Outline
• Part 1
  • Introduction: definitions and context
• Part 2
  • Manual resource upload through the submit interface
  • Programmatic interaction with an Open Access Repository using APIs for data
    • Searching
    • Downloading
    • Uploading
  • MARCXML tags overview
  • Programmatic interaction with an Open Access Repository using the OAI-PMH standard protocol
• Part 3
  • Get authorship of research products
Part 1
The Scientific Method (G. Galilei)
• Examples of IR:
  • Classical Mechanics
  • Newton’s Gravitation Theory
• Examples of DR:
  • General Relativity
  • Standard Model of Particle Physics
The “output” of the Scientific Method
Marked a real Scientific Revolution but… it has been the same for almost four centuries!
The Pillars of the Scientific Method
• Repeatability
  • The closeness of agreement between independent results obtained with the same method on identical test material, under the same conditions (same operator, same apparatus, same laboratory and after short intervals of time)
  • Affected by random errors
• Reproducibility
  • The closeness of agreement between independent results obtained with the same method on identical test material but under different conditions (different operators, different apparatus, different laboratories and/or after different intervals of time)
  • Affected by systematic errors
Is science really reproducible?
Challenges in irreproducible research (http://www.nature.com/nature/focus/reproducibility/index.html)
The “reproducibility crisis”
Out of 18 microarray papers, results from 10 could not be reproduced
1. Ioannidis et al., 2009. Repeatability of published microarray gene expression analyses. Nature Genetics 41: 14
2. Science publishing: The trouble with retractions, http://www.nature.com/news/2011/111005/full/478026a.html
3. Bjorn Brembs: Open Access and the looming crisis in science, https://theconversation.com/open-access-and-the-looming-crisis-in-science-14950
Repeatability and Reproducibility are not all…
Evolution of Distributed Computing
[Figure: timeline (Time axis) from Mainframe Computing to Cluster Computing (80’s-90’s), Grid Computing (90’s-00’s) and Cloud Computing (00’s-10’s); labels: Cost of hw, Cost of networks, Power of COTS, WAN bandwidth]
e-Science
[Figure: Virtual Research Communities, e-Infrastructure, Applications, Data, Instruments/sensors]
How do e-Infrastructures support the Scientific Method?
[Figure: Data Infrastructures (Open Access Doc. Repos., Data Repos.); Semantic-web enrichment of linked data; Data preservation; HTC/HPC Clusters, Grids, Clouds]
Challenge: «walk» across the knowledge path both ways
Open Science (definitions) (http://book.openingscience.org and http://dx.doi.org/10.1787/5jrs2f963zs1-en)
• “Open Science refers to a scientific culture that is characterized by its openness. Scientists share results almost immediately and with a very wide audience”
• “Open science is a means and not an end in itself and it is much more than just open access to publications or data; it includes many aspects and stages of research processes thus enabling full reproducibility and re-usability of scientific results.”
Open Science (tools)
The European Open Science Cloud and the 3 Os of the EC (http://ec.europa.eu/research/openscience/index.cfm?pg=open-science-cloud, https://goo.gl/6Nm39H)
Some global “connections”
• Challenge: make African scientists/science more «visible»
• Opportunity: exploit e-Infrastructures to do it
• Vision: promote Open Science in Africa
The Dakar Declaration on Open Science in Africa (www.sci-gaia.eu/dakar-declaration/)
The Sci-GaIA Federated Platform for an Open Science Commons in Africa (www.sci-gaia.eu/osp/)
The Knowledge Workflow
Concepts and definitions (Source: Wikipedia)
• Open Access repositories are powered by Digital Asset Management Systems (DAMSes), which are “intertwined structures incorporating both software and hardware that take care of management tasks and decisions surrounding the ingestion, annotation, cataloguing, storage, retrieval and distribution of digital assets”
• A digital asset in essence is “anything that exists in a binary format and comes with the right to use”
• “Types of digital assets include, but are not exclusive to, photography, logos, illustrations, animations, audio-visual media, presentations, spreadsheets, Word and/or PDF documents, data and a multitude of other digital formats and their respective metadata”
Some of the most common DAMSes
• CKAN: http://ckan.org/ (Free)
• CONTENTdm: http://www.oclc.org/contentdm.en.html (Commercial)
• Digibib
• Digital Commons: http://digitalcommons.bepress.com/ (Commercial, hosted service)
• DigiTool: http://www.exlibrisgroup.com/category/DigiToolOverview
• DiVA-Portal: http://www.diva-portal.org
• dLibra: http://dingo.psnc.pl/dlibra/
• Drupal: https://www.drupal.org/
• DSpace: http://www.dspace.org/
• Earmas: http://www.earmas.net/
• EPrints: http://www.eprints.org/software/
• EQUELLA Repository: http://www.equella.com/
• ETD-db: http://scholar.lib.vt.edu/ETDdb/index.shtml
• Fedora: http://www.fedora-commons.org/
• Fez: http://apsr.anu.edu.au/currentprojects/fez06.htm
• Greenstone: http://www.greenstone.org/
• HAL: https://hal.archives-ouvertes.fr/
• Invenio: http://invenio-software.org/
• Islandora/Fedora: http://islandora.ca/
• intraLibrary: http://www.intrallect.com/solutions/managing_content/
• MyCoRe: http://www.mycore.de/
• Open Repository: http://www.openrepository.com/
• OPUS: http://www.kobv.de/entwicklung/software/opus-4/
• PURE: https://www.st-andrews.ac.uk/staff/research/pure/
• SciELO: http://scielo.org/php/index.php
• VITAL: https://www.iii.com/products/vital
• WEKO: http://weko.wou.edu.my
• XooNIps: http://xoops.org/modules/repository/
License column for the entries from Digibib onward, in the order it appears on the slide: Commercial; Free (hosted service); Commercial; Free; Free; Free (hosted service); Free; Commercial (hosted service); Free (hosted service); Commercial; Free
Others, more business or social oriented, are listed at www.capterra.com/digital-asset-management-software/
Sci-GaIA Task 3.1: Support the creation of federated and interoperable Open Access Document and Data Repositories in Africa, compliant with EU and other international guidelines
• Planned activities:
  • Identification of already existing Open Access Document and Data Repositories in the region and inclusion in web-based directories such as OpenDOAR and the CHAIN-REDS Knowledge Base
  • Promotion of the Open Access Initiative (OAI) standards and of the OpenAIRE guidelines to make contents (both papers and data) stored on the African repositories more discoverable, searchable and hence visible worldwide
  • Federation, through the use of Linked Data standards and Semantic Web technologies, of African Open Access Document and Data Repositories, to make them accessible and searchable from a unique entry point included in the project website
  • Feasibility study for the creation of a pilot service to issue Persistent Identifiers (PIDs) compliant with the Handle System, to be associated to documents and data
  • Provision of a ready-to-install-and-configure appliance to quickly build and populate Open Access Repositories compliant with OAI, OpenDOAR and OpenAIRE standards/guidelines
The Sci-GaIA Open Access Repository
• Requirements:
  • Open source
  • Distributed under a free license
  • Deployable on a local infrastructure (i.e., not a hosted service)
  • Standard compliant
  • Well supported
  • Scalable, up to O(10^6) to O(10^7) resources (to begin with)
• Choice: Invenio (latest stable version: v1.2.1 + Sci-GaIA add-ons)
• Motivations:
  • Fully compliant with all the most important library standards, e.g. DCMI, MARC 21 and OAI-PMH;
  • Co-developed by an international collaboration comprising institutes such as CERN, DESY, EPFL, FNAL, SLAC and used as institutional repository by about 30 scientific institutions worldwide;
  • INSPIRE, SCOAP3 and ZENODO (the OpenAIRE flagship archive) repositories are based on Invenio;
  • The CERN Document Server has been operating since 2002 and manages about 1.3 million records;
  • UNESCO and UEMOA are leading an initiative to create a virtual library based on Invenio in 8 African countries (Benin, Burkina Faso, Côte d’Ivoire, Guinea Bissau, Mali, Niger, Senegal and Togo).
The Sci-GaIA Open Access Repository (http://oar.sci-gaia.eu/)
Resources can be:
• Manually uploaded
• Automatically harvested and ingested from external sources
Sci-GaIA add-ons to Invenio:
• Federated authentication
• The possibility to mint DataCite Digital Object Identifiers (DOIs) and assign them to the records stored in the OAR
• If existing, direct links to the altmetrics of each of the records contained in the OAR
• The correct metadata structure and the right OAI-PMH endpoint configuration to make the OAR compliant with version 3.0 of the OpenAIRE Guidelines
Compliance with standards (fully conforming to the Open Archives Initiative’s standards and registered as an OpenDOAR data provider)
The Knowledge Workflow
First demonstrated @ ICT 2015
Research packages
The Sci-GaIA OAR itself as a research package
6 clones of the Sci-GaIA OAR are being deployed, both in Africa and Europe
Part 2
Submit a resource
Image submit
Programmatic Interaction with an (Invenio-based) Open Access Repository: Search Engine API
There are three kinds of API you can use:
• XML API
• JSON API
• Python API
Programmatic Interaction: XML API
• Syntax:
  • GET /search?param1=value1&param2=value2&param3=value3…
• Example:
  • Get the first 10 records in XML format: http://oar.sci-gaia.eu/search?jrec=1&rg=10&of=xm
  where
  • jrec = jump to record (e.g. 1 for the first hit)
  • rg = records in group of (e.g. 10 hits per page)
  • of = output format (e.g. xm for XML format)
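The query-string syntax on this slide can be assembled programmatically rather than by hand. The following is a minimal Python sketch that uses only the parameters named on the slide (jrec, rg, of); the helper name build_search_url is illustrative and is not part of Invenio.

```python
from urllib.parse import urlencode

# Search endpoint taken from the slide.
BASE_URL = "http://oar.sci-gaia.eu/search"


def build_search_url(base_url=BASE_URL, **params):
    """Return base_url followed by the URL-encoded query parameters."""
    # Skip parameters passed as None so optional ones can be left out.
    query = {name: value for name, value in params.items() if value is not None}
    return base_url + "?" + urlencode(query)


# First 10 records in XML format, as in the slide example.
print(build_search_url(jrec=1, rg=10, of="xm"))
```

Using urlencode also takes care of escaping spaces and other special characters in search patterns.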
Programmatic Interaction: XML API
Set “jrec” and “rg” appropriately to paginate the output
• Example:
  http://oar.sci-gaia.eu/search?of=xm&jrec=1&rg=10
  http://oar.sci-gaia.eu/search?of=xm&jrec=11&rg=10
  http://oar.sci-gaia.eu/search?of=xm&jrec=21&rg=10
Do not set “rg” too high: there is a server-wide safety limit for it
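Pagination amounts to advancing jrec by rg on each request. A small Python sketch of that loop follows; it assumes the total number of hits is already known (the total_hits argument is hypothetical, the slide does not say how to obtain it), and the function name page_urls is illustrative.

```python
from urllib.parse import urlencode


def page_urls(base_url, total_hits, page_size=10, output_format="xm"):
    """Yield one search URL per page; jrec is the 1-based first record of the page."""
    for first_record in range(1, total_hits + 1, page_size):
        query = {"of": output_format, "jrec": first_record, "rg": page_size}
        yield base_url + "?" + urlencode(query)


# 25 hits in pages of 10 give three requests, starting at records 1, 11 and 21.
for url in page_urls("http://oar.sci-gaia.eu/search", total_hits=25):
    print(url)
```

Keeping page_size small respects the server-wide limit on rg mentioned on the slide.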
Programmatic Interaction: XML API
• Example:
  • Get the first 10 records that contain the string “Sci-GaIA Winter School” in the title: http://oar.sci-gaia.eu/search?p=Sci-GaIA%20Winter%20School&f=title&jrec=1&rg=10&of=xm
  where:
  • p = pattern (i.e. your query)
  • f = field to search within (e.g. “title”, “author”, …)
  • Get a record from a given DOI: http://oar.sci-gaia.eu/search?p=doi:10.15169/scigaia:1466352420.24&of=xm
  • Get all records uploaded from a given date (e.g. 2016-03-21) to another given date (e.g. today): http://oar.sci-gaia.eu/search?of=xm&d1=2016-03-21&d2=2016-07-05
  where
  • d1 = first date in YYYY-mm-dd format
  • d2 = second date in YYYY-mm-dd format
Output of: http://oar.sci-gaia.eu/search?of=xm&d1=2016-03-21&d2=2016-07-05
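The XML returned with of=xm is a MARCXML collection, so it can be read with a standard XML parser. The Python sketch below extracts titles (MARC field 245, subfield a) from such a response. The SAMPLE document is a made-up stand-in for real server output, and the function name marc_titles is illustrative.

```python
import xml.etree.ElementTree as ET

# Namespace used by MARCXML documents.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Made-up response body, standing in for what the server would return.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <controlfield tag="001">5</controlfield>
    <datafield tag="245" ind1=" " ind2=" ">
      <subfield code="a">An illustrative title</subfield>
    </datafield>
  </record>
</collection>"""


def marc_titles(marcxml_bytes):
    """Return the text of every 245$a (title) subfield in a MARCXML collection."""
    root = ET.fromstring(marcxml_bytes)
    path = "marc:record/marc:datafield[@tag='245']/marc:subfield[@code='a']"
    return [subfield.text for subfield in root.findall(path, MARC_NS)]


print(marc_titles(SAMPLE))
```

In real use the bytes would come from an HTTP request to one of the search URLs shown on the slides.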
Programmatic Interaction: JSON API
Set “of=recjson” to get the output in JSON format
Use the same parameters as the XML API
• Example:
  • Get a record from a DOI: http://oar.sci-gaia.eu/search?p=doi:10.15169/scigaia:1466352420.24&of=recjson
  • Get all records uploaded from a given date (e.g. 2016-03-21) to another given date (e.g. today): http://oar.sci-gaia.eu/search?d1=2016-03-21&d2=2016-07-05&of=recjson
  where
  • d1 = first date in YYYY-mm-dd format
  • d2 = second date in YYYY-mm-dd format
Output of: http://oar.sci-gaia.eu/search?of=recjson&d1=2016-03-21&d2=2016-07-05
Programmatic Interaction: JSON API
• Example:
  • Get only the abstract, title and authors of resources: http://oar.sci-gaia.eu/search?of=recjson&ot=abstract,title,authors
  where ot = output tags (e.g. ‘’ to get all fields, ‘title’ to get titles only)
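The JSON variant can be handled with the standard json module. The Python sketch below builds a recjson query restricted to selected output tags and decodes a response. It assumes the response body is a JSON array of record objects, which the slides do not show explicitly; the helper names recjson_url and load_records are illustrative.

```python
import json
from urllib.parse import urlencode


def recjson_url(base_url, fields=(), **params):
    """Build a search URL asking for JSON output, optionally limited to some fields."""
    query = dict(params, of="recjson")
    if fields:
        # ot takes a comma-separated list of output tags, as on the slide.
        query["ot"] = ",".join(fields)
    # Keep commas readable instead of percent-encoding them.
    return base_url + "?" + urlencode(query, safe=",")


def load_records(payload):
    """Decode a response body; assumed here to be a JSON array of records."""
    records = json.loads(payload)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of records")
    return records


print(recjson_url("http://oar.sci-gaia.eu/search",
                  fields=("abstract", "title", "authors")))
```

Validating the top-level type early gives a clear error if the server returns an error page or a different structure.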
Programmatic Interaction: Python API
The Invenio Search Engine can be called from within your Python programs via both a high-level and a low-level API.
It uses the same parameters as the XML and JSON APIs.
To know more about the Python, XML and JSON APIs, visit this guide: http://oar.sci-gaia.eu/help/hacking/search-engine-api
Programmatic Interaction: Download records
We need:
• PUBLIC KEY (provided by the system)
• PRIVATE KEY (provided by the system)
• SIGNATURE (we have to calculate it)
Calculate the signature:
myquery = http://oar.sci-gaia.eu/search?apikey=PUBLICKEY&jrec=0&rg=10&of=xm
Signature = HMAC-SHA1(myquery, Private-Key)
Signed request: http://oar.sci-gaia.eu/search?apikey=PUBLICKEY&jrec=0&of=xm&rg=10&signature=SIGNATURE
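The signature step can be sketched in Python with the standard hmac module. This is an illustration under assumptions, not the server's reference implementation: it signs the full URL as written on the slide and sorts the parameters alphabetically, which matches the order of the signed request shown there. Exactly which string the server expects to be signed should be checked against the repository documentation. The function name sign_search_url and the key values are placeholders.

```python
import hashlib
import hmac
from urllib.parse import urlencode


def sign_search_url(base_url, params, public_key, private_key):
    """Return the query URL with an HMAC-SHA1 signature appended."""
    query = dict(params, apikey=public_key)
    # Assumption: parameters are sorted by name before signing.
    unsigned = base_url + "?" + urlencode(sorted(query.items()))
    digest = hmac.new(private_key.encode("utf-8"),
                      unsigned.encode("utf-8"),
                      hashlib.sha1).hexdigest()
    return unsigned + "&signature=" + digest


# Placeholder keys; real ones are provided by the system.
signed = sign_search_url("http://oar.sci-gaia.eu/search",
                         {"jrec": 0, "rg": 10, "of": "xm"},
                         public_key="PUBLICKEY", private_key="PRIVATEKEY")
print(signed)
```

The private key is only used locally to compute the digest and never appears in the request.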
Programmatic Interaction: Upload records
We have to:
• Send an authorisation request for your IP address to admin@sci-gaia.eu
• Create a MARCXML file as input (e.g. your_file.xml)
• Example:
  curl -T your_file.xml http://oar.sci-gaia.eu/batchuploader/robotupload/insert -A invenio_webupload -H "Content-Type: application/marcxml+xml"
To know more about uploading: http://oar.sci-gaia.eu/help/admin/bibupload-admin-guide#2
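The curl command can be reproduced from Python with the standard library. The sketch below mirrors it: `curl -T` sends the file with an HTTP PUT, `-A` sets the User-Agent and `-H` sets the Content-Type. The function names are illustrative, and the upload only succeeds from an IP address that has been authorised as described on the slide.

```python
import urllib.request

# Upload endpoint taken from the slide.
UPLOAD_URL = "http://oar.sci-gaia.eu/batchuploader/robotupload/insert"


def build_upload_request(marcxml_bytes, url=UPLOAD_URL):
    """Prepare a PUT request equivalent to the curl command on the slide."""
    return urllib.request.Request(
        url,
        data=marcxml_bytes,
        method="PUT",
        headers={
            "Content-Type": "application/marcxml+xml",
            "User-Agent": "invenio_webupload",
        },
    )


def upload(path, url=UPLOAD_URL):
    """Send the MARCXML file at path; returns (HTTP status, response text)."""
    with open(path, "rb") as handle:
        request = build_upload_request(handle.read(), url)
    with urllib.request.urlopen(request) as response:
        return response.status, response.read().decode("utf-8", "replace")
```

A call such as `upload("your_file.xml")` performs the actual transfer; building the request separately makes it easy to inspect before sending.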
your_file.xml
MARC format is the standard in the library world
<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record xmlns="http://www.loc.gov/MARC21/slim">
  </record>
  …
</collection>
your_file.xml
<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record xmlns="http://www.loc.gov/MARC21/slim">
    <datafield tag=" " ind1=" " ind2=" ">
      <subfield code=""></subfield>
      …
    </datafield>
    …
  </record>
</collection>
your_file.xml
<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record xmlns="http://www.loc.gov/MARC21/slim">
    <datafield tag="024" ind1="7" ind2=" ">
      <subfield code="a">DOI identifier</subfield>
      <subfield code="2">Type of identifier</subfield>
    </datafield>
    <datafield tag="100" ind1=" " ind2=" ">
      <subfield code="a">First author</subfield>
      <subfield code="v">Affiliation</subfield>
      <subfield code="w">Country</subfield>
      <subfield code="j">orcid</subfield>
    </datafield>
    …
  </record>
</collection>
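A file with this structure can also be generated rather than typed by hand. The Python sketch below builds a one-record MARCXML collection with the 024 (identifier) and 100 (first author) fields shown on the slide. The function names and all values passed in are placeholders for illustration, and only a subset of the subfields is filled in.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"


def add_datafield(record, tag, subfields, ind1=" ", ind2=" "):
    """Append one <datafield> with its <subfield> children to a record."""
    field = ET.SubElement(record, "datafield",
                          {"tag": tag, "ind1": ind1, "ind2": ind2})
    for code, value in subfields:
        ET.SubElement(field, "subfield", {"code": code}).text = value
    return field


def build_marcxml(doi, author, affiliation):
    """Return a MARCXML collection (bytes) holding a single record."""
    collection = ET.Element("collection", {"xmlns": MARC_NS})
    record = ET.SubElement(collection, "record")
    # 024: identifier, with subfield 2 naming the type of identifier.
    add_datafield(record, "024", [("a", doi), ("2", "DOI")], ind1="7")
    # 100: first author and affiliation, as on the slide.
    add_datafield(record, "100", [("a", author), ("v", affiliation)])
    return ET.tostring(collection, encoding="UTF-8", xml_declaration=True)


# Placeholder values, for illustration only.
marcxml = build_marcxml("10.1234/example", "Doe, Jane", "Example University")
print(marcxml.decode("utf-8"))
```

The resulting bytes can be written to your_file.xml and uploaded as described on the upload slide.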
your_file.xml
To know more about MARCXML tags: http://oar.sci-gaia.eu/help/admin/howto-marc
Programmatic interaction: search engine based on the OAI-PMH standard protocol
• The Sci-GaIA OAR OAI-PMH endpoint is publicly available at: http://oar.sci-gaia.eu/oai2d
• Get detailed information about the repository: http://oar.sci-gaia.eu/oai2d?verb=Identify
• Get the list of Dublin Core records: http://oar.sci-gaia.eu/oai2d?verb=ListRecords&metadataPrefix=oai_dc
• Get a record from its OAI identifier: http://oar.sci-gaia.eu/oai2d?verb=GetRecord&identifier=oai:oar.sci-gaia.eu:5&metadataPrefix=oai_dc
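OAI-PMH requests are plain HTTP GETs with a verb parameter, and responses are XML, so both sides can be handled with the standard library. The Python sketch below builds request URLs for the verbs on this slide and extracts Dublin Core titles from a response. The SAMPLE document is a made-up stand-in for real server output, and the helper names oai_url and dc_titles are illustrative.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Endpoint taken from the slide.
OAI_ENDPOINT = "http://oar.sci-gaia.eu/oai2d"
DC_NS = {"dc": "http://purl.org/dc/elements/1.1/"}

# Made-up GetRecord response, standing in for real server output.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <GetRecord><record><metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>An illustrative title</dc:title>
    </oai_dc:dc>
  </metadata></record></GetRecord>
</OAI-PMH>"""


def oai_url(verb, endpoint=OAI_ENDPOINT, **arguments):
    """Build an OAI-PMH request URL for the given verb and arguments."""
    return endpoint + "?" + urlencode(dict(verb=verb, **arguments))


def dc_titles(response_bytes):
    """Return every dc:title found anywhere in an OAI-PMH response."""
    root = ET.fromstring(response_bytes)
    return [title.text for title in root.iterfind(".//dc:title", DC_NS)]


print(oai_url("GetRecord", identifier="oai:oar.sci-gaia.eu:5",
              metadataPrefix="oai_dc"))
print(dc_titles(SAMPLE))
```

urlencode percent-encodes the colons in the identifier, which is equivalent to the unescaped form shown on the slide.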
Output of: http://oar.sci-gaia.eu/oai2d?verb=GetRecord&identifier=oai:oar.sci-gaia.eu:8&metadataPrefix=oai_dc
Part 3
“Whose science is this?”
How to provide authorship to research products?
ORCID (www.orcid.org, becoming a “de facto” standard)
More than 2.7 million ORCID IDs so far
Digital Object Identifiers
Thanks to UNICT, the Sci-GaIA OAR has an official DOI prefix
Unlimited numbers of sub-prefixes/DOIs can be created/minted
→ All records in the OAR can be “claimed” in the ORCID profiles of their authors
Authorship of research products with OAR and ORCID (www.orcid.org)
Altmetrics (www.altmetrics.com)
→ The Sci-GaIA OAR automatically links its records to their altmetrics
Thank you!
sci-gaia.eu
info@sci-gaia.eu
References
• DAMS introduction: http://www.sci-gaia.eu/osp-oar/
• DataCite: http://www.datacite.org
• Dublin Core: http://dublincore.org/
• Invenio (software and documentation): http://invenio-software.org/
• MARC 21: https://en.wikipedia.org/wiki/MARC_standards
• OAI-PMH: https://www.openarchives.org/OAI/openarchivesprotocol.html
• ORCID: http://www.orcid.org
• Sci-GaIA OAR installation and configuration guide: http://oar-sci-gaia.readthedocs.io/en/latest/