Paolo Manghi, Natalia Manola
paolo.manghi@isti.cnr.it, natalia@di.uoa.gr
The OpenAIRE Infrastructure
On Measuring Research Impact - The EGI use-case

Outline
• The What and How of OpenAIRE
• Supporting research communities
• Contexts, categories and concepts
• User input
• Results and analytics
• Looking Ahead
Developing the Open Science Commons - Sept 25, Amsterdam

OpenAIRE in a nutshell
European data infrastructure for scholarly communication
• Facilitating discovery of research outcomes across disciplines
• Promotes & implements Open Access
• Interlinks and contextualizes research outcomes
• Integrates publication, data, software repositories, CRIS systems
• Monitoring research outputs and measuring research impact
• Open Access policy evaluation
• Funding schemes: return on investment through impact
• Research initiatives: research impact
• Providing both human and technical infrastructure to make this possible!

[Diagram: the OpenAIRE infrastructure]
Services for researchers: deposit publications & data; get support (NOADs); search & browse; curate & collaborate; visualize and manage Enhanced Publications.
Services for project coordinators and funders: research impact, citations, usage statistics, linked content, statistics.
Infrastructure coordination: guidelines for use of services; guidelines for data interoperability.
Information space: datasets, authors, publications, data providers, projects, organizations, EC funding; content is de-duplicated, enriched, linked, classified and text-mined, and exposed through APIs.
Sources (metadata, usage data and PDFs): publication repositories (institutional & thematic), Open Access journals, data repositories, data journals, institutional CRIS systems, national funding sources, and the CERN/OpenAIRE “catch-all” repository.
Figures: 8,700,000 OA publications; 460 validated repositories.

Added Value: Integrated Scientific Information System
Entities: Datasets, Authors, Publications, Data Providers, Projects, Organizations
• 8.7 million publications
• 7 million authors
• 460+ data providers
• 90K publications linked to projects
• 2 funders
• 700 datasets linked to publications
• 33K organizations
• 2731 publications linked to EGI Research Communities

BEHIND THE SCENES

Internal data flow
[Diagram labels: Enriched Information Space; Data Inference; OpenAIRE Portal: Discovery & Impact measure; Inferring; Off-line Human Data Curation; Public Information Space; De-duplication; Native Information Space; Data source import; Harvesting; End-user claims]

RESEARCH ANALYTICS

Monitoring OA policy
Research Output Measures
FP7: 66K publications, 7.5K projects
[Charts: FP7 timeline (total); FP7 breakdowns]

Classification
Text mining - Supervised techniques

Beyond the Obvious
Text mining - Unsupervised techniques (topic modeling)
Example 1: FP7 programmes connected through scientific publications
[Graph labels: Research trends; Structural effects; Interactive graphs; Providing overview]

Example 2: How FP7 programme areas are related

EGI & OPENAIRE
1-year pilot ended in May 2014
Official service release: Oct 2014 @ www.openaire.eu

Supporting communities
• Enriched OpenAIRE data model
• Context (e.g. “EGI”)
• Category (“Virtual Organizations”)
• Concept (“alice”)
• Text mining algorithms tailored to community needs, integrated into the OpenAIRE text mining framework
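
The context / category / concept hierarchy can be pictured with a small sketch. This is only an illustration under assumptions: the class names, the identifiers ("egi", "vo") and the "::" path format are hypothetical and are not taken from the OpenAIRE data model itself.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Concept:
    """Leaf entry, e.g. a single Virtual Organization such as "alice"."""
    id: str
    label: str


@dataclass
class Category:
    """Groups concepts of one kind, e.g. "Virtual Organizations"."""
    id: str
    label: str
    concepts: List[Concept] = field(default_factory=list)


@dataclass
class Context:
    """Top-level community, e.g. "EGI"."""
    id: str
    label: str
    categories: List[Category] = field(default_factory=list)

    def concept_path(self, concept_id: str) -> Optional[str]:
        """Return 'context::category::concept' (hypothetical format), or None."""
        for category in self.categories:
            for concept in category.concepts:
                if concept.id == concept_id:
                    return f"{self.id}::{category.id}::{concept.id}"
        return None


# Example values from the slide; the ids are assumptions.
egi = Context("egi", "EGI", [
    Category("vo", "Virtual Organizations", [Concept("alice", "alice")]),
])
print(egi.concept_path("alice"))  # egi::vo::alice
```

A publication tagged with such a path can then be aggregated at any level: per community, per category, or per individual concept.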

What OpenAIRE does behind the scenes
• Extract full text from publications
• If the text is structured, use the “funding” & “acknowledgements” fields
• Scan text for matches against any of the EGI organization names provided
• For each match, search the surrounding context for general terms & suggested acknowledgements (using word pairs) to add a confidence value to the match and eliminate false matches
• For EC projects, search not only for the project acronym (e.g. EGI-InSPIRE) but also for the grant ID (261323)
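
The matching steps above can be sketched roughly as follows. This is a simplified illustration, not the OpenAIRE implementation: the names and grant IDs come from the slides, but the cue terms, the context window size, and the confidence weights are all assumptions, and plain substring cues stand in for the word-pair scoring mentioned above.

```python
import re

# Names and grant IDs are taken from the slides.
ORGANIZATIONS = {
    "EGI-InSPIRE": {"grant_id": "261323"},
    "WeNMR": {"grant_id": "261572"},
}
# Assumed cue terms and window size, chosen only for illustration.
CUE_TERMS = ("acknowledge", "supported by", "funded by", "grant", "resources provided")
WINDOW = 100  # characters of context inspected on each side of a match


def find_matches(text, min_confidence=0.5):
    """Return (organization, confidence) pairs whose confidence passes the threshold."""
    results = []
    for name, info in ORGANIZATIONS.items():
        # Search for the acronym and, when available, the grant ID as a whole number.
        patterns = [re.escape(name)]
        if info.get("grant_id"):
            patterns.append(r"\b" + re.escape(info["grant_id"]) + r"\b")
        best = 0.0
        for pattern in patterns:
            for m in re.finditer(pattern, text, flags=re.IGNORECASE):
                context = text[max(0, m.start() - WINDOW): m.end() + WINDOW].lower()
                cues = sum(term in context for term in CUE_TERMS)
                # Assumed scoring: a bare match scores 0.3, each cue adds 0.25, capped at 1.0.
                best = max(best, min(1.0, 0.3 + 0.25 * cues))
        if best >= min_confidence:
            results.append((name, best))
    return results
```

A match surrounded by acknowledgement vocabulary passes the threshold, while a bare occurrence of the acronym or number with no supporting cues is discarded as a likely false match.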

How to identify EGI
Text mining on PDFs from repositories, publisher metadata
Identify publications associated with EGI in terms of:
• Associated with EGI projects
• Publication “enabledBy EGI: XYZ”
• Publication “supportedBy EGI: XYZ”
• Associated with a certain Virtual Organisation (VO) or National GRID Infrastructure (NGI)
• Publication “used EGI”
• Publication “used NGI: XYZ”
• Publication “producedBy VO: XYZ”
• Associated with a certain EGI scientific discipline
• Publication “related to EGI Scientific Discipline: XYZ”

What EGI community should do: STEP 1
Use proper acknowledgement in the publication

Organisation Name: WeNMR
Type: EC Project
Grant ID: 261572
Suggested Acknowledgement: "The WeNMR project (European FP7 e-Infrastructure grant, contract no. 261572, www.wenmr.eu), supported by the European Grid Initiative (EGI) through the national GRID Initiatives of Belgium, France, Italy, Germany, the Netherlands (via the Dutch BiG Grid project), Portugal, Spain, UK, South Africa, Taiwan and the Latin America GRID infrastructure via the Gisela project is acknowledged for the use of web portals, computing and storage facilities." and the following article describing the WeNMR portals should be cited: Wassenaar et al. (2012). WeNMR: Structural Biology on the Grid. J. Grid. Comp., 10: 743-767.

Organisation Name: EGI-InSPIRE
Type: EC Project
Grant ID: 261323
Suggested Acknowledgement: The authors acknowledge the use of resources provided by the European Grid Infrastructure. For more information, please reference the EGI-InSPIRE paper (http://go.egi.eu/pdnon).

Organisation Name: ALICE
Type: VO
Grant ID: n/a
Suggested Acknowledgement: The ALICE collaboration gratefully acknowledges the resources and support provided by all Grid centres and the Worldwide LHC Computing Grid (WLCG) collaboration.

Organisation Name: LHCb
Type: VO
Grant ID: n/a
Suggested Acknowledgement: The Tier 1 computing centres are supported by IN2P3 (France), KIT and BMBF (Germany), INFN (Italy), NWO and SURF (The Netherlands), PIC (Spain), GridPP (United Kingdom). We are thankful for the computing resources put at our disposal by Yandex LLC (Russia), as well as to the communities behind the multiple open source software packages that we depend on.

Organisation Name: NGI: PT
Type: NGI
Grant ID: n/a
Suggested Acknowledgement: This work makes use of results produced with the support of the Portuguese National Grid Initiative. More information in https://wiki.ncg.ingrid.pt

What EGI community should do: STEP 2
• Option 1: follow the OpenAIRE guides
• Publish in an OA journal or deposit in an OA repository, preferably one compatible with the OpenAIRE 2.0+ guidelines (i.e., link to funding)
• Option 2: use the OpenAIRE portal “claiming” service to associate
• any publication (within OpenAIRE or not) to EGI
• results to additional EGI information: VO, classification, relationship

User Input

What does it look like

Aggregated statistics

Lessons learned & Best practices
• Mandates on how to write acknowledgements are crucial but often missing.
• Collect beforehand as much information as possible that may help with the mining. Even information that you do not expect to help may prove useful in the end.
• Clean and normalize your input data (character encoding, stop-word removal, character case, special characters, etc.).
• Design your data mining methods to be very tolerant. In our case, suggested acknowledgements never appeared exactly as given in the input texts.
• Manually curate the results to tune your data mining methods. Yes, it is very labor intensive, but without it you'll be blind to your mistakes.
• Design and implement your data processing methods to work in a streamed fashion and to be performant. Streamed design solves the “data bigger than memory” problem; performance design solves the “having to wait one week for results” problem.
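
Three of these practices (normalization, tolerant matching, streamed processing) can be combined in a short sketch. It is a minimal illustration under assumptions, not the method used in the pilot: the stop-word list and the 0.6 threshold are arbitrary, and Python's standard `difflib.SequenceMatcher` stands in for whatever similarity measure was actually used.

```python
import difflib
import re
import unicodedata

# Illustrative stop-word list (an assumption, not the authors' list).
STOP_WORDS = {"the", "of", "by", "and", "a", "an", "for", "to", "in"}


def normalize(text):
    """Lower-case, strip accents and punctuation, and drop stop words."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def similarity(suggested, candidate):
    """Token-level similarity in [0, 1], tolerant to rewording."""
    matcher = difflib.SequenceMatcher(None, normalize(suggested), normalize(candidate))
    return matcher.ratio()


def stream_matches(lines, suggested, threshold=0.6):
    """Yield (line_number, score) lazily, so the input never has to fit in memory."""
    for number, line in enumerate(lines, start=1):
        score = similarity(suggested, line)
        if score >= threshold:
            yield number, score
```

Because `stream_matches` is a generator, it can be fed an open file object or any other iterator of lines, and a reworded acknowledgement still scores well as long as most normalized tokens appear in the same order.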

Roadmap
• Release
• Results of inference visible from the portal
• Claim user interfaces available from the portal
• Plan
• Production release: ready by 1st of October 2013
• Add more communities (e.g., FET)

Thank you!
Looking forward to your questions and feedback
www.openaire.eu
@openaire_eu
facebook.com/groups/openaire
linkedin.com/groups/OpenAIRE-3893548
paolo.manghi@isti.cnr.it