Jukka Klem & Salvatore Mele | D4Science-II Kick-Off

Jukka Klem & Salvatore Mele | D4Science-II Kick-Off Meeting | Pisa, 15 Oct 2009

• Who is INSPIRE?
• Where does INSPIRE come from?
• How does HEP communicate?
• What do scientists want?
• What is Invenio?
• Where does INSPIRE go?
• How do we go there together?

CERN: European Organization for Nuclear Research (since 1954)
• World-leading HEP laboratory, Geneva (CH)
• 2500 staff (mostly engineers, administrators/services)
• 9000 users (physicists from 580 institutes in 85 countries)
• 3 Nobel prizes (Accelerators, Detectors, Discoveries)
• Invented the web
• Ready to re-start the 27-km (6 bn€) LHC accelerator, “the big-bang machine”
• Top management committed to Open Access
• Runs a 1-million-object Digital Library
CERN Convention (1953): ante-litteram Open Access manifesto
“… the results of its experimental and theoretical work shall be published or otherwise made generally available”

INSPIRE team @ CERN
• Being recruited (IT) – 100% (API, grid-ification)
• Jukka Klem (OA) – 80% (Applications)
• Jean-Yves le Meur (IT) – Infra supervision
• Tibor Šimko (IT) – Tech supervision
• Tim Smith (IT) – Infra strategy & MGA
• Salvatore Mele (OA) – Apps strategy & TB
• TBC: Junior developer (OA/IT) – (Interface

Who is INSPIRE? Who are our buddies? Which publishers do we talk to?
Durham, DESY, arXiv, ADS, Elsevier, Springer, Fermilab, CERN, SISSA, SLAC, APS, PDG, KEK, World Scientific

~15’000 High Energy Physics (HEP) scientists smash stuff at the speed of light to produce new stuff

~15’000 HEP theorists scratch their heads to make sense of all that stuff and then some more

The HEP “preprint culture”
L. Goldschmidt-Clermont, 1965, http://eprints.rclis.org/archive/00000445/02/communication_patterns.pdf
• Scientific journals of the ’60s too slow for HEP
• Mass-mail preprints to institutes worldwide
• Ante litteram (institute-pays) Open Access
• CERN library starts to index and display preprints
• Leading research libraries “serve” preprints
CERN Library, circa 1960

Before e-mail and RSS...
L. Addis, 2002, http://www.slac.stanford.edu/spires/papers/history.html
• SLAC Library (Stanford) maintains preprint lists
• Sending lists to subscribers worldwide as of ’62
• Scientists then request preprints of interest
• Published articles go on anti-preprint list
• Indispensable working tool from ’60s to ’80s

SPIRES: first electronic catalogue
http://www.slac.stanford.edu/spires/papers/history.html
http://www-conf.slac.stanford.edu/interlab99/program/kunz/EarlyWeb.frame.pdf
• SLAC Library, 1974: now 750’000 records
• With Fermilab (US) and DESY (DE) Libraries
• Electronic catalogue of preprints metadata
• Updated with publication reference
• First terminal login, then e-mail interface
• Then the first web server in U.S.

Date: Fri, 13 Dec 91 17:55:53 GMT+0100
From: timbl@nxoc01.cern.ch (Tim Berners-Lee)
Subject: WWW to SPIRES on SLACVM - Experimental
To: www-interest@cernvax.cern.ch, www-talk@cernvax.cern.ch

There is an experimental W3 server for the SPIRES High energy Physics preprint database, thanks to Terry Hung, Paul Kunz and Louise Addis of SLAC. It's only just been put up, so don't expect perfection. With the w3 line mode browser, follow a link to it from our home page,

- Tim

Paul Kunz wrote a few days ago: "The SLAC Library maintainer of SPIRES databases, Louise Addis, is absolutely delighted. She will ask for a permanent VM service machine and finish off the polishing. Things are really moving now."

arXiv.org: the archetypal repository
http://vmsstreamer1.fnal.gov/VMS_Site_03/Lectures/Colloquium/presentations/090506Ginsparg.pdf
• P. Ginsparg, LANL, 1991. Now Cornell Library
• E-mail based, then immediately on the web
• No mandate, no debate, author-driven
• 1/2 million preprints. Growing beyond HEP

Where do HEP scientists go for info?
Gentil-Beccot et al. arXiv:0804.2701
• Survey of 2’000+ scientists (10% of the community)
• Library/community answers to info needs
• Google as proxy of arXiv, SPIRES, publishers

What more do users want?
Gentil-Beccot et al. arXiv:0804.2701
[Figure: survey ratings, on a scale from “Not important” to “Very important”, of access to full text, depth of coverage, and quality of content]

Where do users see the systems go?
Gentil-Beccot et al. arXiv:0804.2701
• Seamless Open Access to pre-’90s articles
• “Greyer” literature (laboratory reports)
• Conference slides (linked with articles)
• “Publication” of “ancillary” material:
  – Data behind tables, figures
  – Re-usable experimental data
• Some sort of peer-review overlaid on arXiv
• “Smarter” search tools

What would users give?
Gentil-Beccot et al. arXiv:0804.2701
• Would users contribute to tag articles?
• Indexing and keywording in a Web 2.0 world!
• Immense potential to be harnessed
[Figure: fraction of answers (“Would contribute 30 minutes/week or more” vs. “Would not contribute”) by seniority in the field]

Building INSPIRE
http://www.projecthepinspire.net/
• Joint project of CERN, DESY, FERMILAB, SLAC
• Switch off aging SPIRES infrastructure
• Import 750’000+ records into an Invenio instance
• Inherit 50’000+ users (60+ million searches/year)
• Roll out 1Q10 (working on back-office tools)
• Out of the box: totally new back-office,
• Bi-directional feeds with arXiv and publishers

Releasing INSPIRE
http://www.projecthepinspire.net/
Medium term add-ons to INSPIRE (2Q10-4Q10)
• Full-text searching warehouse, Open Access & Copyrighted
• Author disambiguation (algorithm & web 2.0)
• Personal shelves, with annotations. Alerts
• Drop-box for old preprints, theses … (advocacy campaign)
• Widespread “drop”, describe and search non-text material
• User generated tags (taxonomic & à la Flickr)
• Thesaurus-based semantics, then folksonomy & ontology

Use computational power of eInfrastructure to grow repository services
1. Back-office infrastructural services
2. Back-office content-analysis services
3. Novel front-line services

1. Back-office infrastructural services
I. Parallelization of full-text indexing
II. OCR’ing old holdings/new scanned submissions
III. “Gorilla” classification of content
IV. Text-mining for metadata and citation extraction
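
As an illustration of item I, parallel full-text indexing can be pictured as a map/reduce job: each worker builds a partial inverted index for one shard of records, and the partial indexes are then merged. The sketch below is a minimal stand-in, not INSPIRE's or Invenio's indexer: the function names, the sample records, and the use of a local thread pool in place of Grid worker nodes are all assumptions made for this example.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
import re

TOKEN = re.compile(r"[a-z0-9]+")


def index_shard(shard):
    """Map step: build an inverted index (term -> set of record ids) for one shard."""
    partial = defaultdict(set)
    for record_id, text in shard:
        for term in TOKEN.findall(text.lower()):
            partial[term].add(record_id)
    return partial


def merge_indexes(partials):
    """Reduce step: union the posting sets of all partial indexes."""
    merged = defaultdict(set)
    for partial in partials:
        for term, ids in partial.items():
            merged[term] |= ids
    return dict(merged)


def build_index(records, n_shards=4):
    """Split the records into shards, index the shards concurrently, merge the results."""
    shards = [records[i::n_shards] for i in range(n_shards)]
    # A local thread pool stands in for remote workers (assumption of this sketch).
    with ThreadPoolExecutor(max_workers=n_shards) as pool:
        partials = list(pool.map(index_shard, shards))
    return merge_indexes(partials)


# Hypothetical records: (record id, full text).
RECORDS = [
    (1, "Higgs boson searches at the LHC"),
    (2, "Lattice QCD and the proton mass"),
    (3, "Searches for supersymmetry at the LHC"),
]
INDEX = build_index(RECORDS)
```

Because the merge step is a plain set union, the result does not depend on how the records are split, which is what makes the indexing step easy to spread over many workers.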

2. Back-office content-analysis services
Clustering of “similar” records for
I. Discovery (if you want this you might want that)
II. Ranking (first result is what you want)
Use citations, author network, tags, logs
Nightly re-clustering holdings including daily updates:
1. User-generated tags
2. New additions with their metadata/citations/logs
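
One way to picture the clustering of “similar” records is to treat each record as a set of features (citations, authors, tags) and to group records whose sets overlap strongly. The sketch below uses Jaccard similarity with a simple threshold-based, single-link grouping. The slides do not say which algorithm INSPIRE would use, so the technique, the feature encoding, the threshold, and all names here are assumptions chosen for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity of two feature sets: |a & b| / |a | b|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster(features, threshold=0.5):
    """Group record ids whose feature sets reach the similarity threshold.

    Single-link grouping implemented with a small union-find structure.
    """
    ids = sorted(features)
    parent = {i: i for i in ids}

    def find(x):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for n, a in enumerate(ids):
        for b in ids[n + 1:]:
            if jaccard(features[a], features[b]) >= threshold:
                parent[find(b)] = find(a)

    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def related(record_id, clusters):
    """Discovery: the other records in the same cluster as record_id."""
    for group in clusters:
        if record_id in group:
            return [r for r in group if r != record_id]
    return []


# Hypothetical feature sets built from citations, tags and authors.
FEATURES = {
    "rec1": {"cites:A", "cites:B", "tag:higgs", "author:smith"},
    "rec2": {"cites:A", "cites:B", "tag:higgs", "author:jones"},
    "rec3": {"cites:X", "tag:lattice", "author:lee"},
    "rec4": {"cites:X", "cites:Y", "tag:lattice", "author:lee"},
}
CLUSTERS = cluster(FEATURES)
```

The pairwise comparison is quadratic in the number of records, which is one reason a nightly re-clustering of a large holding would call for distributed computing power.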

3. Novel front-line services
Reqs: Impossible without a Grid, but latency tolerant
“Find me a mentor”
• User uploads A4-size research synopsis
• INSPIRE identifies appropriate mentor (or referee)
• Depends on success of parallel semantic project
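
A rough way to picture “Find me a mentor” is as a ranking problem: compare the uploaded synopsis with the text of each candidate's papers and return the closest matches. The slides tie the real service to a separate semantic project, so the bag-of-words cosine similarity below is only a stand-in technique; the names, the stopword list, and the sample texts are hypothetical.

```python
import math
import re
from collections import Counter

TOKEN = re.compile(r"[a-z]+")
STOPWORDS = {"a", "and", "at", "for", "in", "of", "on", "the", "to", "with"}


def term_vector(text):
    """Bag-of-words term counts, lower-cased, without stopwords."""
    return Counter(t for t in TOKEN.findall(text.lower()) if t not in STOPWORDS)


def cosine(u, v):
    """Cosine similarity between two term-count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(
        sum(c * c for c in v.values())
    )
    return dot / norm if norm else 0.0


def rank_mentors(synopsis, author_texts):
    """Return candidate names ordered by similarity to the synopsis, best first."""
    query = term_vector(synopsis)
    scored = [
        (cosine(query, term_vector(text)), name)
        for name, text in author_texts.items()
    ]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in ranked if score > 0]


# Hypothetical candidates, each with the concatenated text of their papers.
AUTHOR_TEXTS = {
    "A. Example": "lattice QCD predictions for LHC measurements of the proton",
    "B. Sample": "searches for supersymmetry in LHC collisions with missing energy",
    "C. Placeholder": "galaxy formation in cosmological simulations",
}
SYNOPSIS = "A proposal to search for supersymmetry signatures in LHC data"
RANKED = rank_mentors(SYNOPSIS, AUTHOR_TEXTS)
```

Scoring every candidate against every synopsis is expensive on a large corpus but can run as a batch job, which fits the “latency tolerant” requirement on the slide.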

[Diagram: Infra services (OCR’ing, indexing parallelization, metadata extraction, …); Apps and Infra; SWORD API (develop + maintain); maintain INSPIRE; clustering; find-me-a-mentor]