ELIXIR: Authentication and Authorization Infrastructure Requirements
Beurs van Berlage, Amsterdam, 14-17th September 2010
Andrew Lyall
www.elixir-europe.org @emblebi
ELIXIR: a sustainable infrastructure for biological information in Europe.
ELIXIR in one slide
1. Three-year, €3 million ESFRI BMS PP project; a consortium of 32 European organisations co-ordinated by EMBL-EBI
2. An e-Infrastructure with a current user base within Europe of at least one million (unique IP addresses accessing EMBL-EBI)
3. Provides the infrastructure to enable the utilisation of the Human Genome and all the other molecules of life, related data and services
4. Used in basic life sciences, agricultural, environmental, pharmaceutical and many other areas of research
5. ELIXIR was created to deploy these data in order to help to address the European Grand Challenges
6. Emerging disruptive technologies mean that there will be at least a 1000-fold increase in data over the next decade
7. Most of the data are available without authentication; the main exception is data relating to individual patients and volunteers
ELIXIR is coordinated by EMBL-EBI
• The European Bioinformatics Institute is an outstation of the European Molecular Biology Laboratory
• EMBL is an international organisation created by treaty (cf. CERN, ESA)
• EMBL-EBI has 400 staff, a €30 million budget and several million users
• 15-year history of service provision and scientific excellence
• Sited at the Wellcome Trust Genome Campus, Hinxton, Cambridge, UK, after European competition
[Figure: 2008 funding sources]
[Figure: requests per day (Req/Day)]
Comprehensive, universal, integrated…
Application areas: life sciences, medicine, agriculture, pharmaceuticals, biotechnology, environment, bio-fuels, cosmeceuticals, nutraceuticals, consumer products, personal genomes, etc.
Data types and resources:
• Genomes: Ensembl, Ensembl Genomes, EGA
• Nucleotide sequence: EMBL-Bank
• Gene expression: ArrayExpress
• Proteomes: UniProt, PRIDE
• Protein families, motifs and domains: InterPro
• Protein structure: PDBe
• Protein interactions: IntAct
• Chemical entities: ChEBI, ChEMBL
• Pathways: Reactome
• Systems: BioModels
• Literature and ontologies: CiteXplore, GO
BMS Support of the European Grand Challenges
ELIXIR will provide infrastructure for the other ESFRI BMS RIs.
Growth of disk storage at EMBL-EBI
Ten petabytes as of May 2010.
Q1. How are users authenticated?
• They aren't...
• Users access the data collections anonymously via web pages, web services and download
• Except for a very few who need to access data that could be used to identify an individual (personal data)
• However, the number of users needing to do this is on the increase
• The problems are not technical; they are political, ethical and societal
How are users of personal data authenticated?
• On a case-by-case basis
• Each data set has a custodian
• The custodian specifies a mechanism for certification and authorisation and an oversight procedure
• Often certification and permission will be handled by a Data Access Committee or an Ethics Committee
• A variety of different mechanisms are used; conceptually, each data manager or custodian decides for themselves
• Not all sensitive data are equally sensitive, so it is likely that there will be different mechanisms even from the same source
• ELIXIR will need to pass these mechanisms through unchanged
• It will probably be necessary to audit access
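The arrangement on this slide (each custodian supplies their own authorisation mechanism, ELIXIR passes it through unchanged, and access is audited) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not an ELIXIR design: the `Gateway` class, the `AccessCheck` type, the data set name and the user names are all hypothetical, and the choice to audit only requests for protected data sets is an assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Dict, List

# Hypothetical type: a custodian-supplied function that decides
# whether a given user may access the custodian's data set.
AccessCheck = Callable[[str], bool]


@dataclass
class AuditRecord:
    timestamp: str
    user: str
    dataset: str
    granted: bool


@dataclass
class Gateway:
    """Illustrative pass-through gateway (hypothetical, not an ELIXIR component)."""
    checks: Dict[str, AccessCheck] = field(default_factory=dict)
    audit_log: List[AuditRecord] = field(default_factory=list)

    def register(self, dataset: str, check: AccessCheck) -> None:
        # Each custodian registers their own mechanism for their data set.
        self.checks[dataset] = check

    def request(self, user: str, dataset: str) -> bool:
        check = self.checks.get(dataset)
        if check is None:
            # Assumption: data sets without a registered check are open access.
            return True
        # The custodian's decision is returned unchanged; the gateway adds no policy.
        granted = bool(check(user))
        # Assumption: only requests for protected data sets are audited.
        stamp = datetime.now(timezone.utc).isoformat()
        self.audit_log.append(AuditRecord(stamp, user, dataset, granted))
        return granted


# Hypothetical example: a Data Access Committee keeps a list of approved users.
dac_approved = {"alice"}
gateway = Gateway()
gateway.register("cohort-study", lambda user: user in dac_approved)
```

Keeping the check as an opaque function means two data sets from the same source can use entirely different mechanisms, which matches the point that not all sensitive data are equally sensitive.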
Q3. Which types of resources are in use?
• A typical resource will integrate expert data curation, RDBMS access, high-bandwidth networking and high-performance storage
• Some resources may also integrate HPC for an "algorithmic" step
• The resource will be accessed without authentication (except for personal data)
• There are perhaps twenty resources located at EMBL-EBI that account for 80% of the total usage
• These are deployed using EMBL-EBI's private grid
• They are made accessible without authorisation through a web page, via web services or by download of the data set
• The remaining 20% of usage is accounted for by many hundreds of resources all around Europe, which are deployed using similar means; many of these are important in their own right
Q4. Where will we be in five years' time?
• Most data will be accessed without authentication
• Commercial or public domain grids and/or clouds may be doing the deployment rather than privately owned ones
• Modest progress will have been made on the political, ethical and societal problems associated with accessing personal data
• More investigators will be accessing personal data, probably by ad hoc means
• Some of the technology currently under development will have been useful in providing these solutions
Q5. Are the users happy?
• Authentication is unacceptable to many users
• The user base is very diverse and there is considerable organisational complexity
• Those who need to access personal data usually understand the issues and are prepared to accept the limitations that this imposes
• Open access is crucial to the progress of biology and medicine
Example of organisational complexity
[Figure: diagram of the organisations and systems that handle clinical research data. Labels: Hospital, Patient Notes, Case Report Form (CRF), Clinical Research Centre, Clinical Scientist, EPR System, Study DMS, Statistics System, Clinical DMS, Hospital laboratory, University, Images, LIMS System, Pathology, Tissue bank, Public domain, Research System, Instrument System. Key: Identifiable, De-identified.]
Example of stakeholder complexity
• Clinical Researcher
• Research Nurses
• Surgeons
• Pathologists
• Scientists
• Technicians
• Statisticians
• Bioinformaticians
• …and the patients!