Plateforme de Calcul pour les Sciences du Vivant

Grids, a new way to do science
V. Breton, CNRS-IN2P3
http://clrpcsv.in2p3.fr

What is the Do Son ACGRID school about?
• The school is about grids
  – Grids of PC clusters: EGEE tutorial from Nov. 5th to 9th
  – Desktop grids: BOINC tutorial on Nov. 15th
• The school is about computational tools that use the grid
  – For data analysis: ROOT on Nov. 12th and TAVERNA on Nov. 13th
  – For simulation: GEANT4 on Nov. 14th
• The school consists of courses and hands-on sessions
  – A grid has been deployed locally at IOIT for the duration of the school

Our goals for the school
• Train Asian engineers to install and operate grid services
  – Tutorial on grid installation (October 29th to November 2nd)
• Train Asian researchers to use the services offered by the EGEE grid
  – Train users to call the grid services
  – Train users to deploy analysis and simulation tools that take advantage of the grid
• Deploy in Vietnam a grid infrastructure that researchers can use
  – Machines bought for the school will be distributed among 5 sites:
    § IOIT in Hanoi and HCMC
    § Hanoi University of Technology
    § Maison des Sciences et Technologies
    § Institut Français d'Informatique

What is the Grid?
• The World Wide Web provides seamless access to information stored in many millions of different geographical locations
• In contrast, the Grid is a new computing infrastructure that provides seamless access to computing power, data and other resources distributed over the globe
• The name Grid was chosen by analogy with the electric power grid: you plug into computing power without worrying where it comes from, just as a toaster plugs into the power grid

Two kinds of grids
Volunteer computing vs. grid infrastructures
• Volunteer computing: BOINC tutorial on Nov. 15th
• Grid infrastructures: EGEE grid tutorial from Nov. 5th to 9th

What is driving grid development?
Data- and compute-intensive sciences are next-generation applications that have extreme needs but are likely to become mainstream in the next 5 years:
• Natural Resources and the Environment (weather forecasting, earth observation, modeling and prediction of complex systems: river floods and earthquake simulation)
• Physics/Astronomy (data from different kinds of research instruments)
• Bioinformatics (study of the human genome and proteome to understand genetic diseases)
• Medical/Healthcare (imaging, diagnosis and treatment)
• Nanotechnology (design of new materials from the molecular scale)
• Engineering (design optimization, simulation, failure analysis and remote instrument access and control)

Meteorology
• Need for early warning and detection systems, e.g. for hurricanes
• Technology is advancing fast:
  – Infrared sensors on meteorological satellites now provide increasingly detailed observations of the atmosphere
  – Research continues on computer forecasting models that can use satellite data to improve current weather prediction
  – Meteorological studies rely on large computers for atmospheric modeling
• With easier and faster access to data and models, prediction keeps becoming more efficient

Earth Observation
• Long-term global observations of the land surface, biosphere, solid Earth, atmosphere and oceans produce huge amounts of data that are:
  – not in homogeneous data formats
  – not easy to locate
  – lacking an obvious user-friendly interface
• Challenge: understanding the Earth as an integrated system
  – broader scope and more local detail mean ever more data
  – better understanding the interrelations of different components requires more analysis power
  – this translates into better forecasting

Climate Simulation
• Climate simulation already uses distributed computing
  – Example: the scientific experiment “Casino-21” tries to forecast the climate of the 21st century through a large-scale simulation
  – “Casino-21” uses a structure like that of the SETI@home project
• Grid infrastructures will provide new and more powerful ways to use distributed computing for climate simulation

Pollution
• Satellite monitoring helps scientists understand changes in the atmosphere, track them and plan ways to reduce our environmental impact
• A wide variety of emissions is changing the chemistry and composition of our planet's atmosphere
• The atmosphere is a very complex chemical system
• So far, data has been used selectively
  – Increased analysis power gives access to a wider spectrum of data and shortens turnaround times

The Vision
• An international network of scientists will be able to model a new flood of the Mekong river in real time, using meteorological and geological data from several centres across Europe
• UNOSAT:
  – an internet-based service providing high-quality maps to UN agencies, NGOs and other institutions of the humanitarian community
  – grid technology allows raw satellite images to be reduced and processed into readable maps faster than would otherwise be possible
Access to a production-quality grid will change the way science and earth observation of all kinds are done

How does the grid work?
• The Grid relies on advanced software, called middleware, which ensures seamless communication between different computers in different parts of the world
• The Grid "search engine" finds not only the data the scientist needs, but also the data processing techniques and the computing power to carry them out
• It distributes the computing task to wherever in the world there is available capacity, and sends the result back to the scientist
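The matching step described above (find a place that has the data and free capacity, then send the task there) can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the gLite resource broker or its API: the `Site` and `Job` classes, the `match` and `submit` functions, and all site and dataset names are hypothetical. It assumes a site qualifies when it already holds the job's input data and has enough free CPUs, and that the broker prefers the qualifying site with the most spare capacity. Returning the result to the scientist is not modelled.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Site:
    """Hypothetical grid site: its name, free CPU slots and locally stored datasets."""
    name: str
    free_cpus: int
    datasets: Set[str] = field(default_factory=set)


@dataclass
class Job:
    """Hypothetical job: the dataset it reads and the number of CPUs it needs."""
    name: str
    dataset: str
    cpus: int = 1


def match(job: Job, sites: List[Site]) -> Optional[Site]:
    """Pick a site for `job`, or None if no site qualifies.

    A site qualifies if it holds the job's input dataset and has enough free
    CPUs; among qualifying sites, the one with the most free CPUs wins
    (an assumed heuristic, chosen for simplicity).
    """
    candidates = [s for s in sites
                  if job.dataset in s.datasets and s.free_cpus >= job.cpus]
    if not candidates:
        return None
    return max(candidates, key=lambda s: s.free_cpus)


def submit(job: Job, sites: List[Site]) -> str:
    """Dispatch `job` to the matched site and reserve its CPUs there."""
    site = match(job, sites)
    if site is None:
        return f"{job.name}: no matching site, job stays queued"
    site.free_cpus -= job.cpus
    return f"{job.name}: dispatched to {site.name}"


if __name__ == "__main__":
    sites = [
        Site("site-A", free_cpus=4, datasets={"mekong-rainfall"}),
        Site("site-B", free_cpus=32, datasets={"mekong-rainfall", "protein-targets"}),
        Site("site-C", free_cpus=64, datasets={"protein-targets"}),
    ]
    print(submit(Job("flood-model", "mekong-rainfall", cpus=8), sites))
    print(submit(Job("docking", "protein-targets", cpus=16), sites))
    print(submit(Job("big-run", "mekong-rainfall", cpus=100), sites))
```

Run as a script, it sends the flood model to site-B (site-A holds the data but has too few free CPUs), sends the docking job to site-C, and leaves the oversized job queued. A production broker weighs many more criteria, such as queue lengths, data replicas and virtual organisation policies; "most free CPUs" is only one simple heuristic.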

Grid Challenges
• Share data between thousands of scientists with multiple interests
  – Need to support dynamic virtual organisations of geographically dispersed groups
• Ensure all data is accessible anywhere, anytime
  – Petabytes of data need to be available on demand
• Grow rapidly, yet remain reliable for more than a decade
  – Are we sure the current technologies will scale?
  – Transfer to industry to achieve economies of scale
• Standardisation process still ongoing
  – Merger of web services (OASIS) and grids (GGF) into WSRF
  – Must progress to avoid incompatible proprietary grids
• Cope with the different management policies of grid sites
  – Link computer centres, not just single PCs, that are separately administered and owned
  – Need resource allocation policies and billing systems
• Ensure security
  – Medical applications have legal/ethical restrictions on data access
  – Avoid becoming a target for hackers

What is EGEE?
• EGEE
  – 1 April 2004 to 31 March 2006
  – 71 partners in 27 countries, federated in regional grids
• EGEE-II
  – 1 April 2006 to 31 March 2008
  – 91 partners in 32 countries
  – 13 federations
• Objectives
  – Large-scale, production-quality infrastructure for e-Science
  – Attract new resources and users from industry as well as science
  – Maintain and further improve the gLite grid middleware

Why did we choose to teach you about EGEE?
• EGEE is an operational grid infrastructure
  – More than 100,000 jobs/day
• EGEE offers real services to its user communities
  – Job and data management services are operational
• The EGEE infrastructure is used to analyze LHC data
  – Joining EGEE allows you to take part in LHC data analysis
• EGEE technology is well supported in Asia
  – Academia Sinica in Taiwan offers central services to user communities across Asia

What does EGEE provide?
• Simplified access (access to all the operational resources the user needs)
• On-demand computing (fast access to resources by allocating them efficiently)
• Pervasive access (accessible from any geographic location)
• Large-scale resources (of a scale that no single computer centre can provide)
• Sharing of software and data (in a transparent way)
• Improved support (using the expertise of all partners to offer in-depth support for all key applications)

Highlights of EGEE-II
• >200 VOs (virtual organisations) from several scientific domains
  – Astronomy & Astrophysics
  – Civil Protection
  – Computational Chemistry
  – Computational Fluid Dynamics
  – Computer Science/Tools
  – Condensed Matter Physics
  – Earth Sciences
  – Fusion
  – High Energy Physics
  – Life Sciences
• Further applications under evaluation
Applications have moved from testing to routine, daily usage
~80-90% efficiency, 98k jobs/day

EGEE-II middleware
• EGEE maintains and improves the gLite middleware distribution
  [Figure: timeline of LCG-2 and gLite moving from prototyping to product, 2004 to 2006, converging into gLite 3.0]
• gLite 3
  – Publicly released on May 4, 2006
  – Convergence with LCG-2
  – Currently deploying version 3.1 on Scientific Linux
• Main components: work management system, data management system, information system, resource brokering, security

Operations
Size of the infrastructure today:
• 237 sites in 45 countries
• ~36,000 CPUs
• ~5 PB disk, plus tape mass storage systems (MSS)
• Distributed operations
• Copes well with the increase in size and usage: 98k jobs/day
[Figure: EGEE operations network linking sites, support units, users and NRENs through GGUS, ENOC and GÉANT2]

Applications
[Figure: CPU consumption per VO]
Total VOs: 204; total users: 5034; affected people: 10200

The pilot applications
• High Energy Physics with the LHC Computing Grid (www.cern.ch/lcg) relies on a grid infrastructure to store and analyse petabytes of real and simulated data. LCG is a major source of resources and requirements, and faces hard deadlines with no conventional solution available
• In the biomedical sciences, several communities face equally daunting challenges in coping with the flood of bioinformatics and healthcare data. They need to access large, distributed, non-homogeneous data and have significant on-demand computing requirements

LCG
• LCG: a collaboration of
  – The LHC experiments
  – The regional computing centres
  – Physics institutes
• Mission:
  – Prepare and deploy the computing environment that the experiments will use to analyse LHC data
• Strategy:
  – Integrate thousands of computers at dozens of participating institutes worldwide into a global computing resource
  – Rely on software developed in advanced grid technology projects, both in Europe and in the USA

WISDOM
• WISDOM: a collaboration of
  – Biology, bioinformatics and chemoinformatics laboratories
  – Grid infrastructure projects
• Mission:
  – In silico drug discovery against emerging and neglected diseases
• Strategy:
  – Centuries of CPU time used to dock millions of compounds during large-scale grid deployments
  – Secure management of biochemical data
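The docking strategy above suits a grid because each docking is independent of the others, so a compound library can be cut into many small jobs and run in parallel. The Python sketch below shows that partitioning and the ideal wall-clock estimate it enables. The function names, the batch size, the 30 CPU-minutes per docking, the 2 million compounds and the 5000 concurrent CPUs are all hypothetical illustration values, not WISDOM figures, and the estimate ignores queueing, job failures and data transfer.

```python
def split_into_jobs(compounds, batch_size):
    """Cut a compound library into independent jobs of at most batch_size compounds each."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [compounds[i:i + batch_size] for i in range(0, len(compounds), batch_size)]


def cpu_years(n_compounds, minutes_per_docking):
    """Total CPU time needed to dock every compound once, in CPU-years."""
    return n_compounds * minutes_per_docking / (60 * 24 * 365)


def wall_clock_days(n_compounds, minutes_per_docking, concurrent_cpus):
    """Ideal elapsed time in days if the dockings are spread evenly over concurrent_cpus."""
    return n_compounds * minutes_per_docking / concurrent_cpus / (60 * 24)


if __name__ == "__main__":
    # Small library to show the partitioning (hypothetical compound names).
    library = [f"compound-{i}" for i in range(10)]
    jobs = split_into_jobs(library, batch_size=4)
    print(len(jobs), [len(j) for j in jobs])                 # 3 [4, 4, 2]
    # Hypothetical large run: 2 million compounds, 30 CPU-minutes each.
    print(round(cpu_years(2_000_000, 30)))                   # 114
    print(round(wall_clock_days(2_000_000, 30, 5000), 1))    # 8.3
```

With these assumed numbers, roughly a century of CPU time shrinks to about eight days of elapsed time once it is spread over thousands of grid CPUs, which is the point of running such campaigns on a grid.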

Dissemination and Training
www.eu-egee.org
[Figure: monthly unique visitors to www.eu-egee.org and links from Internet search engines, April to March]
• Comprehensive training programme in Europe, South America and Asia
• 110 events, >1600 participants
ACGRID is one of these events

What is the Do Son ACGRID school about?
• Grids are about sharing
  – Resources (CPU, storage)
  – Knowledge
• The Do Son ACGRID school is about sharing knowledge
  – Sharing expertise in the installation and operation of grid services
  – Sharing expertise in the development and deployment of grid-enabled applications
• The Do Son ACGRID school is about building long-term collaboration
  – We are here to help Vietnamese engineers run grid services
  – We are here to help Vietnamese scientists develop and deploy grid-enabled applications
  – We are here to present powerful tools for data analysis and simulation
• TAKE ADVANTAGE OF THIS OPPORTUNITY TO ADVANCE YOUR RESEARCH
  – Ask questions
  – Don't hesitate to talk with the teachers

What should happen after the school?
• Grid services will be installed at several sites in Vietnam
  – In Hanoi: Hanoi University of Technology, IOIT, Institut Français d'Informatique
  – In HCMC: IOIT
• You will be able to use your grid certificates to access the EGEE grid through these sites
  – You can also join any other Virtual Organization
• You will benefit from the grid services like any other EGEE user

What you get out of the school
• Grids offer a unique opportunity to integrate research laboratories into international initiatives
  – Example: LHC
• Grids offer opportunities to start collaborations
  – Example: telemedicine
    § Installation of a grid-enabled medical imaging platform at IOIT in HCMC
    § Joint application deployment between the platforms in HCMC and Clermont-Ferrand
It all depends on you!

Credits
• IOIT in Hanoi: Vu Duc Thy, Luong Chi Mai, Ngo Tran Anh and collaborators
• IOIT in HCMC: Do Van Long
• ASGC: Min Tsai, Jinny Chen and collaborators
• Nicolas Maire, Sébastien Incerti, René Brun, Georgina Moulton, our second-week speakers
• HealthGrid: Nicolas Spalinger, Nathanaël Verhaeghe
• CNRS office in Hanoi: Bernard Mely, Le Tuyet Trinh
• CNRS-IN2P3: Vincent Bloch, Vincent Breton, Géraldine Fettahi, Matthieu Reichstadt, Denis Perret-Gallix, Jean Salzemann
• TEIN2: David West