
The Swiss Grid Initiative
Peter Kunszt, Manager, Swiss Grid Initiative
EGEE Summer School, Budapest, July 2006

Peter Kunszt
• Doctorate in Theoretical Physics from the University of Bern
• Building the Science Database of the Sloan Digital Sky Survey, Johns Hopkins University, Baltimore
• EU Grid Projects, leading data management middleware development, CERN, Geneva
• Manager, Swiss Grid Initiative, Swiss National Supercomputing Centre CSCS, Manno
PMB, 12_07_2006, P. Kunszt

CSCS

Content
Swiss Grid Initiative
Swiss Involvements in Grid Projects – challenges
§ EGEE
§ Swiss Bio Grid
§ SEPAC
§ Intelligent Scheduling System ISS
Importance of Grids in General
§ Beyond the Hype
§ Strategies for Successful Grids
Importance of National Grids
§ Why is it necessary to have a national Grid
§ Advantages and Disadvantages of participating in large projects

Grid Computing in Switzerland – High Level Goals
Resource Sharing: Pooling of Available Resources
§ Excellent national network provided by the national research network provider SWITCH
§ Optimal usage of national resources
§ Pooling of available resources at research institutions
§ Harvesting cycles on as yet unused resources (e.g. classroom PCs, cluster backfill queues)

Grid Computing in Switzerland – High Level Goals
Coordination: Building an Infrastructure
§ Agreements on the usage of the available resources
§ Coordinated support of the resources
§ Sharing of tools and middleware

Grid Computing in Switzerland – High Level Goals
Collaboration: Enabling Scientific Discovery
§ Coordinated application usage, thematic Grids
§ Building a community
§ Establishing a joint knowledge base

Swiss Grid Initiative
Taking care of coordinating and supporting national Grid projects.
• Point of contact for all Grid Projects
• Point of support for all Grid users and administrators
• Representation of Swiss Academic Research Interests
§ In Europe
§ Globally
§ Towards the Industry

The Swiss Grid Initiative has been created to
• Provide support and expertise for the Swiss research community
• Promote connectivity and collaboration between disciplines and users, especially CS and ‘high-need’ applications
• Represent the interests of the national research community towards other national and EU Grid projects
• Get involved in joint multinational projects, help Swiss partners to get funding
• Interact with the industry in joint projects
• Continuously initiate thematic projects, including e-Science pilot studies
• Research and develop middleware components to fill gaps and to improve the services to the community and with the community

Swiss Grid Initiative Focus
Support the End-User
Enabling Relevant Scientific Discovery from Day 1 (no testbeds)
Consulting about Gridification – not every project is suitable for the High Throughput Paradigm
Seek new opportunities and initiate new projects


EGEE and LCG
See the previous presentation for details.
European Grid Infrastructure for Enabling E-science
Teaming up with the D-Grid in the DECH Federation

Tier-ed model
[Diagram of the tiered computing model. Labels: CERN Tier 0; CERN Tier 1; Tier 1; Tier 2; Tier 3 physics department; USA, France, Italy, Taipei, Japan, Germany, UK; Lab a, Lab b, Lab c, Lab m; Uni a, Uni b, Uni x, Uni y; CSCS; grid for a regional group; grid for a physics study group]

Swiss Partners
CSCS – Swiss Supercomputing Centre
SWITCH – Swiss Research & Edu Network
CSCS:
§ SA1, NA2, NA3, NA4
§ Is an LCG Tier 2 Site
§ Support for Region and all of EGEE
§ Analysis of Physics Data
§ Biomed, Comp. Chemistry, EO applications
§ Training, Education, Public Relations
SWITCH:
§ JRA1
§ Security Middleware: Next generation of Grid Certificates by integrating Shibboleth and PKI

Challenges in EGEE: View of CSCS
Nontrivial Administration
§ Steep learning curve to become an ‘EGEE member’
§ Reporting, Deliverables, etc.
Substantial Communication Overhead
§ Finding the right partner to communicate with
§ Many bodies, forums, sometimes contradictory information
§ It helps to be vocal – just trotting along silently will not help to improve the project
Infrastructure: substantial effort
§ To keep the site running
§ To respond to updates
§ Many things are not well automated
§ Mean Time Between Failures very low (in Grid middleware)
Complex System: Many things break in many ways

Swiss Bio Grid

Swiss Bio Grid Applications
Usage Patterns of different Applications
• Identified three classes of applications
§ Short CPU jobs (Docking)
§ Medium CPU + data exchange (Proteomics Pipelining)
§ Data intensive (Mass Spectrometry MS; Systems Biology)
• Strategy: Address them in sequence, find commonalities
§ Dengue docking project (see next slides)
§ swissPIT (Protein Identification Toolbox) Project starting now

Orphan Diseases: Dengue

What does it take to make a drug?
12 years of development, 802 million US$ (DiMasi, J. A. et al. (2003) J Health Econ, 22, 151-185).
1 in 10'000 NCEs becomes a product (Heilman, R. D. (1995) Qual Assur 4(1), 75-9).
‘Only’ 20 years of patent protection, which leaves 8 years to make money
Pipeline stages: Target ID, Target validation (BIOLOGY); Screening, Optimization (CHEMISTRY); Preclinical, Clinical (DEVELOPMENT)

“In Silico” Drug Development
Bioinformatics, data mining, visualization, simulations, modeling, and many algorithms, databases

Screening of compounds
Computational screening of small compounds to identify early drug candidates

Dengue Docking project
Proof of concept for successful private-public partnership
Biozentrum: in silico docking
Novartis Institute for Tropical Diseases: in vitro/in vivo follow-up
Novartis: drug development at cost

Dengue Docking project
TARGET PROTEINS (3D structure of targets)
• NS5 Methyltransferase
• NS3 Protease
• GPE Envelope Glycoprotein
• NS3 Helicase
ALGORITHMS
• DOCK 5.1
• Autodock 3.0.5
• FlexX (SCAI/BioSolveIT)
• GLIDE (Schrödinger)
COMPOUND LIBRARIES
• NCI Diversity (2k)
• NCI DTP (200k)
• ZINC (2700k)
IT INFRASTRUCTURE
[Illustration: truncated C/C++ source fragment that fills an array with random numbers and writes it to a file]

Dengue NS5 Methyltransferase
• PDB 1R6A: Structure solved in complex with Ribavirin and AdoHCys
• 2'-O-methylation of viral RNA (2nd capping step of type 1 RNA cap)
• Cofactor: SAM
• Deletion of SAM domain aborts viral replication in Kunjin (Koonin, 1993)

Current Achievements of GRID-enabled Dengue Docking
Completed Phase I SwissBioGrid
Completed large-scale parameterization test using Autodock 3.0.5: >500'000 docking runs, >38'000 h CPU time
In vitro testing of predicted binders is underway at NITD
Some initial candidates already in next phase
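To put the two figures above in proportion, the following sketch does the back-of-the-envelope arithmetic. Both inputs are the lower bounds quoted on the slide, so the results are only rough orders of magnitude; the CPU count used for the wall-clock estimate is a hypothetical value chosen for illustration, not a number from the talk.

```python
# Rough arithmetic on the figures quoted on the slide (both are lower bounds).
DOCKING_RUNS = 500_000   # ">500'000 docking runs"
CPU_HOURS = 38_000       # ">38'000 h CPU time"


def mean_cpu_seconds_per_run(cpu_hours: float, runs: int) -> float:
    """Average CPU time per docking run, in seconds."""
    return cpu_hours * 3600.0 / runs


def ideal_wall_clock_days(cpu_hours: float, cpus: int) -> float:
    """Wall-clock days if the work were spread perfectly over `cpus` CPUs."""
    return cpu_hours / cpus / 24.0


if __name__ == "__main__":
    print(mean_cpu_seconds_per_run(CPU_HOURS, DOCKING_RUNS))  # about 274 s per run
    # ASSUMPTION: 100 CPUs is a made-up pool size, used only to show the scaling.
    print(ideal_wall_clock_days(CPU_HOURS, cpus=100))         # about 16 days
```

At a few minutes per run, the workload matches the "short CPU jobs" class identified for docking: many independent tasks that can be spread over whatever resources happen to be free.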

Some challenges in grid adoption
Compute resources are busy already
§ Agree on dedicated compute time for grid projects
§ PC Desktop grids: untapped resource
§ Buy new clusters for your grid (not the idea)
Non-intrusiveness
§ Firewall exceptions
§ Non-intrusiveness on PC Desktop grids: application level
Application clearing:
§ Security issues
§ Numerical stability in heterogeneous environments
Data model in bioinformatics different from HEP
§ Applications need access to large databases or data sets

Challenge: Heterogeneity
Very different resources at participating institutes
• Use ‘standard’ schedulers for clusters (Sun Grid Engine, LSF, PBS)
• Agree on a higher-level Grid scheduler
• Provide good documentation and bindings of the Grid scheduler to the predominant cluster schedulers
• Work on new bindings
Here we are already quite advanced and can make good use of the results of other projects, but there is still a long way to go!
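What a "binding" of a higher-level scheduler to the cluster schedulers might look like can be sketched in a few lines. This is a minimal illustration and not the project's middleware: the `Job` class and `submit_command` function are hypothetical names, and only the job-name option of each scheduler's submit command is used (`qsub -N` for Sun Grid Engine and PBS, `bsub -J` for LSF). A real binding would also have to translate resource requests, queues and file staging.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Job:
    """Scheduler-neutral job description (hypothetical, for illustration)."""
    name: str
    script: str  # path to an executable job script


def submit_command(scheduler: str, job: Job) -> List[str]:
    """Translate a neutral job description into a cluster-specific command line."""
    kind = scheduler.lower()
    if kind in ("sge", "pbs"):
        # Sun Grid Engine and PBS both submit with qsub; -N sets the job name.
        return ["qsub", "-N", job.name, job.script]
    if kind == "lsf":
        # LSF submits with bsub; -J sets the job name.
        return ["bsub", "-J", job.name, job.script]
    raise ValueError(f"no binding for scheduler {scheduler!r}")


if __name__ == "__main__":
    job = Job(name="dock_0001", script="./run_docking.sh")
    for scheduler in ("sge", "lsf", "pbs"):
        print(scheduler, " ".join(submit_command(scheduler, job)))
```

Keeping the job description neutral and isolating the per-scheduler differences in one function is what makes "work on new bindings" an incremental task: adding a scheduler means adding one branch, not touching the applications.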

Challenge: Numerical stability
[Figure: results on an Athlon CPU and an Itanium 2 CPU, before and after a bug fix; PDB: 3dfr]
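The slide does not say what the bug was. As a generic illustration of why the same application can give different answers on different platforms, the sketch below shows that floating-point addition depends on evaluation order, which compilers and CPU architectures are free to vary. The function names are made up for this example; comparing results with a tolerance rather than bit-for-bit is one common mitigation, not necessarily the fix used in the project.

```python
import math


def naive_sum(values):
    """Add the values strictly left to right, as a simple loop would."""
    total = 0.0
    for v in values:
        total += v
    return total


def results_agree(a: float, b: float, rel_tol: float = 1e-9) -> bool:
    """Compare two results with a relative tolerance instead of exact equality."""
    return math.isclose(a, b, rel_tol=rel_tol)


if __name__ == "__main__":
    # The same three numbers, added in two different orders.
    print(naive_sum([1e16, 1.0, -1e16]))   # 0.0: the 1.0 is lost to rounding
    print(naive_sum([1e16, -1e16, 1.0]))   # 1.0
    print(math.fsum([1e16, 1.0, -1e16]))   # 1.0: exactly rounded summation

    # Smaller discrepancies are the usual case; a tolerance absorbs them.
    x = (0.1 + 0.2) + 0.3
    y = 0.1 + (0.2 + 0.3)
    print(x == y, results_agree(x, y))     # False True
```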

Challenge: Security
Sensitive data, data safety
• Rely on standards for Authentication and Authorization
• Network data channel encryption
• Encryption of distributed data on storage
• Distributed keys and algorithms for retrieval (n of m schemes)
Not at all addressed yet; a lot of room for improvement
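The "n of m" idea in the last bullet can be made concrete with a threshold secret-sharing scheme: a key is split into m shares so that any n of them recover it, while fewer reveal nothing useful. The slide names no particular scheme; the sketch below uses Shamir's secret sharing as one well-known example. The function names and the choice of prime are assumptions for illustration, and the code is a teaching sketch, not hardened cryptographic software.

```python
import secrets
from typing import List, Tuple

# A Mersenne prime used as the field size; the secret must be smaller than it.
PRIME = 2**127 - 1


def split_secret(secret: int, n: int, m: int) -> List[Tuple[int, int]]:
    """Split `secret` into m shares such that any n of them recover it."""
    if not (1 <= n <= m) or not (0 <= secret < PRIME):
        raise ValueError("need 1 <= n <= m and 0 <= secret < PRIME")
    # Random polynomial of degree n-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(n - 1)]
    shares = []
    for x in range(1, m + 1):
        y = 0
        for c in reversed(coeffs):      # Horner evaluation modulo PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares: List[Tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # Modular inverse via Fermat's little theorem (PRIME is prime).
        secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret


if __name__ == "__main__":
    key = 123456789
    shares = split_secret(key, n=3, m=5)   # 3 of 5 sites must cooperate
    print(recover_secret(shares[:3]) == key)
    print(recover_secret([shares[0], shares[2], shares[4]]) == key)
```

In a Grid setting the shares would be held by different sites, so that no single storage element or administrator can decrypt the data alone.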

Challenge: Legacy
Licensed, proprietary, legacy code
• Solve the problem together with the software provider
• New licensing models for distributed computing (e.g. license servers don’t scale)
• Legacy support
§ Recompilation if possible
§ Emulators
§ Virtual machines
Virtual machines may be the way forward for many of these applications, but they are not production quality yet and a lot of research remains to be done; also a lot of room for improvement

Challenge: User Interface
Users don’t want to deal with Grid specifics
• Set up a Grid Portal
• Many portals exist, but almost none has a good application-specific interface for the users
• The Proteomics Project addresses this: a dedicated proteomics pipelining portal based on existing Grid portal technologies; work is starting now together with the Swiss Institute of Bioinformatics and SZTAKI, using P-GRADE
• P-GRADE also addresses Legacy issues to some extent

SEPAC stands for South European Partnership for Advanced Computing
• SPACI consortium
- University of Lecce
- University of Calabria
- Hewlett-Packard
• CILEA
• CSCS
• ETHZ
• UNIZH
[Map labels: UniZurich, ETH CompuLab, ETH CSCS, CILEA, UniNa, UniCal, UniLe]

SEPAC Project Scope
• Infrastructure and Technology oriented collaboration
• Exploration of technology and interoperability
• Application portfolio being built
• Building on another Grid Portal: the Grid Resource Broker from the Univ. of Lecce

Intelligent Scheduling System ISS
• Partners: CSCS, EPFL, EIA-FR
• Provide a middleware service allowing optimal placement and scheduling of applications on the Grid – submit to the most suited computer architecture based on resource and application monitoring
• Research-oriented project, exploiting new ideas for a scheduling approach (2 PhDs)

ISS Details
• Cost function includes monitoring data on machine status and application behaviour. Usage of Γ model. See http://pleiades1.epfl.ch/~rgruber/projects/iss.pdf
• Monitoring data on machines and applications delivered by application monitoring and the service itself
• Actual job submission through existing Grid middleware
[Diagram labels: ISS; Switch; EPFL: SMP/NUMA, High M cluster; ETHZ: SMP/NUMA, High M cluster; CERN: EGEE; EIF: NoW; CSCS: SMP/vector, Low M cluster]
First Testbed: EPFL Mechanics department machines (clusters & single CPU machines)
Second Testbed: Whole EPFL
Third Testbed: EPFL + CSCS + EIA-Fr + ETHZ machines
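The general idea of cost-based placement can be illustrated with a toy example. This sketch does not implement the Γ model or the actual ISS cost function, which are described in the linked document; the `Machine` fields, the cost formula and all numbers are hypothetical, chosen only to show how monitoring data on machine status and application behaviour could feed a placement decision.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Machine:
    """Hypothetical monitoring snapshot for one resource."""
    name: str
    load: float            # fraction of the machine currently busy, 0 <= load < 1
    relative_speed: float  # measured speed of this application here, 1.0 = reference


def estimated_cost(machine: Machine, reference_runtime_h: float) -> float:
    """Toy cost: runtime scaled by application speed, inflated by current load."""
    if not 0.0 <= machine.load < 1.0 or machine.relative_speed <= 0.0:
        raise ValueError("load must be in [0, 1) and relative_speed positive")
    return reference_runtime_h / machine.relative_speed / (1.0 - machine.load)


def best_machine(machines: List[Machine], reference_runtime_h: float) -> Machine:
    """Pick the machine with the lowest estimated cost for this application."""
    return min(machines, key=lambda m: estimated_cost(m, reference_runtime_h))


if __name__ == "__main__":
    # ASSUMPTION: invented machines and values, not measurements from the testbeds.
    pool = [
        Machine("smp_numa_cluster", load=0.5, relative_speed=2.0),
        Machine("network_of_workstations", load=0.2, relative_speed=1.0),
        Machine("vector_machine", load=0.9, relative_speed=4.0),
    ]
    choice = best_machine(pool, reference_runtime_h=10.0)
    print(choice.name, estimated_cost(choice, 10.0))
```

In this toy pool the fastest machine is not chosen because it is almost fully loaded, which is the kind of trade-off a scheduler based on both resource and application monitoring is meant to capture.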

Example: Integration of ISS into VIOLA/MSS/UniCORE Environment
Team: CSCS, EPFL, EIA-Fr, FhG, JFZ

And There Are More… Swiss Involvements in Grids
• CoreGrid: EPFL, CSCS, EIA-FR
• KnowARC project: University of Geneva
• DILIGENT: University of Basel
• EMBRACE: University of Lausanne, SIB
• Computational Chemistry Grid: University of Zurich
• …


Are Grids Just a Hype?
Grids respond to a Paradigm Shift in Scientific Discovery
§ Paradigm Shift from Individual Researchers to Collaborations
§ Project driven research – joint work preferred over one-man shows
§ Collaborations do achieve the most relevant results these days
§ Need for Collaborative Computing Platforms
§ Need for temporary Virtual Organizations to do work, share data and results and publish results
Grids are here to stay

All Grids?
Successful Grids are measured by the success of their users
§ Ease of use
§ Ease of configuration
§ Non-intrusiveness at participating sites
§ Security
§ Robustness
Some Grids will Disappear

Measure of Success
1. Users are producing scientific results
• Harnessing increased computing capacity
• Easy integration of applications – users can focus on their field instead of computing
• Number of Publications
• Complexity of applications
We are not there yet
2. New Projects WANT to use your Grid instead of building their own
• If people knock on your door because they want to work with you, you know you are successful
3. Your Repository of Middleware is used by others
• You need robust, professionally documented, re-usable software
• Using Grid Service standards, interoperable
• Mandatory collaboration with other Grid projects and Universities


National Grids
Scaling large multinational projects can only be done through a well-managed hierarchy
Strategy of long-term infrastructures will follow the NREN model
EU drives in this direction: building on national infrastructures
Visible results of National Financing build the basis for EU funding

Participating in Large Multinational Projects: ADVANTAGES
• Being part of the game, enabling national users to play on the large international playground
• Access to a much larger infrastructure
• Ability to voice local interests to the large community
• Ability to focus on strengths, taking components from others
• Building expertise in large Grids
• Profiting from international funding
• Visibility of the national efforts on an international scale, raising the attractiveness of the country

Participating in Large Multinational Projects: DISADVANTAGES
• In large multinational projects the large nations will dominate
• Many technological decisions are political and not based on quality
§ Choice of middleware components
§ Assigning development tasks to competing teams
• Inefficiency of large projects
§ Communication Overhead
§ Meetings, conferences, phone calls, emails...
§ Internal arguments
• Need for Compromise – Slow Decision making
• Positioning inside a project very important
§ Expertise
§ Choice of partners inside the project

Links
SwissGrid Initiative: http://www.swiss-grid.org/ or http://www.gridinitiative.ch/
CSCS: http://cscs.ch/