SSS Deployment using OSCAR
John Mugler, Thomas Naughton & Stephen Scott
Oak Ridge National Laboratory -- U.S. Department of Energy
May 2005, Argonne, IL
SSS Face-to-face meeting

OSCAR: Cluster Toolkit
• Framework for cluster management
  – simplifies installation, configuration and operation
  – reduces time/learning curve for cluster build
    • requires: pre-installed headnode w/ supported Linux distribution
    • thereafter: wizard guides user through setup/install of entire cluster
• Package-based framework
  – Content: Software + Configuration, Tests, Docs
  – Types:
    • Core: SIS, C3, Switcher, ODA, OPD, (Support Libs)
    • Non-core: selected & third-party
  – Access: repositories accessible via OPD/OPDer

OSCAR Wizard
* OSCAR-3.0 release

Using OSCAR for SSS
Problem: Helping users obtain and install SSS software.
Solution: Leverage the OSCAR framework to package and distribute the SSS suite.
sss-oscar: a release of OSCAR containing all SSS software in a single downloadable bundle.

OSCAR-ized SSS Components
• Bamboo – Queue/Job Manager
• BLCR – Berkeley Checkpoint/Restart
• Gold – Accounting & Allocation Management System
• LAM/MPI (w/ BLCR) – Checkpoint/Restart enabled MPI
• MAUI-SSS – Job Scheduler
• SSSLib – SSS Communication library
  – Includes: SD, EM, PM, BCM, NSM, NWI
• Warehouse – Distributed System Monitor
• MPD2 – MPI Process Manager
* As of May 2005

Current Status
• Released v1.0 at SC’04
  – Based on oscar-3.0 (using Red Hat 9/x86)
  – All SSS components represented
• Testing for v1.1 release
  – Small update release
  – Still oscar-3.0 based
• Synchronize with OSCAR release schedule
  – oscar-4.1 released
  – Shift to oscar-4.1 in sss-oscar-1.2 release (2Q 2005)

OSCAR v4.1 Highlights
• SSS’s APItest tool integrated into v4.1 release
• Improved use of DepMan/PackMan abstraction layer
• Distributions supported in v4.1
  – x86: RH9, FC2, MDK 10.0
  – x86 & ia64: RHEL3
• Initial work started for Debian
  – Not in v4.1 release but working with 4.x devel tree

TODO: SSS
• Short term
  – Complete testing for v1.1 beta & release
  – Update SSS documentation
• Medium term
  – Migrate to new FRE testbed and repository (pending approval)
  – New/more Linux distribution/architecture/kernel support
• Longer term
  – Extend SSS component tests: 1) Installation, 2) Validation, 3) Durability/Stress, 4) Performance
  – Track oscar-4.x releases for v5.0 compatibility
  – Distribute as OSCAR “Package Set”
    • Pending feature support in OSCAR
  – OPKG ordering within a phase
    • Pending feature support in OSCAR
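The "OPKG ordering within a phase" item can be pictured as a topological sort over per-package ordering constraints. The sketch below is only an illustration of that idea in Python, not OSCAR code: the function name `order_within_phase`, the `must_follow` mapping, and the constraint shown (LAM/MPI after BLCR) are assumptions made for the example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def order_within_phase(packages, must_follow):
    """Order packages so each one runs after the packages it must follow.

    packages    -- names of the packages taking part in one phase
    must_follow -- hypothetical mapping: package -> packages that must run first
    """
    sorter = TopologicalSorter()
    for name in sorted(packages):
        # Ignore constraints that point at packages outside this phase.
        predecessors = sorted(p for p in must_follow.get(name, ()) if p in packages)
        sorter.add(name, *predecessors)
    # static_order() raises graphlib.CycleError if the constraints are circular.
    return list(sorter.static_order())


# Illustrative constraint only: LAM/MPI is built with BLCR support.
phase_packages = {"blcr", "lam-mpi", "maui-sss"}
constraints = {"lam-mpi": ["blcr"]}
print(order_within_phase(phase_packages, constraints))
```

Packages without constraints may appear anywhere in the result; only the relative order of constrained pairs is guaranteed.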

SSS-OSCAR Release Schedule

SSS Version   Freeze Date   Release Timeframe   Based on OSCAR
v1.1          Feb 15        May                 oscar-3.0
v1.2          Jun 15        July                oscar-4.1
v1.3          Aug 15        Sept                oscar-4.x
v1.4/2.0      Oct 15        Nov - SC’05         oscar-5.0

Add features to Tracker @ http://sf.net/projects/sss-oscar

Roadmap
• 1.2 (frz: jun, rel: jul)
  – Fedora Core 2 / Pkg rebuild
  – Improved install/validation tests
  – oscar-4.1 opkg modifications (updates)
  – Updates to HOWTO as needed
  – Simplify XML meta file
• 2.0 (frz: aug, rel: sep)
  – Close (most) open tracker issues
  – LRS change over
  – Fedora Core 4 / Pkg rebuild
  – Improved install/validation tests
  – Add performance/stress tests?
  – oscar-4.x opkg modifications (updates)
  – Updates to HOWTO as needed
  – Meta-scheduler (Silver)?
• 2.0.1 (frz: oct, rel: nov) [SC’05]
  – BLCR upgrade to linux-2.6
  – Any bugfixes/minor updates
• 2.02
  – SSS oscar-pkg set

Goals for sss-oscar-2.0
• Release v2.0 at SC’05
• Compatible with oscar-5.0
• Support current Linux distribution(s)
• Improve interoperability with standard OSCAR
  – Users obtain via “SSS OSCAR Pkg Repository”
  – Likely leverage “Package Sets” for logical grouping
  – Clarify SSS package dependencies
    • What about outside of SSS-OSCAR?
• Improved testing
  – Supply thorough installation/validation/performance tests
• Documentation
  – Specifications for component interfaces (schemas), etc.
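One way to read "Package Sets" together with "Clarify SSS package dependencies" is that selecting a set should pull in everything its members need. The following Python sketch shows that expansion under stated assumptions: `expand_package_set` and the `depends_on` table are hypothetical, and the dependencies listed are placeholders for illustration, not the actual SSS dependency graph.

```python
def expand_package_set(selected, depends_on):
    """Return the selected packages plus everything they transitively depend on.

    selected   -- package names chosen by the user (for example, one package set)
    depends_on -- hypothetical mapping: package -> packages it directly requires
    """
    resolved = set()
    pending = list(selected)
    while pending:
        name = pending.pop()
        if name in resolved:
            continue  # already handled; this also guards against dependency cycles
        resolved.add(name)
        pending.extend(depends_on.get(name, ()))
    return resolved


# Placeholder dependency table, for illustration only.
example_deps = {
    "lam-mpi": ["blcr"],
    "bamboo": ["ssslib"],
    "maui-sss": ["ssslib"],
}
print(sorted(expand_package_set(["lam-mpi"], example_deps)))  # ['blcr', 'lam-mpi']
```

Writing the dependency table down explicitly, in whatever form, is what would let a standard OSCAR install pull SSS packages from a repository without relying on the all-in-one bundle.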

Comments/Discussion
• Provide a lower cost of entry
  – Doc to help knit system together
• Clarify dependencies/interactions
  – Intra-component and inter-component
• Feedback to help Ron O. for testing/validation
  – Tests to verify against component specs.
  – Ex. The PM specs state X capability & it works in this build
  – Effectively conformance tests to “optional” SSS specs.
• What do we need to help coming releases?
  – Louder drum for Thomas?
  – Dedicated integration periods (face-to-face and/or virtual)?
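The "spec states X capability and it works in this build" idea could be expressed as a table of capability probes run against an installed build. This is a minimal Python sketch under that assumption; `run_conformance`, the capability names, and the probe functions are all hypothetical and do not correspond to APItest or to any SSS component interface.

```python
def run_conformance(spec_checks):
    """Run one probe per capability named in a spec and record pass/fail.

    spec_checks -- hypothetical mapping: capability name -> zero-argument
                   callable that returns a truthy value when the build
                   provides that capability
    """
    results = {}
    for capability, probe in spec_checks.items():
        try:
            results[capability] = bool(probe())
        except Exception:
            # A probe that crashes counts as a failed capability.
            results[capability] = False
    return results


# Stand-in probes; real ones would talk to the component under test.
example_checks = {
    "capability-x": lambda: True,
    "capability-y": lambda: False,
}
for name, passed in run_conformance(example_checks).items():
    print(f"{name}: {'ok' if passed else 'FAILED'}")
```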

Resources
• ORNL test clusters
  – Systems: sss-xtorc, test1, test2
  – Access via ORNL SSH Login Server
  – Must do reservations/coordinate use (Note, no remote power mgmt)
• Investigating ORNL “FRE” (enclaves)
  – Add “testX” system to alleviate ORNL SSH Login Server
• SSS-OSCAR Project page
  – Hosted at http://sourceforge.net/projects/sss-oscar/
• OSCAR Homepage
  – http://www.OpenClusterGroup.org/OSCAR/
  – Includes “HOWTO: Create an OSCAR Package” document