Using the Parallel Universe beyond MPI
Becky Gietzel
Computer Sciences Department, University of Wisconsin-Madison
bgietzel@cs.wisc.edu

Parallel Universe applications using Metronome
- Metronome's support for running parallel jobs builds on Condor's Parallel Universe
- Possible to run coordinated Metronome jobs on multiple machines at the same time, with communication available between them
- Provides advanced testing opportunities
- Some examples: client/server, cross-platform, compatibility, stress/scalability
www.cs.wisc.edu/~bgietzel

Service testing challenges
- Starting multiple services on the same machine does not allow for testing across a network or across different platforms
- Deciding when to start the services and when to start the tests requires human intervention
- Setup of the services is usually a manual process, or the testing is skipped altogether
- The same goes for the teardown of services to return the machines to their original state

Benefits of using Metronome
- Condor manages dynamic claiming of resources, communication between job nodes, and cleaning up after the jobs run
- Metronome publishes basic information about each task to the job ad, where it is accessible by any node, acting as a "scratch space" for the job
- The hostnames of all job nodes, and the start time, return code, and end time for each task on each node, are published to this shared job ad
- This information is useful for communication between nodes and for synchronization in the user's glue scripts
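
The synchronization described above can be sketched as a polling loop in a glue script. This is a minimal sketch, not Metronome's API: `wait_for_attr`, the `read_ad` callable, and the attribute name in the usage note are all hypothetical, and the slides do not show how a glue script actually reads the shared job ad. Here `read_ad` is assumed to return the current job ad as a plain dictionary.

```python
import time


def wait_for_attr(read_ad, name, timeout=5.0, interval=0.05):
    """Poll a shared "scratch space" until attribute `name` is published.

    read_ad is a hypothetical callable returning the current job ad as a
    dict; how Metronome exposes the ad to glue scripts is not shown here.
    """
    deadline = time.monotonic() + timeout
    while True:
        ad = read_ad()
        if name in ad:
            return ad[name]
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{name} was not published within {timeout}s")
        time.sleep(interval)
```

A node could, for example, call `wait_for_attr(read_ad, "remote_task_0_return_code")` (an invented attribute name) to block until the task on node 0 has finished, then decide whether to proceed based on the returned code.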

Client/server test example
Submit Node: launches the Parallel Job on the execute nodes below

Execute Node 0 (SERVER)
- Start server
- Send port to client
- Handle client requests
- Poll for ALLDONE from client
- Exit

Execute Node 1 (CLIENT)
- Discover server hostname and port
- Start client
- Run queries against server
- Send ALLDONE message to server
- Exit
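
The handshake in this example can be sketched on a single machine. The sketch below makes several simplifying assumptions that are not part of Metronome: both roles run as threads in one process, a `queue.Queue` stands in for the shared job ad through which the server's port is discovered, the ALLDONE message travels over the same socket as the queries, and the "service" simply echoes each query in upper case. The function names are illustrative only.

```python
import queue
import socket
import threading


def server(port_box):
    """Server role: start, publish the port, serve requests, exit on ALLDONE."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind(("127.0.0.1", 0))          # port 0: the OS picks a free port
        srv.listen(1)
        port_box.put(srv.getsockname()[1])  # "send port to client"
        conn, _ = srv.accept()
        with conn, conn.makefile("r") as rfile, conn.makefile("w") as wfile:
            for line in rfile:
                msg = line.strip()
                if msg == "ALLDONE":        # client has finished its queries
                    break
                wfile.write(msg.upper() + "\n")
                wfile.flush()


def client(port, queries):
    """Client role: connect, run queries, then tell the server to exit."""
    replies = []
    with socket.create_connection(("127.0.0.1", port)) as sock:
        with sock.makefile("r") as rfile, sock.makefile("w") as wfile:
            for q in queries:
                wfile.write(q + "\n")
                wfile.flush()
                replies.append(rfile.readline().strip())
            wfile.write("ALLDONE\n")
            wfile.flush()
    return replies


def run_test(queries):
    """Run both roles and report the replies and whether the server exited."""
    port_box = queue.Queue()                # stand-in for the shared job ad
    t = threading.Thread(target=server, args=(port_box,))
    t.start()
    port = port_box.get(timeout=5)          # "discover server hostname and port"
    replies = client(port, queries)
    t.join(timeout=5)
    return replies, not t.is_alive()
```

Calling `run_test(["ping", "status"])` exercises the whole sequence: the server starts first, the client waits for the port before connecting, and the server shuts down only after the client signals that it is done, so no human has to decide when each side starts or stops.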

How to submit a parallel job in Metronome
- Several minor modifications to the Metronome submit file are necessary for parallel jobs
- The list of platforms is comma separated, with parentheses around the outside
- Platforms = (x86_rhas_3, x86_rhas_4)

Parallel job submit files continued
- Add a glue script for each task/node combination to be executed remotely:
    platform_pre_0 = client/platform_pre
    platform_pre_1 = server/platform_pre
    remote_declare_0 = client/remote_declare
    remote_declare_1 = server/remote_declare
    remote_task_0 = client/remote_task
    remote_task_1 = server/remote_task
    remote_task_args_0 = 9000
    remote_task_args_1 = 9001
- … and so forth for all glue scripts.
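
The per-node naming pattern above is repetitive, so it can help to generate the lines. The helper below is a hypothetical convenience, not part of Metronome: it assumes each entry has the form `<script>_<node> = <directory>/<script>`, as the slide's example suggests, and it only builds the text of the lines rather than a complete submit file.

```python
def parallel_submit_lines(platforms, node_dirs, scripts, task_args=None):
    """Build the parallel-job lines of a submit file as a list of strings.

    platforms: platform names, one per node
    node_dirs: directory holding each node's glue scripts (index = node)
    scripts:   glue script names, e.g. "platform_pre", "remote_task"
    task_args: optional remote_task argument per node
    """
    lines = ["Platforms = (" + ", ".join(platforms) + ")"]
    for script in scripts:
        for node, directory in enumerate(node_dirs):
            lines.append(f"{script}_{node} = {directory}/{script}")
    for node, arg in enumerate(task_args or []):
        lines.append(f"remote_task_args_{node} = {arg}")
    return lines
```

For instance, `parallel_submit_lines(["x86_rhas_3", "x86_rhas_4"], ["client", "server"], ["platform_pre", "remote_declare", "remote_task"], [9000, 9001])` yields the platform list followed by the eight glue-script lines from the slide.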

Other parallel job use cases
- Cross-platform testing (Linux to Solaris)
- Scalability/stress testing (1 server, many clients)
- Compatibility testing (cross version, stable vs. development series)

For more information
- Documentation is available on the NMI site
- See http://nmi.cs.wisc.edu/node/1001 for information on running parallel jobs using Metronome
- http://nmi.cs.wisc.edu/node/282 describes how to set up your own Metronome installation for running parallel jobs