Parallel Computing using Condor on Windows PCs
Peng Wang and Corey Shields
Research and Academic Computing Division
University Information Technology Services
Indiana University

Problem Description
  • Turn the Windows desktop systems in the STC labs (around 2000) into a parallel scientific computer

Discussion
  • Parallel applications need coordination through message passing
  • MPI does not handle ephemeral processes well
  • Multiplexing communication among processes
  • Ports brokered among multiple parallel sessions

What do we have?
  • Condor NT, vanilla universe: match-making, file transfer, fair sharing, job submission, suspension, preemption, restart, security
  • Test application: fastDNAml-p, a parallel application using the master-worker model with small granularity of work
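
The master-worker model with small work units suits ephemeral desktops, because a unit lost with an evicted worker is cheap to redo. The following is a minimal in-process sketch of that idea, not the fastDNAml-p code: `run_master`, `WorkerLost`, and the callable "workers" are hypothetical names used only for illustration.

```python
from collections import deque


class WorkerLost(Exception):
    """Raised when a (simulated) worker is evicted before finishing a unit."""


def run_master(tasks, workers):
    """Hand out small work units and requeue any unit whose worker vanishes.

    `workers` are plain callables here; in a real deployment they would be
    remote processes that can disappear when the desktop owner returns.
    """
    pending = deque(tasks)
    alive = deque(workers)
    results = {}
    while pending:
        if not alive:
            raise RuntimeError("no workers left to finish the remaining units")
        task = pending.popleft()
        worker = alive.popleft()
        try:
            results[task] = worker(task)
        except WorkerLost:
            pending.append(task)  # redo the unit elsewhere; drop the worker
            continue
        alive.append(worker)      # worker is ready for another unit
    return results
```

Because each unit is small, requeueing after an eviction wastes little computation.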

How we did it
  • Simple Message Brokering Library (SMBL)
  • Process and Port Manager (PPM)
  • A mechanism for users to submit jobs (web portal)

SMBL
  • An I/O multiplexing server in charge of message delivery for each parallel session (serializes communication)
  • The SMBL client library implements selected MPI-like calls
  • Both the server and the client library are based on a TCP socket abstraction library
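
To illustrate what an I/O multiplexing message broker does, here is a minimal Python sketch built on the standard `selectors` module. It is not the SMBL implementation: the `Broker` class, the `send_msg`/`recv_msg` helpers, and the wire format (destination rank plus payload length) are assumptions made for this example.

```python
import selectors
import socket
import struct

# Hypothetical wire format: (rank, payload length), both unsigned 32-bit.
HEADER = struct.Struct("!II")


class Broker:
    """Single-threaded relay: reads framed messages and forwards them by rank."""

    def __init__(self):
        self.sel = selectors.DefaultSelector()
        self.socks = {}    # rank -> socket
        self.buffers = {}  # rank -> bytes received but not yet forwarded

    def register(self, rank, sock):
        sock.setblocking(False)
        self.socks[rank] = sock
        self.buffers[rank] = bytearray()
        self.sel.register(sock, selectors.EVENT_READ, data=rank)

    def drop(self, rank):
        """Forget a process that went away (for example, an evicted worker)."""
        sock = self.socks.pop(rank)
        self.buffers.pop(rank)
        self.sel.unregister(sock)
        sock.close()

    def poll_once(self, timeout=1.0):
        for key, _ in self.sel.select(timeout):
            rank = key.data
            try:
                chunk = key.fileobj.recv(4096)
            except BlockingIOError:
                continue
            except ConnectionError:
                chunk = b""
            if not chunk:
                self.drop(rank)
                continue
            self.buffers[rank].extend(chunk)
            self._forward(rank)

    def _forward(self, src):
        buf = self.buffers[src]
        while len(buf) >= HEADER.size:
            dest, length = HEADER.unpack_from(buf)
            if len(buf) < HEADER.size + length:
                break  # wait for the rest of the payload
            payload = bytes(buf[HEADER.size:HEADER.size + length])
            del buf[:HEADER.size + length]
            target = self.socks.get(dest)
            if target is not None:
                # A real server would queue writes instead of calling sendall.
                target.sendall(HEADER.pack(src, length) + payload)


def send_msg(sock, dest, payload):
    sock.sendall(HEADER.pack(dest, len(payload)) + payload)


def _recv_exact(sock, n):
    data = bytearray()
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        data.extend(chunk)
    return bytes(data)


def recv_msg(sock):
    """Return (source rank, payload) for the next message from the broker."""
    src, length = HEADER.unpack(_recv_exact(sock, HEADER.size))
    return src, _recv_exact(sock, length)
```

Because a single loop handles every connection, messages within a session are delivered one at a time, and a vanished process is simply dropped rather than stalling the others.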

Process and Port Manager
  • Assigns a port to each SMBL server process
  • Starts the SMBL server and application processes on demand
  • Directs workers to their servers
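
The port-brokering part of this role can be sketched as a small bookkeeping class. This is only an illustration under assumed names (`PortManager`, `open_session`, `lookup`, `close_session`) and an assumed port range; starting the server and application processes is left out.

```python
class PortManager:
    """Gives each parallel session its own port and remembers the mapping."""

    def __init__(self, first_port, last_port):
        # The port range is a hypothetical configuration choice.
        self.free = list(range(first_port, last_port + 1))
        self.sessions = {}  # session id -> port of that session's server

    def open_session(self, session_id):
        """Reserve a port for a new session (idempotent for known sessions)."""
        if session_id in self.sessions:
            return self.sessions[session_id]
        if not self.free:
            raise RuntimeError("no free ports for a new parallel session")
        port = self.free.pop(0)
        self.sessions[session_id] = port
        return port

    def lookup(self, session_id):
        """Tell a worker which port its session's server listens on."""
        return self.sessions[session_id]

    def close_session(self, session_id):
        """Return the session's port to the free pool."""
        self.free.append(self.sessions.pop(session_id))
```

A worker that starts late or restarts after eviction only needs its session id to find the right server.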

The Portal
  • Apache-based PHP web interface
  • Creates and submits the Condor submit files
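
The portal itself is written in PHP; the sketch below uses Python only to show what generating a vanilla-universe submit description might look like. The function name, file names, and arguments are hypothetical, and the file-transfer commands follow current HTCondor syntax, which may differ from the Condor NT release used at the time.

```python
def make_submit_file(executable, arguments, n_workers, log="job.log"):
    """Return the text of a vanilla-universe submit description."""
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        f"arguments = {arguments}",
        "should_transfer_files = YES",
        "when_to_transfer_output = ON_EXIT",
        "output = worker.$(Process).out",  # one output file per queued process
        "error = worker.$(Process).err",
        f"log = {log}",
        f"queue {n_workers}",
    ]
    return "\n".join(lines) + "\n"
```

The resulting text would be written to a file and handed to `condor_submit`.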

The Big Picture
  • The shaded box indicates components hosted on multiple desktop computers

Statistics
  • Red: total owner
  • Blue: total idle
  • Green: total Condor

Scalability Issues
  • Needed a big server
  • Adjusted condor_config
      MAX_JOBS_RUNNING 1000
      SHADOW_SIZE_ESTIMATE 900 KB
      MAX_STARTD_LOG 640 KB
  • Lost workers because of the per-process file descriptor limit (1024)
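
A rough back-of-the-envelope calculation shows why these numbers call for a large submit machine. It assumes one shadow process per running job and one open descriptor per worker connection; the `reserved` count for logs, listening sockets, and similar is a hypothetical allowance, not a measured value.

```python
def shadow_memory_mb(max_jobs_running, shadow_size_kb):
    """Estimated memory for all shadow processes, in MB (1 MB = 1024 KB)."""
    return max_jobs_running * shadow_size_kb / 1024


def max_connections(fd_limit, reserved=16):
    """Connections one process can hold once other descriptors are set aside."""
    return fd_limit - reserved


print(shadow_memory_mb(1000, 900))  # about 879 MB for 1000 running jobs
print(max_connections(1024))        # 1008 with the assumed reserve of 16
```

With roughly 2000 desktops available, a single process limited to 1024 descriptors cannot hold a connection to every worker, which is consistent with the lost workers noted above.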

Summary
  • Built a large parallel scientific computing facility using Condor
  • Built a parallel message passing library to deal with ephemeral resources
  • Built a port broker to handle multiple parallel sessions
  • Built a web portal
  • It is open source, visit: http://smbl.sourceforge.net