Condor and DRBL
Bruno Gonçalves & Stefan Boettcher
Emory University

Motivation
• Maximize computing power while minimizing costs
• Optimize the use of the resources that are already available
• Maximize resource availability
• Permit peaceful coexistence with previously existing operating systems
Condor Week 2006

Software
• Fedora Core Linux: http://fedora.redhat.com/ (other distributions can be used as well)
• Diskless Remote Boot in Linux (DRBL): http://drbl.sourceforge.net
• Condor clustering software: http://www.cs.wisc.edu/condor/

Hardware
• Server (complete machine)
  - Large HDD
  - Several network cards
• Client (stripped-down machine)
  - CPU
  - RAM
  - Network card

DRBL
• Uses PXE or Etherboot to let clients boot over the network
• All files can be located on the server and accessed via NFS (clients don't need hard drives!)
• The server only provides file sharing and user authentication; all software runs on the clients' own resources
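The NFS half of this arrangement can be pictured as a few export entries on the server, one per shared directory, each granting access to the client subnets. This is a minimal sketch under stated assumptions: DRBL writes its own /etc/exports, and the exported paths, option strings and the /24 masks for the 192.168.110.x and 192.168.120.x client subnets (from the Structure slide) are illustrative, not DRBL's actual configuration.

```python
# Illustrative sketch only: DRBL generates its own /etc/exports.
# The paths, NFS options and /24 subnet masks are assumptions.
from ipaddress import ip_network


def exports_lines(exports, subnets):
    """Build /etc/exports lines granting every client subnet access.

    exports: mapping of exported server path -> NFS option string
    subnets: client networks that should be allowed to mount each path
    """
    networks = [ip_network(net) for net in subnets]  # validates each subnet
    lines = []
    for path, options in exports.items():
        clients = " ".join(f"{net}({options})" for net in networks)
        lines.append(f"{path} {clients}")
    return lines


if __name__ == "__main__":
    # Hypothetical layout: a shared client root and shared home directories.
    example = {
        "/tftpboot/node_root": "ro,sync,no_root_squash",
        "/home": "rw,sync,root_squash",
    }
    for line in exports_lines(example, ["192.168.110.0/24", "192.168.120.0/24"]):
        print(line)
```

Each line follows the exports(5) shape "path host(options) host(options)", so both client subnets see the same files the server holds.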

DRBL Installation (I)
# drblsrv -i
• Updates the system (similar to "up2date", etc.)
• Makes sure the relevant services (dhcpd, NFS, NIS, tftpboot, etc.) are installed
• Configures the necessary services
• Selects the kernel to be used by the clients

DRBL Installation (II)
# drblpush -i
• Which network interfaces to use
• Client booting options (text/GUI)
• How many clients, and their hostnames
• MAC address to IP/hostname binding (if any)
• "Pushes" all the configurations to the clients (creating new clients if necessary)
• Needs to be run any time the structure of the cluster changes
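A MAC address to IP/hostname binding just means a given network card always receives the same address and name from DHCP. The hypothetical sketch below renders such bindings as ISC dhcpd "host" declarations to show the idea; drblpush writes its own DHCP configuration, and every MAC, address and hostname here is made up.

```python
# Hypothetical sketch of a MAC -> (IP, hostname) binding rendered as ISC
# dhcpd "host" declarations. drblpush writes its own DHCP configuration;
# every MAC, address and hostname here is made up for illustration.
import re
from ipaddress import ip_address

MAC_RE = re.compile(r"^(?:[0-9a-f]{2}:){5}[0-9a-f]{2}$")


def dhcpd_host_entries(bindings):
    """bindings: iterable of (mac, ip, hostname) tuples -> dhcpd.conf text."""
    entries = []
    for mac, ip, hostname in bindings:
        mac = mac.lower()
        if not MAC_RE.match(mac):
            raise ValueError(f"malformed MAC address: {mac}")
        ip = ip_address(ip)  # raises ValueError on a bad address
        entries.append(
            f"host {hostname} {{\n"
            f"  hardware ethernet {mac};\n"
            f"  fixed-address {ip};\n"
            f'  option host-name "{hostname}";\n'
            "}"
        )
    return "\n".join(entries)


if __name__ == "__main__":
    print(dhcpd_host_entries([
        ("00:0C:29:AA:BB:01", "192.168.110.1", "node01"),
        ("00:0C:29:AA:BB:02", "192.168.110.2", "node02"),
    ]))
```

Validating the MAC and IP before emitting anything keeps a typo in the binding list from producing a DHCP configuration that silently never matches a machine.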

Structure
[Network diagram: Internet; DRBL server/firewall; Central Manager; subnets 192.168.110.x and 192.168.120.x; compute nodes]
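Because the DRBL server sits between the client subnets and the Internet, any traffic that leaves a node's own subnet passes through it. The sketch below classifies a pair of addresses accordingly; it assumes both client subnets are /24 networks, which the diagram does not state, and the function names are hypothetical.

```python
# Sketch, assuming both client subnets are /24 networks (the slide only
# shows 192.168.110.x and 192.168.120.x): classify traffic between two
# addresses as staying on one subnet or being routed via the DRBL server.
from ipaddress import ip_address, ip_network

CLIENT_SUBNETS = [ip_network("192.168.110.0/24"), ip_network("192.168.120.0/24")]


def subnet_of(addr):
    """Return the client subnet containing addr, or None if outside the cluster."""
    ip = ip_address(addr)
    for net in CLIENT_SUBNETS:
        if ip in net:
            return net
    return None


def path_between(src, dst):
    """'direct' for same-subnet peers; everything else goes through the server."""
    src_net, dst_net = subnet_of(src), subnet_of(dst)
    if src_net is not None and src_net == dst_net:
        return "direct"
    return "via DRBL server"


if __name__ == "__main__":
    print(path_between("192.168.110.5", "192.168.110.9"))
    print(path_between("192.168.110.5", "192.168.120.9"))
    print(path_between("192.168.110.5", "203.0.113.7"))
```

Run as a script, this prints "direct" once and then "via DRBL server" twice, matching the routing limitations listed under Disadvantages.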

Condor Installation
# ./condor_install
• All machines share the same password files
• All filesystems are NFS-mounted and shared among all the machines
• Configure Condor for all DRBL clients, even nonexistent ones
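Configuring Condor for clients that do not exist yet amounts to preparing a per-host configuration for every hostname DRBL knows about, so a new machine works as soon as it boots. The sketch below writes one small local config file per hostname; the "nodeNN" naming scheme, the directory, and the settings written are hypothetical.

```python
# Hypothetical sketch: pre-create a per-host Condor local config file for
# every client hostname DRBL is configured for, including machines that do
# not exist yet. The hostname scheme, directory and settings are assumptions.
import tempfile
from pathlib import Path


def client_hostnames(prefix, count, width=2):
    """e.g. client_hostnames("node", 3) -> ["node01", "node02", "node03"]"""
    return [f"{prefix}{i:0{width}d}" for i in range(1, count + 1)]


def write_local_configs(config_dir, hostnames, central_manager):
    """Write <hostname>.local for each host and return the written paths."""
    config_dir = Path(config_dir)
    config_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for host in hostnames:
        path = config_dir / f"{host}.local"
        path.write_text(
            f"# Local Condor settings for {host}\n"
            f"CONDOR_HOST = {central_manager}\n"
            "DAEMON_LIST = MASTER, STARTD\n"
        )
        written.append(path)
    return written


if __name__ == "__main__":
    # Demo in a throwaway directory; a real setup would use a directory on
    # the NFS share that every node sees.
    with tempfile.TemporaryDirectory() as tmp:
        for p in write_local_configs(tmp, client_hostnames("node", 4), "cm.example.edu"):
            print(p.name)
```

With a single shared installation, the global configuration could then select each node's file through the $(HOSTNAME) macro, for example LOCAL_CONFIG_FILE = <shared dir>/$(HOSTNAME).local; the exact path is site-specific.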

Dedicated Cluster
• The number of configured clients can be larger than the number of machines (making it easy to add more machines)
• Clients boot to text mode
• Condor configured for dedicated resources
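One common reading of "configured for dedicated resources" is a startd policy that always accepts jobs and never suspends or evicts them, since no desktop user competes for the machine. The sketch below renders such a policy as condor_config lines; the macro names are standard Condor policy settings, but the values used on this cluster are not given on the slide, so treat them as assumptions.

```python
# Sketch, assuming "configured for dedicated resources" means nodes that
# always accept jobs and never suspend, preempt or kill them. START,
# SUSPEND, CONTINUE, PREEMPT, KILL and WANT_SUSPEND are standard Condor
# policy settings; the values actually used on this cluster are not stated.
DEDICATED_POLICY = {
    "START": "TRUE",          # always willing to start a job
    "SUSPEND": "FALSE",       # never suspend a running job
    "CONTINUE": "TRUE",       # resume immediately if ever suspended
    "PREEMPT": "FALSE",       # never evict a job
    "KILL": "FALSE",          # never hard-kill a job
    "WANT_SUSPEND": "FALSE",  # do not consider suspending jobs at all
}


def render_policy(policy):
    """Render a policy mapping as condor_config 'NAME = value' lines."""
    return "".join(f"{name} = {value}\n" for name, value in policy.items())


if __name__ == "__main__":
    print(render_policy(DEDICATED_POLICY), end="")
```

On an opportunistic pool (such as the Windows lab) these expressions would instead depend on keyboard activity or load; on dedicated nodes the constants are enough.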

Windows Computer Lab
• The number of configured nodes should match the number of machines
• MAC address binding can be used for extra security
• Nodes can PXE-boot into the cluster when they are free for computation (evenings, holidays, vacations) and go back to Windows when strictly necessary (mornings)
• Condor's checkpointing (and flocking) features allow jobs to run on whichever resources are available at any given time
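The dual-use schedule boils down to a rule for when a lab machine belongs to the cluster. The sketch below encodes one such rule; the 08:00 to 18:00 teaching hours, the weekend rule and the holiday set are hypothetical placeholders, since the slide only mentions evenings, holidays and vacations versus mornings.

```python
# Hypothetical schedule for a dual-use lab. The hours and the holiday set are
# assumptions; the slide only says evenings, holidays and vacations are free
# for computation and mornings belong to Windows.
from datetime import date, datetime, time

LAB_OPENS = time(8, 0)    # assumed start of Windows teaching hours
LAB_CLOSES = time(18, 0)  # assumed end of Windows teaching hours
HOLIDAYS = {date(2006, 12, 25)}  # illustrative; a real list would be site-specific


def available_for_cluster(moment: datetime) -> bool:
    """True when a lab machine may boot into the DRBL/Condor cluster."""
    if moment.date() in HOLIDAYS or moment.weekday() >= 5:  # 5 = Sat, 6 = Sun
        return True
    return not (LAB_OPENS <= moment.time() < LAB_CLOSES)


if __name__ == "__main__":
    for m in (datetime(2006, 4, 24, 10, 0), datetime(2006, 4, 24, 20, 0)):
        print(m.isoformat(), available_for_cluster(m))
```

In practice such a rule would decide whether a machine PXE-boots into DRBL or starts Windows from its local disk; Condor's checkpointing is what would let a job interrupted at the morning switch continue later on another node.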

Centralized Cluster Management
• drbl-doit: run a command on all clients
• drbl-cp-host, drbl-rm-host: copy/remove a file or directory to/from all clients
• drbl-useradd, drbl-userdel: add/delete user accounts
• drbl-client-service: control services on clients (e.g. drbl-client-service condor start)
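What drbl-doit provides, one command run on every client with the results gathered centrally, can be pictured as a simple parallel fan-out. This is not DRBL's implementation: the ssh transport, the host names and the function names below are assumptions, and the runner is a parameter so the logic can be tried without a cluster.

```python
# Illustration of the fan-out idea behind a tool like drbl-doit. This is NOT
# DRBL's implementation: the ssh transport, host names and function names are
# assumptions, and the runner is injectable so the logic can be tried offline.
import subprocess
from concurrent.futures import ThreadPoolExecutor


def ssh_runner(host, command):
    """Run command on host over ssh; return (exit status, stdout)."""
    result = subprocess.run(["ssh", host, command], capture_output=True, text=True)
    return result.returncode, result.stdout


def run_on_all(hosts, command, runner=ssh_runner, max_workers=8):
    """Run command on every host in parallel; map host -> runner result."""
    hosts = list(hosts)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda host: runner(host, command), hosts))
    return dict(zip(hosts, results))


if __name__ == "__main__":
    # Dry run with a fake runner instead of ssh.
    def echo(host, command):
        return 0, f"{host}: would run {command!r}"

    for host, (status, out) in run_on_all(["node01", "node02"], "uptime", runner=echo).items():
        print(host, status, out)
```

Running the clients in parallel keeps a slow or unreachable node from holding up the rest; pool.map still returns results in host order, so the mapping is easy to read.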

Advantages
• Flexible
  - Easily add and remove machines (plug and play)
  - Usable for both dedicated and opportunistic clustering
• Stable
  - Running for months without problems, even with nodes being added, removed and upgraded
  - Both clients and server can be rebooted without (too much) harm
• Efficient
  - "Biggest bang for your buck"

Disadvantages
• Not ideal for I/O-intensive applications (NFS overhead)
• Communication between nodes on different subnets is routed through the server
• All communication with the outside world has to go through the server

The End
Questions? Suggestions?