Designing a PC Farm to Simultaneously Process Separate Computations Through Different Network Topologies
Patrick Dreher, MIT

A Multi-Purpose PC Farm
• Goals of the Project
• Functionality and Constraints
• Hardware Selection
• Software Selection
• Operation

PATRICK DREHER, International Conference On Computing In High Energy And Nuclear Physics, February 10, 2000, Session E #255, 02/00 ID 074

Goals of the Project
• User requirements
  – Production machine for the experimentalists for Monte Carlo simulations and physics analysis of experimental data
  – Development and testing platform for theorists to examine the performance characteristics of the x86 chip design
• Design a way for both experimentalists and theorists to peacefully co-exist, sharing the existing PC farm hardware at the same time

Existing PC Farm Hardware
• The configuration for each machine in the existing PC farm
  – 20 dual Pentium II 400 MHz CPUs
  – 384 Mbytes memory
  – 13 Gbytes disk space
  – fast Ethernet
• PCs interconnected by Kingston EtheRx 100Base-TX fast Ethernet stackable hubs
• A front end PC connecting the farm nodes to the internet

PC Farm Software
• Operating system is Red Hat Linux x86 version 5.2
• Linux kernel configured for SMP operations
• Production of batch jobs managed through Network Queuing System (http://www.gnqs.org)

[Figure: network diagram of the PC farm; surviving labels: LAN, HUB]

Constraints for the Project
• No new funds were available to purchase additional CPUs for the existing PC farm
• No new funds were available to purchase a separate PC farm for development and testing of theory codes
• No funds were available at the level needed to purchase high performance network switches (such as Myrinet)
• Small amounts of funds were available for additional peripherals

Modified PC Farm - Functionality
• Original configuration had 20 machines (40 CPUs) available under a batch queuing system (NQS)
• Modified configuration set aside 4 of the machines for theorists (8 CPUs), leaving the other 32 CPUs for production work and analysis of experimental data
• Four 4-port Adaptec network cards were purchased and one was installed in each of the four machines
• The four machines were networked together in a two-dimensional torus
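
The torus wiring on this slide can be sketched in a few lines of Python. This is an illustration only, assuming the four machines form a 2 x 2 grid with wraparound in both directions (the slides do not give the exact cabling). The node indices 0 to 3 and the function name `torus_links` are hypothetical.

```python
from collections import Counter


def torus_links(rows, cols):
    """Count point-to-point links between node pairs in a rows x cols torus.

    Every node has four ports (+row, -row, +col, -col) with wraparound.
    Nodes are numbered row-major: node = r * cols + c (hypothetical numbering).
    """
    links = Counter()
    for r in range(rows):
        for c in range(cols):
            node = r * cols + c
            # Follow only the "+" directions so each cable is counted once.
            for dr, dc in ((1, 0), (0, 1)):
                neighbour = ((r + dr) % rows) * cols + (c + dc) % cols
                links[tuple(sorted((node, neighbour)))] += 1
    return links


# Assumed layout: the four theory machines as a 2 x 2 torus.
links = torus_links(2, 2)

# Ports used on each node = sum of link counts over the pairs it belongs to.
ports_per_node = Counter()
for (a, b), count in links.items():
    ports_per_node[a] += count
    ports_per_node[b] += count

for pair, count in sorted(links.items()):
    print(f"nodes {pair}: {count} link(s)")
print("ports used per node:", dict(sorted(ports_per_node.items())))
```

Under this assumed layout the "+" and "-" neighbours in each dimension are the same machine, so every adjacent pair is joined by two links and each machine uses all four ports, which is consistent with one 4-port card per machine.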

[Figure: network diagram of the PC farm; surviving labels: LAN, HUB]

Modes of Operation
• Production operation for the experimentalists involved configuring NQS so that it identified 40 CPUs available for production and analysis of data
• An alternate NQS configuration was built that identified only 32 CPUs available for production
• Only one of these two configurations could be installed and operational on the PC farm at a given time

Modes of Operation (cont'd)
• When the alternate NQS configuration was loaded
  – Experimentalists would continue to use the 32 CPUs
  – The theorists would first log onto the front end and then use ssh to log onto one of the 4 machines not grouped under the alternate NQS configuration
• From this point, theory codes could be started using one, two, or all four machines
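
The two operating modes can be modelled as a pair of node lists, only one of which is active at a time. The Python sketch below is a bookkeeping illustration, not NQS configuration syntax. The hostnames `node01` to `node20`, the choice of the last four nodes for theory work, and the names `QueueConfig`, `PRODUCTION`, and `SHARED` are all hypothetical.

```python
from dataclasses import dataclass

CPUS_PER_NODE = 2  # dual-CPU machines, per the hardware slide


@dataclass(frozen=True)
class QueueConfig:
    """One batch-system configuration: the nodes the batch queue may use."""
    name: str
    batch_nodes: tuple

    @property
    def batch_cpus(self):
        return len(self.batch_nodes) * CPUS_PER_NODE


# Hypothetical hostnames for the 20 farm machines.
ALL_NODES = tuple(f"node{i:02d}" for i in range(1, 21))
# Assumption: the four torus-connected machines are the last four.
THEORY_NODES = ALL_NODES[-4:]

PRODUCTION = QueueConfig("production", ALL_NODES)
SHARED = QueueConfig(
    "shared", tuple(n for n in ALL_NODES if n not in THEORY_NODES)
)


def interactive_nodes(active):
    """Nodes outside the batch system under the active configuration."""
    return tuple(n for n in ALL_NODES if n not in active.batch_nodes)


for config in (PRODUCTION, SHARED):
    print(config.name, config.batch_cpus, interactive_nodes(config))
```

With the full configuration active, all 40 CPUs belong to the batch queue and no node is free for interactive use. With the alternate configuration, 32 CPUs remain under batch control and the four excluded machines are the ones the theorists reach by ssh from the front end.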

Results
• Theorists
  – Gathered data as part of a larger program to compare the performance of x86, Alpha 164 and 264 chips
    • Interprocessor communication using MPI
    • Tests on memory bandwidth
    • Tests of lattice size versus L2 cache for certain computation routines
• Experimentalists
  – Continued Monte Carlo production and analysis of experimental data
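
The lattice-size-versus-cache tests depend on whether a lattice's working set fits in L2 cache. A rough Python sketch of that arithmetic follows. Everything numeric here is an assumption, not something taken from the slides: a 512 KB L2 cache, a four-dimensional lattice with the same extent L in every direction, and one 8-byte value per site. Real theory codes store far more data per site, so the crossover would occur at a smaller L.

```python
def working_set_bytes(extent, dims=4, bytes_per_site=8):
    """Memory needed for a hypercubic lattice with `extent` sites per side."""
    return (extent ** dims) * bytes_per_site


def largest_lattice_in_cache(cache_bytes, dims=4, bytes_per_site=8):
    """Largest extent whose working set fits in cache (0 if even one site does not)."""
    if working_set_bytes(1, dims, bytes_per_site) > cache_bytes:
        return 0
    extent = 1
    while working_set_bytes(extent + 1, dims, bytes_per_site) <= cache_bytes:
        extent += 1
    return extent


# Assumed cache size; not stated in the slides.
L2_BYTES = 512 * 1024

best = largest_lattice_in_cache(L2_BYTES)
print("largest extent fitting in L2:", best)
print("working set at that extent:", working_set_bytes(best), "bytes")
print("working set one step larger:", working_set_bytes(best + 1), "bytes")
```

Sweeping the extent across this boundary and timing a fixed routine at each size is one way such a test could expose the performance change when the data no longer fits in cache.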
