The High Performance Cluster for QCD Calculations System

The High Performance Cluster for QCD Calculations: System Monitoring and Benchmarking

Lucas Fernandez Seivane (quevedin@mail.desy.de)
Summer Student 2002, IT Group, DESY Hamburg
Supervisor: Andreas Gellrich
Oviedo University (Spain)

Topics
  • Some Ideas of QM
  • The QFT Problem
  • Lattice Field Theory
  • What can we get?
  • Approaches to the computing
  • lattice.desy.de:
      • Hardware
      • Software
  • The stuff we made: Clumon
  • Possible improvements

Let’s do some physics…
  • QM, “real behavior” of the world: ‘fuzzy world’
  • Relativity means causality (cause must precede consequence!)
  • Any complete description of Nature must combine both ideas
  • The only consistent way of doing this is … QUANTUM FIELD THEORY

The QFT Problem
  • Impossible to solve it exactly
  • PERTURBATIVE APPROACH
  • Necessity of a small coupling constant (like $\alpha_{em} = 1/137$)
  • Example: QED (the strange theory of light and matter)
    Taylor: $\alpha_{em} + \alpha_{em}^2/2 + \alpha_{em}^3/6 + \dots$
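To see why a small coupling makes the perturbative approach workable, the terms of the schematic Taylor series quoted on the slide can be evaluated numerically. This is only a sketch: the function name `taylor_terms` is made up for illustration, and the order-one coupling used for comparison is an assumed stand-in for a strongly coupled theory, not a number from the slides.

```python
from math import factorial


def taylor_terms(alpha, n_terms=3):
    """Return the terms alpha**n / n! for n = 1..n_terms (the schematic series on the slide)."""
    return [alpha**n / factorial(n) for n in range(1, n_terms + 1)]


ALPHA_EM = 1 / 137    # value quoted on the slide
ALPHA_LARGE = 1.0     # illustrative order-one coupling (assumption, not from the slides)

for name, alpha in (("alpha = 1/137", ALPHA_EM), ("alpha = 1", ALPHA_LARGE)):
    terms = taylor_terms(alpha)
    ratio = terms[1] / terms[0]   # size of the second term relative to the first
    print(f"{name}: terms = {terms}, second/first = {ratio:.4f}")
```

With the small coupling each successive term is suppressed by roughly another factor of alpha, so truncating the series early is a good approximation. With a coupling of order one the second term is already half the first, and truncation is much harder to justify.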

… but for QCD
  • Not small coupling constant (at least at low energies)
  • We cannot explain (at least analytically) a proton!!!
  • We do need something exact (the LATTICE is EXACT*)

Lattice field theory
  • Generic tool for approaching non-perturbative QFT
  • But more necessary in QCD (non-perturbative aspects)
  • Also of purely theoretical interest (Wilson approach)

What can we get?
  • We are interested in the spectra (bound states, masses of particles)
  • We can do it by means of correlation functions: if we could calculate them exactly, we would have solved the theory
  • They are extracted out of Path Integrals (foil 1)
  • The problem is calculating Path Integrals
  • Lattice can calculate Path Integrals
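For orientation, the standard textbook relations behind this slide can be written as follows. They are not taken from the slides themselves: the operator $O$, the amplitude $A$, the Euclidean action $S_E$ and the lattice spacing $a$ are generic symbols introduced here for illustration.

```latex
% Expectation value of an observable as a Euclidean path integral
\langle O \rangle = \frac{1}{Z} \int \mathcal{D}\phi \; O[\phi] \, e^{-S_E[\phi]},
\qquad
Z = \int \mathcal{D}\phi \; e^{-S_E[\phi]}

% A two-point correlation function decays with the mass m of the lightest state
C(t) = \langle O(t) \, O^{\dagger}(0) \rangle
\;\xrightarrow{\; t \to \infty \;}\; A \, e^{-m t}

% so the mass can be read off from the ratio of neighbouring time slices
a \, m_{\mathrm{eff}}(t) = \ln \frac{C(t)}{C(t + a)}
```

This is the sense in which knowing the correlation functions gives the spectrum: the exponential fall-off of $C(t)$ at large Euclidean time is governed by the mass of the lightest state that the operator can create.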

A Naïve Approach
  • Discretize space-time
  • Monte-Carlo methods for choosing field configurations (random generators)
  • Numerical evaluation of Path Integrals and correlation functions!!!
    (typical lattice sizes: a = 0.05-0.1 fm, 1/a = 2 GeV, L = 32)
but…
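The three steps above can be sketched on a toy problem. The example below is not QCD and not code from the talk: it applies the Metropolis algorithm to the discretized Euclidean path integral of a one-dimensional harmonic oscillator, which is a common teaching example. All names and parameter values (`action_change`, `metropolis_x2`, the time step `a = 0.5`, 32 sites, the proposal width) are assumptions chosen for illustration.

```python
import math
import random


def action_change(path, i, new_x, a=0.5, m=1.0, omega=1.0):
    """Change in the Euclidean action when site i is set to new_x (periodic lattice)."""
    n = len(path)
    left, right = path[(i - 1) % n], path[(i + 1) % n]

    def local(x):
        # Only the terms of the action that involve site i
        kinetic = m * ((x - left) ** 2 + (right - x) ** 2) / (2 * a)
        potential = a * 0.5 * m * omega**2 * x**2
        return kinetic + potential

    return local(new_x) - local(path[i])


def metropolis_x2(n_sites=32, n_sweeps=2000, n_thermal=300, step=1.0, seed=1):
    """Estimate <x^2> by Metropolis sampling of discretized paths."""
    rng = random.Random(seed)
    path = [0.0] * n_sites            # discretized time: one x value per site
    total, count = 0.0, 0
    for sweep in range(n_sweeps):
        for i in range(n_sites):
            new_x = path[i] + rng.uniform(-step, step)   # random proposal
            d_s = action_change(path, i, new_x)
            # Accept with probability min(1, exp(-dS))
            if d_s <= 0 or rng.random() < math.exp(-d_s):
                path[i] = new_x
        if sweep >= n_thermal:        # skip thermalization sweeps
            total += sum(x * x for x in path) / n_sites
            count += 1
    return total / count


print(f"<x^2> estimate: {metropolis_x2():.3f}")
```

For m = omega = 1 the continuum ground-state value of <x^2> is 1/2, and the estimate should land in that neighbourhood, up to discretization and statistical errors. A real lattice QCD code follows the same pattern but with gauge and quark fields on a four-dimensional lattice, which is where the computing cost comes from.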

…but
  • Huge computer power:
      i. Highly dimensional integrals
      ii. The calculation requires computing the inverse of an “infinite”-dimensional matrix, which takes a lot of CPU time and RAM.
  • That’s why we need clusters, supercomputers or special machines (to divide the work)
  • The amount of data transferred is not so important; the deciding factors are the LATENCY of the network and the scalability above 1 TFlops
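A rough size estimate shows why the matrix is called “infinite”-dimensional. The sketch below assumes a symmetric lattice with L = 32 sites in each of four directions (the slides only quote L = 32) and a quark field with 3 colour times 4 spin components per site, stored as double-precision complex numbers. The function name and all defaults are illustrative assumptions, not values from the talk.

```python
def fermion_matrix_size(L=32, dims=4, colors=3, spins=4, bytes_per_complex=16):
    """Return (sites, matrix dimension, bytes per vector, bytes for a dense matrix)."""
    sites = L ** dims
    n = sites * colors * spins                 # rows (= columns) of the matrix
    vector_bytes = n * bytes_per_complex       # one field vector
    dense_bytes = n * n * bytes_per_complex    # if the matrix were stored densely
    return sites, n, vector_bytes, dense_bytes


sites, n, vec_b, dense_b = fermion_matrix_size()
print(f"lattice sites    : {sites:,}")
print(f"matrix dimension : {n:,}")
print(f"one vector       : {vec_b / 2**20:.0f} MiB")
print(f"dense matrix     : {dense_b / 1e15:.1f} PB")
```

Under these assumptions a single vector already takes a sizeable share of a node's 1 GB of RAM, and a dense matrix is out of the question. In practice the matrix is sparse and is inverted with iterative solvers, so each node holds only its part of the lattice and exchanges boundary data with its neighbours.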

How can we get it?
  • General Purpose Supercomputers:
      • Very expensive
      • Rigid (difficult hardware upgrades)
  • Fully custom parallel machines:
      • Completely optimized
      • Single-purpose (difficult recycling)
      • Need to design, develop and build (or modify) the hardware & software
  • Commodity clusters:
      • “Cheap PC” components
      • Completely customizable
      • Easy to upgrade / recycle

Machines
  • Commercial Supercomputers: Cray T3E, Fujitsu VPP 77, NEC SX-4, Hitachi SR8000…
  • Parallel machines:
      • APEmille/apeNEXT (INFN/DESY)
      • QCDSP/QCDOC (CU/UKQCD/Riken)
      • CP-PACS (Tsukuba/Hitachi)
  • Commodity clusters + Fast Networking:
      • Low latency (Fast Networking)
      • Fast Speed
      • Standard software and programming environments
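Why low latency matters more than raw bandwidth for this workload can be illustrated with the usual first-order cost model for a message, time = latency + size / bandwidth. The two network profiles below are hypothetical round numbers chosen for illustration. They are not measurements of the DESY cluster, of Fast Ethernet or of Myrinet.

```python
def transfer_time(n_bytes, latency_s, bandwidth_bytes_per_s):
    """First-order model: fixed start-up latency plus size divided by bandwidth."""
    return latency_s + n_bytes / bandwidth_bytes_per_s


def latency_fraction(n_bytes, latency_s, bandwidth_bytes_per_s):
    """Fraction of the total transfer time that is spent in latency."""
    return latency_s / transfer_time(n_bytes, latency_s, bandwidth_bytes_per_s)


# (latency in seconds, bandwidth in bytes per second): hypothetical values
NETWORKS = {
    "network A (hypothetical)": (100e-6, 12.5e6),
    "network B (hypothetical)": (10e-6, 250e6),
}

for name, (lat, bw) in NETWORKS.items():
    for size in (1_000, 1_000_000):
        t = transfer_time(size, lat, bw)
        frac = latency_fraction(size, lat, bw)
        print(f"{name}: {size:>9,} bytes -> {t * 1e6:9.1f} us, latency share {frac:.0%}")
```

For small messages the latency term dominates the total time, and a parallel lattice code exchanges many small boundary messages per iteration. That is the regime in which a low-latency interconnect pays off.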

Lattice cluster@DESY
  • Cluster bought from a company (Megware), Beowulf type (1 master, 32 slaves)
  • Before upgrade (some weeks ago):
      • 32 nodes: Intel Xeon P4 1.7 GHz, 256 KB cache
      • 1 GB Rambus RAM
      • 2 64-bit PCI slots
      • 18 GB SCSI hard disks
      • Fast Ethernet switch (normal networking, NFS disk mounting)
      • Myrinet network (low latency)
  • Upgrade (August 2002):
      • 16 nodes: 2 Intel Xeon P4 1.7 GHz, 256 KB cache
      • 16 nodes: 2 Intel Xeon P4 2.0 GHz, 512 KB cache

Lattice cluster@DESY (2)
  • Software:
      • SuSE Linux (modified by Megware)
      • MPICH-GM (implementation of MPICH, "MPI CHameleon", for the Myrinet GM system)
      • Megware Clustware (modified OpenSCE/SCMS): tool for monitoring and administration (but no logs)

Lattice cluster@DESY (3)
http://lattice.desy.de/cgi-bin/clumon/cgi_clumon.pl
  • Andreas Gellrich's first version:
      • Provides logs and monitoring
      • Written in Perl (customizable)

Lattice cluster@DESY (4)
http://lattice.desy.de/cgi-bin/clumon/cgi_clumon.pl
  • New version by Andreas Gellrich and me:
      • Also graphical data and another log measure
      • Uses MRTG to graph data
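The slides do not show how Clumon feeds data to MRTG, and Clumon itself is written in Perl. As a rough illustration of the general mechanism only, MRTG can read values from an external script that prints four lines: two numeric readings, an uptime string and a target name. The Python sketch below follows that convention. The function names, the node name `node01` and the choice of load averages scaled by 100 are assumptions for illustration, not Clumon's actual code.

```python
import os


def mrtg_lines(value_in, value_out, uptime, name):
    """Format two integer readings in the four-line form MRTG reads from an external script."""
    return [str(int(value_in)), str(int(value_out)), str(uptime), str(name)]


def load_probe(node_name="node01"):   # hypothetical node name
    """Report the 1- and 5-minute load averages, scaled by 100 to get integers."""
    try:
        load1, load5, _ = os.getloadavg()   # available on Unix-like systems
    except (AttributeError, OSError):
        load1 = load5 = 0.0                 # fall back where load averages are unavailable
    return mrtg_lines(round(load1 * 100), round(load5 * 100), "unknown", node_name)


if __name__ == "__main__":
    print("\n".join(load_probe()))
```

A script like this would be referenced from a target entry in the MRTG configuration and polled at each MRTG run. Since a load average is a current value rather than a growing counter, the target would normally be marked as a gauge.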

Clumon v2.0 (1)

Clumon v2.0 (2)

Work done (in progress)
  • Getting the flavor of a really high-perf cluster
  • Learning Perl (more or less) to understand Andreas' tool
  • Playing around with Andreas' tool
  • Searching for how to graph this kind of data
  • Learning how to use MRTG/RRDtool
  • Some tests and previous versions
  • Only the final touches (polishing) remain:
      • Time info of the cluster
      • Better documentation of the tools
  • Play around this last week with other stuff
  • Prepare the talk and write up the document

Possible Improvements
  • The cluster is not connected to DESY AFS
  • Need for backups / archiving of the stored data (dCache theoc01)
  • Maybe reinstall the cluster with DESY Linux (to fully know what’s in it)
  • Play around with other cluster stuff: OpenSCE, OSCAR, ROCKS…