CERN - European Laboratory for Particle Physics
PC Farms at CERN
Frédéric Hemmer, CERN-IT/PDP
DESY, November 2, 1998
Disclaimer
- This talk covers farms that involve CERN’s computer center.
- Other farms exist in strictly online environments, as well as “private” farms in buildings.
Overview
- Offline farms
  • Linux farms
  • NT farms
  • Issues
- PC technology & performance
- Online farms & quasi-online farms
- Cost of ownership
- Conclusions
Linux Farms - NOMAD
- Proof of concept in summer 97
- Straight NQS port
- SHIFT SW client port
- CERNLIB port
- NOMAD observed quasi-linear scaling with clock frequency compared to Alphas!
  • i.e. Alpha @ 266 MHz = PII @ 266 MHz
- Now 17 dual PCs, 3 types of MB
Linux Farms - NA49
- NA49 had already privately deployed a PC farm on their premises
- Requested a new farm in 1H98, to benefit from the computer center infrastructure (people and equipment …)
- Trivial deployment, running with NQS
- Most PCs are branded PCs (HP)
- Now completely off RISC for CPU
- 18 duals @ 300 -> 400 MHz
NA49 Analysis - data access
[Diagram labels: Unix server, CORE server, tape server (HP K260 servers); PCs on 100BT; HiPPI; FDDI; SGI Challenge, 600 GB (1 run); SONY DMS. From experiment: 10-12 TB/month, 1 month/year, manual feed of 100 GB cartridges.]
Linux Farms (NA48)
- NA48 was using the QSW CS/2 (128 proc.)
- CS/2 overload -> investigate PCs in late 97
- Installation of 12 dual machines in 1Q98, and more ...
Linux Issues
- EEPRO100B
- MP crashes
- AFS support (MP)
- NFS support (MP)
- Commercial software
- Manufacturer support for Linux
- Very few Linux experts
NT offline Farms
- PCSF
  • Simulation facility but …
- COMPASS
  • Evaluating & benchmarking technology
PCSF - Overview
- Configuration
- Applications
- Data access
- Specific work & solutions
- Key issues
- Conclusions
PCSF - Goals
- Make PC+NT a standard option for Physics Data Processing, starting with simulation
- Establish a minimum management model for NT farm management
- Address scalability issues
- Gain Windows NT experience
PCSF Milestones
- Joined RD47 in autumn 96
- Price inquiry issued in 12/96
- Hardware delivered 4/97
- Ready to use 6/97
- RD47 report 10/97
- Expansion 5/98
PCSF Configuration (1)
- Server running NT 4.0 Server SP3
  • 1 dual-capable PPro @ 200 MHz, 96 MB, with 9 GB data disk (with mirroring). LSF central queues.
- Server running NT Terminal Server Beta 2
  • 1 dual PPro @ 200 MHz, 128 MB, with 4 GB data disk. Runs IIS 3.0 and is accessible from outside CERN. It also hosts the ASPs for Web access.
- Servers running NT 4.0 Workstation SP3
  • 9 dual PPros @ 200 MHz, 64 MB, 2*4 GB
  • 25 dual PIIs @ 300 MHz, 128 MB, 2*4 GB
- All equipped with boot PROMs
PCSF Configuration (2)
- Machines interconnected with four 3Com 3000 100BaseT switches
- Display/keyboard/mouse connected to a Raritan multiplexer
- PC Duo for remote admin access
  • There were problems with other products
- All running LSF 3.0; LSF 3.2 does not work, and support is weak
- Completely integrated with NICE
Applications on PCSF
- ATLAS Dice simulation
- NA45 1996 reconstruction
- CMS reconstruction with Objectivity being tested
- LHCb simulation code ready
- ATLAS reconstruction being ported
- ATLAS/Marseille event filter prototype scalability tests
Data access
[Diagram labels: NT PCs; network; Unix RFIO server (RFIO); Unix tape server (stagexxx commands).]
ATLAS Level 3 DAQ
[Diagram labels: Readout Buffers; Event Builder; SFI ... SFI; 1 GB/s; Processor Farm; Storage (100 MB/s).]
ATLAS Event Filter
- Testbed for evaluating algorithms & sizing
- Architecture & simulation studies
- Monitoring, system management, feedback, etc.
- Interface prototypes (SFI, SFO)
- Timescale: prototype -1 (i.e. end 98)
- Status: sizing of an initial farm
PCSF Usage
Specific work so far
- Installation (Remote Boot, Winstall, NICE replicas, Install Server)
- User codes, CERNLIB, SHIFT
- Job Starter
- PC MGR
- WNTS
- Web Interface
Installation
- Disk cloning + change SID
  • Fastest method, but not very automated
- Remote boot
  • Remote boot install procedures with virtual disk
  • Use unattended setup, installs Winstall and other things
  • Third-party packages installed through Winstall
  • Boot PROM support on some hardware
Porting
- Porting code from Unix to NT is usually easy (NA45 code ported in 1 week)
- Porting a production environment from Unix to NT is usually difficult (shell scripts)
- Porting the build environment is difficult; better to use native tools (Dev Studio)
  • Mixing Unix and NT build environments, revision control, etc.
Jobstarter
- Initially inherited from the Unix LSF CERN JobStarter
- Rewritten in C++, using PcMgrSvc for drive mapping
- Checks execution preconditions
- Cleans up at normal and abnormal job end
- Kills popup dialog windows
  • Excel & Winzip in batch
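The job starter steps listed above (check preconditions, run the job, clean up on both normal and abnormal end) can be sketched as follows. This is a minimal illustration only: the real Jobstarter was written in C++ and is not shown in the slides, Python is used here for brevity, and the names `run_job` and `required_paths` are hypothetical. Drive mapping and killing popup dialogs are not covered.

```python
import os
import shutil
import subprocess
import tempfile

def run_job(command, required_paths=()):
    """Hypothetical job wrapper: check preconditions, run, always clean up."""
    # Precondition check: every required path (e.g. a mapped drive) must exist.
    missing = [p for p in required_paths if not os.path.exists(p)]
    if missing:
        raise RuntimeError(f"precondition failed, missing: {missing}")

    # Per-job scratch directory, removed whether the job succeeds or fails.
    scratch = tempfile.mkdtemp(prefix="job_")
    try:
        return subprocess.run(command, cwd=scratch).returncode
    finally:
        shutil.rmtree(scratch, ignore_errors=True)
```

The `try`/`finally` is the point of the sketch: the scratch area is removed on a normal exit, a non-zero exit, or an exception while launching the job.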
PcMgrSvc/Ctl
- Checks
  • Status of monitored processes/services
  • Amount of scratch space
  • Drive mapping(s)
- Map/unmap drives
- Sync with time servers
- Generate alarms on request
- Gets all parameters from registry
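One of the checks listed above, the amount of scratch space, might look roughly like the sketch below. It is not the PcMgrSvc code, which the slides do not show. The function name `check_scratch`, the `PARAMS` dictionary (standing in for parameters read from the registry), the path and the threshold are all assumptions made for illustration.

```python
import shutil

# Hypothetical parameters; the real service reads its settings from the registry.
PARAMS = {"scratch_path": ".", "min_free_bytes": 500 * 1024 * 1024}

def check_scratch(path, min_free_bytes):
    """Return an alarm message if free space under `path` is too low, else None."""
    free = shutil.disk_usage(path).free
    if free < min_free_bytes:
        return f"ALARM: {path} has {free} bytes free, need {min_free_bytes}"
    return None

if __name__ == "__main__":
    alarm = check_scratch(PARAMS["scratch_path"], PARAMS["min_free_bytes"])
    print(alarm or "scratch space OK")
```

Returning a message rather than raising keeps the check usable both for periodic monitoring and for the "generate alarms on request" mode.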
Web Interface
- As a solution to
  • Remote access from outside CERN
  • Access from non-NT hosts
- Implemented as ASPs with VB
- Requires IIS on the server
Web Interface - authentication
Web Interface - Overview
Web Interface - bjobs
Web Interface - bjobs result
Windows NT Terminal Server
Next Steps
- Finish and understand remote boot issues
- Complete remote boot - remote install
- AFS integration
- Build up resilience
- Investigate how to use the new WfM, DMI, PXE, ACPI, etc. initiatives
- Investigate whether WSH is an alternative
- Investigate NT’s I/O capabilities
Key Issues
- AFS access
- LSF support
- Boot PROMs, equipment interoperability
- Code reintegration (Physics & CERNLIB)
- Think Windows
- Scalability & management (home-grown solution vs. commercial apps)
- Remote & external access
PC with NT
- PC+NT has proven to work in a batch environment, and is now an option for Physics Data Processing
- Farm management is less of a concern after having built a few tools (alternatives would be to use SMS or TNG), but some work is still needed
- Scalability has started to be addressed, but the relatively small number of nodes does not help here
- Considerable NT experience has been gained
Issues so far
- Linux
  • EEPRO100B
  • MP support
  • Commercial software
  • Manufacturer support
  • Very few local Linux experts
- NT
  • AFS access
  • LSF support
  • Think Windows
  • Remote and external access
- PC
  • Interoperability (cards/MB combination)
  • Remote boot support
PC Technology evolution in 97
- Pentium Pro -> Pentium II
  • 50% raw performance increase
  • but 50% cache performance reduction
- SEC -> new motherboards
- 440FX -> 440LX (SDRAM, AGP)
- Recent MBs: embedded SCSI, E’net, VGA
- 100 Mbit E’net switches standard, 1000 Mbit arriving
PC Technology evolution in 98
- Pentium II @ 300 MHz -> Pentium Xeon @ 450 MHz
  • MP support
  • 50% cache performance increase
- Slot 2 -> new motherboards
- 440LX -> 440BX, 440NX (100 MHz, EDO)
- Recent MBs: no more available through Intel, TYAN
- 1000 Mbit/s E’net switches standard, >> 1000 Mbit/s arriving
Racking evolution
[Figure panels: 1998, 1997]
Fast Ethernet Switches (Oct. 98)
At the back of Fast Ethernet Switches (Oct. 98)
Gigabit Ethernet Switches
Network performance: Results
- PCs interconnected through a 100BaseT 3Com 3000 switch
- Repeated with other H/W
- Half-duplex behavior
- Block size does not matter
- Linux uses less CPU than NT
- Good unidirectional performance
- Disappointing CPU consumption on NT
- Disappointing bi-directional performance
PC to PC Network performance
Network performance: issues
- Unexplained 0.5 MB/s observed with some eepro100 versions on PCRD hardware, but OK on PCSF
- Recent DEC E'net boards with chipset > 21140 give poor performance on Linux
- Surprising PC/Alpha results
PC/Alpha Network performance
PC High Performance Networking
- HiPPI (5/98)
  • PII, 300 MHz, 440LX, SDRAM, Roadrunner to SGI O2000, 4 CPU, IRIX 6.4
  • Transmit: 50 MB/s
  • Receive: 50 MB/s (53 MB/s with SMP)
- Gigabit Ethernet (10/98)
  • PII, 400 MHz, 440BX, 100 MHz SDRAM, PCI 32/33, Tigon I
  • 1500 bytes/packet: 28 MB/s, 40% CPU
  • 9000 bytes/packet: 90 MB/s, 90% CPU
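The Gigabit Ethernet figures above can be turned into packet rates and CPU cost per unit of throughput. The Python sketch below is only illustrative arithmetic on the quoted numbers. It assumes 1 MB = 10^6 bytes and treats the quoted bytes/packet as the data carried per packet, neither of which the slide states.

```python
# Illustrative arithmetic on the Gigabit Ethernet numbers quoted above.
# Assumptions (not stated on the slide): 1 MB = 1e6 bytes, and the
# "bytes/packet" values are the bytes of data carried per packet.

MEASUREMENTS = [
    # (bytes per packet, throughput in MB/s, CPU utilisation in %)
    (1500, 28.0, 40.0),
    (9000, 90.0, 90.0),
]

def packets_per_second(packet_bytes, mb_per_s):
    """Packets needed per second to sustain the given throughput."""
    return mb_per_s * 1e6 / packet_bytes

def cpu_percent_per_mb_s(mb_per_s, cpu_percent):
    """CPU utilisation spent per MB/s of throughput."""
    return cpu_percent / mb_per_s

if __name__ == "__main__":
    for size, rate, cpu in MEASUREMENTS:
        pps = packets_per_second(size, rate)
        cost = cpu_percent_per_mb_s(rate, cpu)
        print(f"{size} B packets: {pps:,.0f} packets/s, {cost:.2f} % CPU per MB/s")
```

Under these assumptions the 9000-byte case moves about three times the data with roughly half the packets per second, and at a lower CPU cost per MB/s.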
Disk performance
- PCs connected to SEAGATE ST19171W using two Adaptec 2940UW
- NT needs a lot of tuning (default behavior is to swap data out!)
- Block size, BIOS settings, EDO/FPM do not matter
- Poor performance
- Windows NT even worse
- Memory bandwidth is suspected
Disk performance
- Striping has no effect
  • 1 stream, 2 stripes: 21 MB/s (22 max)
  • 1 stream, 3 stripes: 21 MB/s (33 max)
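The striping result above is easier to read as a fraction of the quoted maximum. The small Python sketch below does that arithmetic. It takes the "max" values at face value as the ceiling for each configuration, which is an interpretation, since the slide does not say how they were obtained.

```python
# Measured single-stream throughput vs. the quoted maximum, from the slide above.
RESULTS = {
    # number of stripes: (measured MB/s, quoted max MB/s)
    2: (21.0, 22.0),
    3: (21.0, 33.0),
}

def efficiency(measured, maximum):
    """Fraction of the quoted maximum actually achieved."""
    return measured / maximum

if __name__ == "__main__":
    for stripes, (measured, maximum) in sorted(RESULTS.items()):
        print(f"{stripes} stripes: {efficiency(measured, maximum):.0%} of quoted max")
```

The measured rate stays at 21 MB/s while the ceiling grows, so the achieved fraction drops when a third stripe is added. That is consistent with a bottleneck somewhere other than the disks.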
Disk performance: issues
- Memory bandwidth suspected
- Need to test with LX/SDRAM, BX SDRAM @ 100 MHz
- RISC PCI does not support variety of boards
- Combined disk/network performance even worse: 5-6 MB/s on Linux
Memory bandwidth (lmbench)
Memory bandwidth (lmbench)
Technology issues
- Technology evolves too fast (processors, chipsets, memory, motherboards, networking, ...)
  • Changing environment/interoperability issues
  • Hard to maintain (obsolescence)
  • New NICs, drivers
  • Measurements valid only a few months
  • Difficult to establish stable environments
- Wide variety of solutions
  • Some combinations work, others do not
- Local suppliers cannot help to solve problems
PC Performance summary
- CPU performance fine
- Network performance
  • Some configurations do not work
  • Some configurations can saturate Fast Ethernet
  • Recent tests show excellent performance
- Memory performance
  • Now better than low-end RISC
- Disk performance disappointing
- Linux better than NT
Online and quasi-online farms
- NA48 Data Recording
- NA45 Data Recording in Objectivity
NA48 Central Data Recording
[Diagram labels: sub-detector VME crates; Event Builder; Online PC Farm; Cisco 5505; FDDI; Fast Ethernet; SUN E450, 500 GB disk space; XLNT Gbit; Offline PC Farm; 7 km; 3Com 9300; 3Com 3900; Gigabit Ethernet; GigaRouter; HiPPI; CS/2, 2.5 TB disk space.]
NA48 Data Recording in 98
- May - September 1998
- Raw data on tape
  • 68 TB (1450 tapes, mainly 50 GB tapes)
  • 12.5 TB selected reconstructed data
  • Total with 97 data: 96 TB
- Average data rate: 18 MB/s (peaks @ 23 MB/s)
- CDR system can do 40-50 MB/s; limitation is CPU time available
- Data recorded as files (4 million)
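The volumes and rates above can be cross-checked with a little arithmetic. The Python sketch below is illustrative only. It assumes decimal units (1 TB = 10^12 bytes), that the 18 MB/s average applies to the 68 TB of raw data, and that every tape holds 50 GB, although the slide says only "mainly".

```python
# Cross-check of the NA48 1998 recording numbers quoted above.
# Assumptions: decimal units, the 18 MB/s average applies to the raw data,
# and all 1450 tapes are treated as 50 GB tapes.
MB, GB, TB = 1e6, 1e9, 1e12

def recording_days(volume_bytes, rate_bytes_per_s):
    """Days of continuous recording needed to write the volume at the rate."""
    return volume_bytes / rate_bytes_per_s / 86400

def nominal_tape_capacity(n_tapes, tape_bytes):
    """Total capacity if every tape had the nominal size."""
    return n_tapes * tape_bytes

if __name__ == "__main__":
    days = recording_days(68 * TB, 18 * MB)
    capacity = nominal_tape_capacity(1450, 50 * GB)
    print(f"68 TB at 18 MB/s: about {days:.1f} days of recording")
    print(f"1450 x 50 GB tapes: {capacity / TB:.1f} TB nominal capacity")
```

Under these assumptions the raw data corresponds to roughly six weeks of continuous recording within the May to September period.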
NA48 Online Farm
- 11 subdetector PCs (dual PII-266, 128 MB)
- 8 event-building PCs (dual PII-266, 128 MB, 18 GB SCSI)
- 4 CDR routing PCs (dual PII-266, 64 MB, FDDI)
- All running Linux
- Software event building in the interburst gap
- Optional software filter (tags data)
- Send data to computer center (local disk buffers: 144 GB, 2 hours)
- On CS/2: L3 filtering and tape writing
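The "144 GB, 2 hours" buffer figure can be related to the 18 MB/s average and 23 MB/s peak rates quoted for NA48 data recording. The sketch below is illustrative arithmetic under two assumptions: decimal units, and that the 144 GB is the sum of the 18 GB disks in the 8 event-building PCs. The slide does not state the second point, although 8 x 18 = 144 suggests it.

```python
# How long the local disk buffers last at a given data rate.
# Assumptions: decimal units; buffer = 8 event-building PCs x 18 GB each.
MB, GB = 1e6, 1e9

BUFFER_BYTES = 8 * 18 * GB  # 144 GB

def buffer_hours(capacity_bytes, rate_bytes_per_s):
    """Hours until the buffer fills if nothing is drained."""
    return capacity_bytes / rate_bytes_per_s / 3600

if __name__ == "__main__":
    for label, rate in (("average", 18 * MB), ("peak", 23 * MB)):
        hours = buffer_hours(BUFFER_BYTES, rate)
        print(f"{label} rate {rate / MB:.0f} MB/s: about {hours:.1f} h of buffering")
```

With these inputs the buffers hold a little over two hours of data at the average rate and somewhat less at the peak rate, in line with the "2 hours" on the slide.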
NA48 Plans for 1999
[Diagram labels: sub-detector VME crates; Event Builder; Cisco 5505; Fast Ethernet; Gigabit Ethernet; 4 * SUN E450, 4.5 TB disk space; 7 km; 3Com 3900; 3Com 9300; HiPPI; On/Offline PC Farm.]
NA45 Data Recording
[Diagram labels: sub-detector VME crates; NA48; SCI; Fast Ethernet; 3Com 3900; Gigabit Ethernet; Event Builder; Online PC Farm; PCSF; 7 km; 2 * SUN E450, 500 GB disk space; 3Com 9300; HiPPI.]
NA45 Raw Data recording in Objectivity
- October 98; November 98
- Estimated bandwidth: 15 MB/s
- Processes translate raw data format to Objectivity
- Database files (1.5 GB) are closed, then written on tape
- Steering done using a set of Perl scripts on the disk servers
- Online filtering/reconstruction/calibration possible
- Farm is running Windows NT
- Reconstruction can use PCSF
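The estimated bandwidth and the database file size above imply a rate at which files are closed and handed to tape. The Python sketch below works this out. It assumes decimal units and that the full 15 MB/s ends up in 1.5 GB database files. It says nothing about how many files are open in parallel, which the slide does not specify.

```python
# Rate at which 1.5 GB database files are completed at 15 MB/s aggregate.
# Assumptions: decimal units; all recorded data goes into 1.5 GB files.
MB, GB = 1e6, 1e9

def files_per_hour(rate_bytes_per_s, file_bytes):
    """Number of files completed per hour at the given aggregate rate."""
    return rate_bytes_per_s / file_bytes * 3600

def seconds_per_file(rate_bytes_per_s, file_bytes):
    """Average interval between completed files at the aggregate rate."""
    return file_bytes / rate_bytes_per_s

if __name__ == "__main__":
    rate, size = 15 * MB, 1.5 * GB
    print(f"about {files_per_hour(rate, size):.0f} files per hour, "
          f"one every {seconds_per_file(rate, size):.0f} s on average")
```

This gives the steering scripts an order of magnitude to plan for: tens of files per hour to close and write to tape.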
Current & Future Data rates at CERN
Summary
- Online PC farms are being used to record data at sensible rates (Linux)
- Offline PC farms are being used for reconstruction/filtering/analysis (Linux/NT)
- Still a lot to do on scalable farm management, global steering, CDR monitoring, etc.
PC Total Cost of Ownership
- Software not included
- Install labor not included
- Assumes 3 years lifetime
DEC 8400 (12-Way) Cost of Ownership
- Software & SW maintenance not included
- Assumes 5 years lifetime
General Conclusions (1)
- PCs are now used for online, quasi-online and offline environments
- The “offline” is now part of the online
- The I/O is still done using RISC/Unix, but recent MP Xeon may change this …
General Conclusions (2)
- PC technology is moving very fast
  • Good for performance
  • Not so for stability, interoperability
  • Not so for understanding issues
- The general management of large farms is not solved, but …
  • A number of initiatives/standards/tools may help us here: WfM, DMI, PXE, ACPI, SMS, TNG, etc.
General Conclusions (3)
- Linux vs. NT … the battle is over
  • Choose the one suitable to your application
  • NT can be used
  • Linux is usable (and offers more performance)
- PC real costs are usually not well understood