Fermi Linux Server Vendor Qualification
HEPiX, May 21, 2003
Steven C. Timm (timm@fnal.gov)
For the Fermi Linux Vendor Qualification Taskforce

OUTLINE
• Fermilab Hardware Procurement Strategy
• Goals of Qualification
• Procedures of Qualification
• Results of Qualification

SUMMARY
• The 2003 Fermi Linux Server Vendor Qualification focused on 1U Intel servers.
• The first phase was a technical evaluation, which identified 18 technically qualified vendors.
• All of these vendors participate in a price/performance bid; the top five make the vendor list. (Currently ongoing.)
• We keep track of all technically qualified vendors and rotate them in as necessary.
• We are not making a new qualified desktop vendor list at this time.
• Public web page: http://www-oss.fnal.gov/scs/public/qualify2003

Members of Fermi Linux Server Vendor Qualification Taskforce
• The taskforce involved personnel from five different departments plus key members of management.
• All major purchasers of server hardware were represented.
• The computer room logistics staff were also represented.
• Members: Steven Timm (chair), Margaret Greaney, Troy Dawson, Lance Weems, Hans Wenzel, Bruce Karrels, Don Holmgren, Phil Lutz, Stan Naymola, Mark Kaletka, Gerry Bellendir.

Fermi Hardware Procurement Strategy
• Buy a hardware solution as fully integrated as possible, including installation.
• Identify vendors that know Fermilab requirements and are willing to work with Fermi Linux.
• Replacement parts via 3-year warranty; service provided by Fermilab.

Fermi Linux Vendor List--History
• Two previous Fermi Linux qualifications, in 1999 and 2001.
  – 1999: desktops as farm workers, 5 vendors
  – 2001: separate vendor lists for desktops and 2U rackmount servers
• Also two special evaluations, for 2U rackmounts and AMD.
• Vendor list used in all major Fermi acquisitions, ~1500 machines from 1999 to 2002.
• Also used by outside groups: KEK, INFN, Northwestern, MIT, Geneva, Carnegie Mellon, Pittsburgh, Edinburgh, others.

Evaluation: performance/price
• The overriding goal has been to get the best performance possible at the lowest price.
• We have succeeded well: from 1999 to 2002, Fermi Cycles per dollar increased by a factor of 6, where Moore’s law alone should have given us only a factor of four.
• Users are happy with the quantity of computing they got for their money.
• Still, in this evaluation we are looking for better long-term reliability, not a race to the bottom on price alone.
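The factor-of-four expectation quoted above is what a steady doubling every 18 months would give over the three years from 1999 to 2002. A minimal sketch of that arithmetic follows; the 18-month doubling period is an assumption (the slide does not say which form of Moore's law was meant), and `moore_factor` is a name chosen here for illustration.

```python
def moore_factor(years: float, doubling_years: float = 1.5) -> float:
    """Growth factor from one doubling every `doubling_years` (assumed 18 months)."""
    return 2.0 ** (years / doubling_years)


expected = moore_factor(2002 - 1999)  # growth expected from doubling alone
observed = 6.0                        # factor reported on the slide
print(f"expected x{expected:g}, observed x{observed:g}, "
      f"ratio {observed / expected:g}")
```

Under that assumption, the reported gain is about one and a half times what clock-for-dollar scaling alone would predict.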

Evaluation: Performance/price
• Problem: one node is not the best test of long-term price/performance by a company.
• Small businesses are best able to take the time to follow the directions of the evaluation process and give support.
• Small businesses are not always able to deliver large orders in a timely manner with good initial quality.
• Single-node prices are not a good predictor of bid level on a real bid, and we shouldn’t be asking anyway.
• Address by: getting technical qualification done first, then doing a price/performance bid.

Evaluation: Vendor attrition
• Some vendors on the list have gone out of business.
• Others were disqualified for bad performance.
• Others stopped bidding on their own, or bid ridiculously high.
• Address by:
  – Selecting the vendor list on a performance/price basis from all those technically qualified.
  – Keeping track of all technically qualified vendors and adding to the list if necessary.
  – Supplementing the list if special hardware (AMD, blades, desktop) is required.

Evaluation: Initial quality
• Problem: going too low on the price curve. Sometimes vendors bid too low and try to deliver poor-quality systems.
• Addressed, from the beginning, with a tough 30-day acceptance test and “lemon law”.
• In various cases Fermilab has required vendors to do swaps on all units of power supplies, cases, motherboards, disk drives, and racks.
• The cost of Fermi labor to resolve the problem was less than the difference between the winning bid and the next-highest bid.
• All issues have been resolved through this process and the systems have all had productive lives.
• NOW: also address with references and hard numbers on initial quality.

Evaluation: Components
• Problem: rapidly changing components.
• In the commodity market, components change rapidly.
• It takes about six months from the beginning of the evaluation to issuance of the purchase order; in that time CPU speeds go up and cases change.
• Impossible to track for laptops, difficult to track for desktops.
• OK for the server market, but results in higher heat loads and current draws.
• ADDRESS by thermal specs that are broad enough that, if there are problems, the vendor still has to fix them.

Goals
• We want to identify vendors who are best able to deliver rackmounted solutions:
  – Competent in Linux
  – Build quality 1U servers
  – Can integrate into a rackmount environment with good thermals in a timely and professional manner
  – Have high performance
  – Have good support and troubleshooting

Vendor Selection
• Existing vendors on the Fermi Linux list
• Sales to other Fermi departments
• Advertisements at trade shows
• Survey of other DOE labs at HEPiX
• Vendor’s direct contact to Fermilab asking to participate

Chronology
• We made contact with 45 vendors in all.
• 29 vendors attended the Jan. 28 info meeting.
• 24 vendors submitted an acceptable configuration on Feb. 4.
• 21 vendors submitted acceptable benchmarks and were cleared to ship a unit on Mar. 4; all got it here by Mar. 11.
• 18 vendors were identified as technically qualified.

Specifications
• 1U dual Intel Xeon, 2.4 GHz or faster
• 400 MHz front side bus or faster
• 1 GB RAM (RDRAM or DDR SDRAM)
• Disks: 1 x 20 GB system, 2 x 40 GB data
• 100 Mbit Ethernet
• Video
• CD-ROM, floppy

Why just 1U Xeon
• AMD hardware shows high initial failure rate, high current, high heat.
• 1U is the most challenging thermal case; if they can build 1U, we believe they can build 2U.
• Intel chips are supposed to be faster than AMD at the moment.
• Intel chips are supposed to run cooler and draw less current.
• Simplicity: a platform we already mostly understand, just one from each vendor.
• Space: we don’t have space to put so many 2U.

Linux Competence
• Vendor identifies hardware that’s compatible with Linux. (Much easier than it used to be.)
• Vendor loads Fermi Linux onto the evaluation node.
• Vendor has to configure lm_sensors on the node.
• Vendor runs our supplied test to check that they did it right.
• They are only allowed to ship the unit to Fermilab if it is right.

Electrical
• Electric current measured with an ammeter at startup, idle, and full CPU load.
• Current draw ranges: 2.4 GHz, 1.6-2.0 A; 2.8 GHz, 2.0-2.3 A; 3.06 GHz, 2.1-2.35 A.
• Likely that with purchase of 2.8 or 3.06 GHz machines we can only have seven machines per circuit, not eight as in the past.
• Those with higher current draw also tend to have more fans and be better internally cooled.
• Bright side: this current is similar to that of the 750 MHz machines bought 3 years ago, so 2.5x the performance for the same current.
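The machines-per-circuit estimate can be sketched as a simple division of the usable current on a circuit by the draw per machine. The slide does not state the circuit rating or which point in each measured range was used, so the 16 A usable budget and the use of range midpoints below are both assumptions made only for illustration; `machines_per_circuit` is a hypothetical helper name.

```python
import math

# Measured draw ranges in amps, as reported on the slide.
DRAW_RANGES_A = {
    "2.4 GHz": (1.6, 2.0),
    "2.8 GHz": (2.0, 2.3),
    "3.06 GHz": (2.1, 2.35),
}

# ASSUMPTION: usable current per circuit; not given in the talk.
USABLE_AMPS = 16.0


def machines_per_circuit(usable_amps: float, amps_per_machine: float) -> int:
    """Largest whole number of machines whose combined draw fits the budget."""
    return math.floor(usable_amps / amps_per_machine)


for cpu, (low, high) in DRAW_RANGES_A.items():
    midpoint = (low + high) / 2  # ASSUMPTION: midpoint as the planning figure
    count = machines_per_circuit(USABLE_AMPS, midpoint)
    print(f"{cpu}: {midpoint:.3f} A per machine -> {count} per circuit")
```

With a different budget or with worst-case draws the counts change, so the real circuit rating should be substituted before using this for planning.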

Thermal
• Measured the temperature difference (ΔT) from front to back of the unit for all machines.
• Used internal temperature probes on each unique type of case.
• All units in the evaluation ran much cooler than the 1U units bought in FY 2002.
• This is due to better thermal characteristics of the Intel chip and many more added internal fans and blowers.
• “Northbridge” chipset chips in some machines ran hotter than the CPUs. Important to watch the size of the heatsink on these chips.
• Still analyzing the data we took, but confident that all units are acceptable.

Thermals continued
[Figure]

Quality 1U Servers
• Open each machine to verify quality of construction.
• Run burn-in on each machine for two weeks.
• Thermal measurements in a real rack situation.
• Electrical current measurements.
• Verify all components meet specs.

Integration capabilities contd.
• Vendors are asked to submit a sample proposal for a full rack of systems.
• The standard Fermi rack configuration is the base of the proposal, but they can suggest extras.
• Goal is to (1) learn if they can integrate and (2) get new ideas on how to improve our setup.
• They must also submit info on clusters they have installed before, with real temperature and reliability numbers.

Performance
• Vendors are supplied a CD-ROM of the CDF and D0 benchmark.
• Performance is measured in Fermi Cycles, where PIII 1 GHz = 1000 Fermi Cycles.
• We repeat the test when the machine gets here.
• QCD benchmark, seti@home, tiny also run.
• It would be ideal to use SPEC CPU2000, but published results are not repeatable with the compilers used by Fermi.
• Price doesn’t enter into the technical evaluation.

Performance
• 3 CPU speeds measured: 2.4, 2.8, 3.06 GHz (1000 Fermi Cycles = PIII 1 GHz).
• Average performance: 1779, 2041, 2223 Fermi Cycles respectively.
• 400 MHz vs. 533 MHz front side bus is a 2.5% effect for farms software, much bigger for QCD.
• AMD MP 2200+: 1771 Fermi Cycles.
• Performance is projected to faster clock speeds in anticipation that some vendors will bid faster chips.
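The talk does not say how performance was projected to faster clock speeds. One plausible approach, shown here only as a sketch, is a least-squares straight line through the three measured averages; the linear model, the helper names `fit_line` and `project`, and the 3.2 GHz target are all assumptions, not the taskforce's stated method.

```python
# Measured averages from the slide: (clock in GHz, Fermi Cycles).
MEASURED = [(2.4, 1779.0), (2.8, 2041.0), (3.06, 2223.0)]


def fit_line(points):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(MEASURED)


def project(ghz: float) -> float:
    """Projected Fermi Cycles at a given clock speed (linear assumption)."""
    return slope * ghz + intercept


# Hypothetical faster part a vendor might bid.
print(f"projected at 3.2 GHz: {project(3.2):.0f} Fermi Cycles")
```

A straight line fits the three points closely, but extrapolating far beyond 3.06 GHz would ignore effects such as the front side bus speed noted above.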

Support and Troubleshooting
• Each vendor gets a software call, related to the configuration of Fermi Linux and solvable by e-mail or phone.
• Each vendor gets a hardware call, designed to trigger an on-site service call. We manufacture one if necessary.
• Points for prompt response, correct response.

Conclusions
• 18 technically qualified vendors, in alphabetical order: Ace, Angstrom, APPRO, ASA, Aspen, Atipa, Concentric, Dell, HP, IBM, Koi, Penguin, Promicro, PSSC, Rackable, Racksaver, Richardson, Western Scientific.
• Price/performance bid will weed them down to five.
• 21 vendors is too many to bring in; we will be more discriminating next time.

Component issues:
• Boards OK: Intel SE7501 series, Supermicro X5DPx series, Tyan 2721, Tyan 2723.
• Both the Tyan S2721-533 (Thunder i7501 Pro) and the Tyan S2723 (Tiger i7501) had issues with 10/100 Ethernet, resolved by changing a resistor value on the board.
• Some manufacturers offer cold-swap and hot-swap capabilities on drives, very nice.
• Issues in the Intel E7501 chipset: slower disk throughput than some earlier chipsets, but adequate for our needs.

Price/performance bid
• All vendors who pass our technical requirements are participating in a price/performance bid on a small number of nodes (48).
• The top five will be the Fermi Linux Qualified Vendors.
• We will keep track of all technically qualified vendors to replenish the list if:
  – A vendor goes out of business
  – A vendor stops bidding, or bids consistently very high on Fermi RFPs
  – A particular RFP requires special capacities: Myrinet, AMD, blade servers, desktop
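The selection step can be sketched as ranking bids by performance per dollar, keeping the top five, and holding the rest in reserve for replenishment. The talk does not give the exact scoring formula, so Fermi Cycles per dollar is assumed here, and the vendor labels, prices, and the `select_vendor_list` helper are invented placeholders rather than real bid data.

```python
from typing import NamedTuple


class Bid(NamedTuple):
    vendor: str
    fermi_cycles: float  # measured performance per node
    price_usd: float     # bid price per node (placeholder values below)


def cycles_per_dollar(bid: Bid) -> float:
    """ASSUMED ranking metric: Fermi Cycles delivered per dollar."""
    return bid.fermi_cycles / bid.price_usd


def select_vendor_list(bids, size=5):
    """Return (qualified, reserve) vendor names, best price/performance first."""
    ranked = sorted(bids, key=cycles_per_dollar, reverse=True)
    qualified = [b.vendor for b in ranked[:size]]
    reserve = [b.vendor for b in ranked[size:]]
    return qualified, reserve


# Invented example bids; not actual vendors or prices.
bids = [
    Bid("Vendor A", 2041, 2000), Bid("Vendor B", 2223, 2400),
    Bid("Vendor C", 1779, 1600), Bid("Vendor D", 2041, 2300),
    Bid("Vendor E", 2223, 2100), Bid("Vendor F", 1779, 1900),
    Bid("Vendor G", 2041, 2500),
]

qualified, reserve = select_vendor_list(bids)
print("qualified:", qualified)
print("reserve:  ", reserve)
```

Keeping the reserve list ordered means that when a listed vendor drops out, the next technically qualified vendor can be promoted without rerunning the evaluation.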

Future Plans
• Blade server evaluation coming up.
  – Requires a change in install philosophy: no floppy, CD-ROM, or serial console available.
  – Essential to address power and space concerns in Feynman and elsewhere.