Jefferson Lab Site Report
Kelvin Edwards
Thomas Jefferson National Accelerator Facility
Newport News, Virginia USA
Kelvin.Edwards@jlab.org
757-269-7770
http://cc.jlab.org
HEPiX – October, 2004
Central Computing
• Email
  – Distracted by SPAM problem
  – Evaluated and purchased MXLogic
    • Offsite solution
    • Filters virus/spam before getting to Lab
  – Upgraded our email hardware
• Windows builds
  – Purchased MS Enterprise Agreement
  – Developed an automatic build process
  – Upgrading all of our systems to Windows XP
  – Still evaluating SP2, problems with CAD, etc.
File Server Storage
• Adaptec 2200S RAID and Linux XFS
  – Linux kernel 2.6 and Adaptec firmware (build 7244)
    • It doesn’t work (I/O errors, etc.)
  – Red Hat EL 3 WS kernel works fine, but no XFS support
  – Tested ext3 performance
    • unacceptable (20 MB/s read, 34 MB/s write)
    • XFS performance (approx 100 MB/s read/write)
  – Dropped back to prior Adaptec BIOS and 2.6 kernel works fine
File Server Storage (cont)
• Purchased 2 StorageTek B280 systems
  – 14 TB of disk space
  – 4 Sun V210 head units
  – Stable, but slow, NFS performance
    • Aggregate: 6 MB/s write, 63 MB/s read
    • Each node: 0.13 MB/s write, 1.4 MB/s read average
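The slide does not say how many client nodes took part in the NFS test. As a rough cross-check, dividing the aggregate rate by the per-node average gives the implied client count. This is only a sketch, and it assumes the per-node figure is a plain average over concurrently active clients; the function name is illustrative.

```python
def implied_clients(aggregate_mb_s, per_node_mb_s):
    """Estimate the client count from aggregate and per-node throughput."""
    return aggregate_mb_s / per_node_mb_s


# Figures taken from the slide (MB/s).
write_clients = implied_clients(6.0, 0.13)
read_clients = implied_clients(63.0, 1.4)

print(f"implied clients (write): {write_clients:.1f}")
print(f"implied clients (read):  {read_clients:.1f}")
```

Both ratios land in the mid-40s, so the write and read figures are at least consistent with each other.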
File Server Storage (cont)
• Evaluating 10 TB Panasas system
  – Tested 2 protocols (directFLOW and NFS)
  – No directFLOW problems
  – NFS finally stable at version 2.1.4c
  – Good performance with either
    • Aggregate: 160-185 MB/s write, 100-180 MB/s read
    • Each node: 3.5-5 MB/s write, 2.5-4.5 MB/s read
Jasmine Changes
• Jasmine is JLab’s mass storage system (disk+tape); it stores ~1 PB and can routinely move 20 TB/day.
• Disk cache system recently rewritten for performance and reliability
  – I/O load spread out over pool of many disk servers
  – Files belong to file groups (per experiment) with quotas
  – Quotas may be exceeded if there is enough disk space; allows more flexible use of disk
  – Files deleted from servers in a modified LRU fashion
  – Files may be pinned until used by the batch farm
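The cache policy on this slide (soft per-group quotas, LRU-style deletion, pinning) can be illustrated with a toy model. The slide does not say what the "modified" LRU rule is, so the version below is an assumption: evict the least recently used unpinned file from a group that is over its quota first, and fall back to plain LRU otherwise. The class and method names and the group names are hypothetical and are not Jasmine's actual interfaces.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class CachedFile:
    group: str
    size: int
    pinned: bool = False


class DiskCache:
    """Toy cache with soft group quotas, LRU eviction, and pinned files."""

    def __init__(self, capacity, quotas):
        self.capacity = capacity      # total space, arbitrary units
        self.quotas = quotas          # group name -> soft quota
        self.files = OrderedDict()    # least recently used first

    def used(self, group=None):
        """Space used overall, or by one group if a group is given."""
        return sum(f.size for f in self.files.values()
                   if group is None or f.group == group)

    def touch(self, name):
        """Mark a file as recently used."""
        self.files.move_to_end(name)

    def add(self, name, group, size, pinned=False):
        """Insert a file, evicting only when space runs out.

        Quotas are soft: a group may exceed its quota while free space exists.
        Returns the names of evicted files.
        """
        evicted = []
        while self.used() + size > self.capacity:
            victim = self._pick_victim()
            if victim is None:
                raise RuntimeError("no evictable files left")
            del self.files[victim]
            evicted.append(victim)
        self.files[name] = CachedFile(group, size, pinned)
        return evicted

    def _pick_victim(self):
        unpinned = [n for n, f in self.files.items() if not f.pinned]
        # Assumed "modified" rule: over-quota groups give up files first.
        for name in unpinned:
            group = self.files[name].group
            if self.used(group) > self.quotas.get(group, 0):
                return name
        return unpinned[0] if unpinned else None


cache = DiskCache(capacity=100, quotas={"hallA": 40, "hallB": 40})
cache.add("a1", "hallA", 30)
cache.add("a2", "hallA", 30)               # hallA now over quota, but space is free
cache.add("b1", "hallB", 30, pinned=True)  # pinned, e.g. waiting for a farm job
cache.touch("a1")                          # a2 becomes the least recently used
evicted = cache.add("b2", "hallB", 30)
print(evicted)
```

In this run the over-quota group loses its least recently used file, while the pinned file stays on disk.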
Jasmine Changes (2)
• New programmatic interfaces for
  – Batch Farm (Auger)
  – Other services that need to move files (SRM, DAQ, LQCD disk cache)
• More reliance on MySQL database; concurrency and load are challenging
• Writing 9940B tapes
• Experiment data rates now ~30 MB/sec
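To put the ~30 MB/sec experiment data rate on the same scale as the 20 TB/day that Jasmine routinely moves, the rate can be converted to a daily volume. The sketch assumes a sustained rate over 24 hours and decimal units (1 TB = 10^6 MB), neither of which the slides state.

```python
SECONDS_PER_DAY = 24 * 60 * 60
MB_PER_TB = 1_000_000  # decimal units, an assumption

rate_mb_s = 30  # experiment data rate from the slide
tb_per_day = rate_mb_s * SECONDS_PER_DAY / MB_PER_TB

print(f"{rate_mb_s} MB/s sustained is about {tb_per_day:.2f} TB/day")
```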
Auger Changes
• Auger is JLab’s batch farm management system.
• Uses LSF to run jobs, keeps accounting in a database for web or command line presentation.
• Users can submit thousands of jobs using a compact job description that includes file retrieval and storage.
• Interfaces with Jasmine to stage files to disk before the job runs on the farm to keep CPUs busy
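The stage-before-run idea can be sketched as a simple partition step: a job is released to the farm only once all of its input files are on disk, and missing files are collected into a staging request. This is a minimal illustration; the `Job` class, the `partition_jobs` function, and the file paths are hypothetical and do not reflect Auger's or Jasmine's real interfaces.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    inputs: list


def partition_jobs(jobs, on_disk):
    """Split jobs into runnable and waiting, and list the files to stage."""
    ready, waiting, to_stage = [], [], set()
    for job in jobs:
        missing = [path for path in job.inputs if path not in on_disk]
        if missing:
            waiting.append(job)       # hold the job so it does not idle a CPU
            to_stage.update(missing)  # ask the storage system for these files
        else:
            ready.append(job)         # every input is already on disk
    return ready, waiting, to_stage


jobs = [
    Job("j1", ["/mss/run1.dat"]),
    Job("j2", ["/mss/run1.dat", "/mss/run2.dat"]),
]
on_disk = {"/mss/run1.dat"}

ready, waiting, to_stage = partition_jobs(jobs, on_disk)
print([j.name for j in ready], [j.name for j in waiting], sorted(to_stage))
```

Holding jobs until their inputs are staged means a farm slot is never occupied by a job that is waiting on tape.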
Jasmine & Auger Web Interface
• Java Server Pages
Projects
• Email upgrade
  – Still evaluating software/hardware
• Desktop systems
  – Mac OS X
  – Linux, Unix
  – Windows
• Power/Cooling issues
  – Reached limit of current Computer Room
  – New Computer Center to open in Jan 2006
  – Increased power requirements for 800 MHz FSB systems
    • 1.3 A to 2.1 A (single CPU)
    • 1.6 A to 2.8 A (dual CPU)
  – Shutdown problems with non-ACPI enabled systems
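The current figures above can be expressed as relative increases. The sketch reads each pair as per-system current draw before and after the move to 800 MHz FSB hardware, which is an interpretation of the slide rather than something it states.

```python
def percent_increase(before, after):
    """Relative increase from before to after, in percent."""
    return (after - before) / before * 100


# Current draw in amperes, from the slide.
single_cpu = percent_increase(1.3, 2.1)
dual_cpu = percent_increase(1.6, 2.8)

print(f"single CPU: +{single_cpu:.0f}%")
print(f"dual CPU:   +{dual_cpu:.0f}%")
```

Under that reading, per-system draw rises by roughly 60 to 75 percent, which fits the slide's point that the current computer room has reached its limit.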