RAL Tier 1A Report Hep Sys Man July
RAL Tier 1/A Report
HEPSysMan - July 2004
Martin Bly / Andrew Sansum
Overview
• Hardware
• Network
• Experiences / Challenges
• Management issues
1/2 July 2004 Martin Bly RAL Tier 1/A
Tier 1 in GRIDPP2 (2004-2007)
• The Tier-1 Centre will provide GRIDPP2 with a large computing resource of a scale and quality that can be categorised as an LCG Regional Computing Centre
• January 2004
  – GRIDPP2 confirm RAL to be host for Tier 1 Service
  – GRIDPP2 to commence September 2004
• Tier 1 Hardware budget:
  – £2.3M over 3 years
• Staff
  – Increase from 12.1 to 16.5 by September
Current Tier 1 Hardware
• CPU
  – 350 dual-processor Intel PIII and Xeon servers, mainly rack mounts
  – About 400 KSI2K
  – RedHat 7.3
  – P2/450 tower units decommissioned April 04
  – RH72 and Solaris batch services to be phased out this year
• Disk Service – mainly “standard” configuration
  – Dual Processor Server
  – Dual channel SCSI interconnect
  – External IDE/SCSI RAID arrays (Accusys and Infortrend)
  – ATA drives (mainly Maxtor)
  – About 80 TB disk
  – Cheap and (fairly) cheerful
• Tape Service
  – STK Powderhorn 9310 silo with 8 9940B drives
New Hardware
• 256 x dual Xeon HT/2800 MHz@533 MHz
  – 2 GB RAM (32 with 4 GB RAM), 120 GB HDD, 1 Gb/s NIC: 8 racks
• 20 disk servers with two 4 TB IDE/SCSI arrays: 5 racks
  – Infortrend EonStor A16U-G1A units, each with 16 x WD 250 GB SATA HDD – 4 TB/array raw capacity
  – Servers: dual Xeon HT/2800 MHz@533 MHz, 2 GB RAM, dual 120 GB SATA system disks, dual 1 Gb/s NIC
  – 160 TB raw, ~140 TB available (RAID 5)
• Delivered June 15th, now running commissioning tests
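The raw capacity quoted for the new disk servers follows directly from the drive counts on the slide. The short Python sketch below reproduces that arithmetic. The function names are illustrative, and the assumption that each RAID 5 array gives up exactly one drive to parity is ours, not something the slides state.

```python
# Capacity arithmetic for the new disk servers (decimal units: 1 TB = 1000 GB).
SERVERS = 20            # disk servers (from the slide)
ARRAYS_PER_SERVER = 2   # arrays per server (from the slide)
DRIVES_PER_ARRAY = 16   # drives per array (from the slide)
DRIVE_GB = 250          # drive size in GB (from the slide)


def raw_capacity_tb(servers=SERVERS, arrays=ARRAYS_PER_SERVER,
                    drives=DRIVES_PER_ARRAY, drive_gb=DRIVE_GB):
    """Total raw capacity in TB across all arrays."""
    return servers * arrays * drives * drive_gb / 1000


def raid5_capacity_tb(servers=SERVERS, arrays=ARRAYS_PER_SERVER,
                      drives=DRIVES_PER_ARRAY, drive_gb=DRIVE_GB,
                      parity_drives=1):
    """Capacity in TB after parity; one parity drive per array is an assumption."""
    return servers * arrays * (drives - parity_drives) * drive_gb / 1000


if __name__ == "__main__":
    print(raw_capacity_tb())    # 160.0
    print(raid5_capacity_tb())  # 150.0
```

Under that assumption parity alone leaves 150 TB. The slide's figure of ~140 TB presumably also reflects other overheads (hot spares or file system formatting, for example), which the slides do not itemise.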
Next Procurement
• Need in production by January 2005
  – Original schedule of December delivery seems late
  – Will have to start very soon
  – Less chance for testing / new technology
• Exact proportions not agreed, but …
  – 400 KSI2K (300-400 CPUs)
  – 160 TB disk
  – 120 TB tape??
  – Network infrastructure?
  – Core servers (H/A??)
  – RedHat?
• Long range plan needs reviewing – also need long range experiment requirements so as to plan environment updates.
[Figure: CPU Capacity]
[Figure: Tier 1 Disk Capacity (TB)]
High Impact Systems
• Looking at replacement hardware for high impact systems:
  – /home/csf, /rutherford file systems
  – MySQL servers
  – AFS cell
  – Front end / UI hosts
  – Data movers
  – NIS master, Mail server
• Replacing mix of Solaris, Tru64 Unix and AIX servers with Linux – consolidation of expertise
• Migrate AFS to OpenAFS and then K5.
Network
[Diagram labels: SuperJanet; Test network (e.g. MBNG); Firewall; Site Router; Rest of Site; Server; Site Routable Network; Production VLAN and Production Subnet (Servers, Workers); Test VLAN and Test Subnet (Servers, Workers)]
Network
[Diagram labels: SuperJanet; Test network (e.g. MBNG); Firewall; Router; Rest of Site; Server; Tier 1 Network; Production VLAN (Servers, Workers); Test VLAN (Servers, Workers)]
UKLight
• Connection to RAL in September
• Funded to end 2005, after which probably merges with SuperJanet 5
• 2.5 Gb/s now, 10 Gb/s from 2006
• Effectively dedicated light path to CERN
• Probably not for Tier 1 production but suitable for LCG Data Challenges etc., building experience for SuperJanet upgrade.
• UKLight -> Starlight
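To put the two UKLight link speeds in context, the sketch below estimates how long a bulk transfer would take at each rate. It is a back-of-envelope calculation only: the 1 TB data set and the `efficiency` parameter are hypothetical, and real throughput depends on protocol overheads and end-host performance that the slides do not discuss.

```python
def transfer_time_hours(data_tb, link_gbps, efficiency=1.0):
    """Hours needed to move data_tb terabytes over a link of link_gbps Gb/s.

    efficiency is the (assumed) fraction of the line rate actually achieved.
    Decimal units are used: 1 TB = 1e12 bytes, 1 Gb/s = 1e9 bits/s.
    """
    bits = data_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical 1 TB data set at the two link speeds quoted on the slide.
    for gbps in (2.5, 10):
        print(f"{gbps} Gb/s: {transfer_time_hours(1, gbps):.2f} h")
```

At full line rate a 1 TB transfer takes a little under an hour at 2.5 Gb/s and under 15 minutes at 10 Gb/s.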
Forthcoming Challenges
• Simplify service – less “duplication”
• Improve storage management
• Deploy new Fabric Management
• RedHat Enterprise 3 upgrade
• Network upgrade/reconfigure??
• Another procurement/install
• Meet challenge of LCG – professionalism
• LCG Data Challenges …
Clean up Spaghetti Diagram
• How to phase out “Classic” service …
• Simplify Interfaces: Less GRIDS
“More is not always better”
Storage: Plus and Minus
• ATA and SATA drives: 2.5% failure per annum - OK
• External RAID arrays: Good architecture, choose well
• SCSI interconnect: Surprisingly unreliable: change
• Ext2 file system: OK – but need journal: XFS?
• Linux O/S: Move to Enterprise 3
• NFS/Xrootd/http/gridftp/bbftp/srb/…: Must have SRM
• NO SAN: Need SAN (Fibre or iSCSI …)
• No management layer: Need virtualisation/dCache …
• NO HSM: ??
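The 2.5% annual drive failure rate can be turned into an expected replacement workload. The sketch below is illustrative only: it assumes failures are independent and that the rate observed on the existing ATA/SATA drives also applies to other drive populations, neither of which the slides claim. The example drive counts (640 drives, 16 per array) come from the New Hardware slide.

```python
def expected_failures_per_year(n_drives, annual_failure_rate=0.025):
    """Expected number of drive failures per year for n_drives drives."""
    return n_drives * annual_failure_rate


def prob_at_least_one_failure(n_drives, annual_failure_rate=0.025):
    """Probability that at least one of n_drives fails within a year,
    assuming independent failures (an assumption, not a measured result)."""
    return 1 - (1 - annual_failure_rate) ** n_drives


if __name__ == "__main__":
    # 20 servers x 2 arrays x 16 drives = 640 drives in the new disk servers.
    print(expected_failures_per_year(640))
    # Chance that a single 16-drive array sees a failure during the year.
    print(prob_at_least_one_failure(16))
```

At this rate the 640 new drives alone would be expected to need roughly 16 replacements a year.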
Benchmarking
• Work by George Prassas on various systems including a 3ware/SATA RAID 5 system.
• Tuning gains extra performance on RH variants
• Performance of RHEL3 NFS servers and disk I/O not special despite tuning, compared with RH73
• Considering buying SPEC suite to benchmark everything.
Fabric Management
• Currently run:
  – Kickstart – cascading config files, implementing PXE
  – SURE exception monitoring
  – Automate – automatic interventions
• Running out of steam with old systems …
  – “Only” 800 systems – but many, many flavours
  – Evaluating Quattor – no obvious alternatives – probably deploy
  – Less convinced by Lemon – bit early – running Nagios in parallel
Yum / Yumit
• Kickstart scripts now use Yum to bootstrap systems to latest updates
• Post-install config now uses Yum wherever possible for local additions
• Yumit:
  – Nodes use Yum to check their status every night and report to central database
  – Web interface to show farm status
  – Easy to see which nodes need updating.
• Machine ownership tagging, port monitoring project
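The nightly check that Yumit performs on each node can be pictured roughly as follows. This is a minimal sketch and not Yumit's actual code, which the slides do not show. It relies on the documented behaviour of `yum check-update` (exit status 100 when updates are available, 0 when there are none), while the function names and the report fields are hypothetical, and delivery to the central database is left out.

```python
import json
import socket
import subprocess


def parse_check_update(output):
    """Return the package names listed as updatable in `yum check-update` output."""
    packages = []
    for line in output.splitlines():
        if line.startswith("Obsoleting Packages"):
            break  # entries after this header are obsoletes, not updates
        fields = line.split()
        # Update lines have three columns: name.arch, version, repository.
        if len(fields) == 3 and "." in fields[0]:
            packages.append(fields[0])
    return packages


def build_report(hostname, output, returncode):
    """Build the record a node might send to the central database (hypothetical fields)."""
    needs_update = returncode == 100  # yum exits with 100 when updates exist
    return {
        "host": hostname,
        "needs_update": needs_update,
        "packages": parse_check_update(output) if needs_update else [],
    }


def nightly_check():
    """Run yum on this node and return the report as JSON (transport omitted)."""
    proc = subprocess.run(["yum", "-q", "check-update"],
                          capture_output=True, text=True)
    return json.dumps(build_report(socket.gethostname(),
                                   proc.stdout, proc.returncode))
```

A cron job on each node could call `nightly_check()` and post the result to whatever service feeds the web interface. Keeping the parsing separate from the `subprocess` call makes the logic testable without running yum.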
Futures
• Storage Architectures
  – iSCSI, Fibre, dCache
  – Need to be more sophisticated to allow reallocation of available space
• CPUs
  – Xeon, Opteron, Itanium, Intel 64 bit x86 architecture
• Network
  – Higher speed interconnect, iSCSI
Conclusions
• After several years of relative stability must start reengineering many Tier 1 components.
• Must start to rationalise – support limited set of interfaces, operating systems, testbeds … simplify so we can do less better
• LCG becoming a big driver
  – Service commitments
  – Increase resilience and availability
  – Data challenges and move to steady state
• Major reality check in 2007!