An Operational Parallel Weather Prediction System

ScicomP 5, May 2002
Mats Hamrud and David Dent, ECMWF

ECMWF
• European Centre for Medium-range Weather Forecasts
• A European organisation with headquarters in the UK
• 17 Member States and 5 Co-operating States
• Principal objectives:
  - development of methods for forecasting weather beyond two days ahead
  - collection and storage of appropriate meteorological data
  - daily production and distribution of forecasts to the Member States
  - provision of archival/retrieval facilities to the Member States
  - provision of computer resources to the Member States
• Staff of about 200

Operational forecasting system
• Data assimilation
  - Four-dimensional variational system (4D-Var) with a 12-hour data window
  - Horizontal resolution: 120 km for inner loops and 40 km for outer loops
• Global atmospheric forecasts
  - Forecasts to 10 days at 40 km resolution
  - Ensemble forecasts to 10 days with 51 members at 80 km resolution
• Ocean wave forecasts
  - Global forecasts to 10 days at 40 km resolution
  - European-waters forecasts to 5 days at 15 km resolution
• Seasonal forecasts
  - Global forecasts to 6 months (ensemble with 30 members) using a coupled atmosphere-ocean model

Planned improvements
• Assimilation of cloud and rain data in 4D-Var
• Higher vertical resolution in the free troposphere and lower stratosphere
• Operational running of a Severe Weather forecasting system (based on ensemble predictions)
• Assimilation of the very high resolution satellite data that will become available within the next few years
• To implement the above plans, more computing power is required, in particular for the:
  - Data Assimilation System (4D-Var)
  - Ensemble Prediction System (EPS)

Computing at ECMWF
• Numerical weather modelling depends heavily on high-performance computing resources
• ECMWF used Cray machines from 1977 to 1996 (Cray-1A, X-MP, Y-MP, C90, T3D)
• Production work moved to distributed-memory machines in 1996 (Fujitsu VPP700, VPP700E, VPP5000)
• Forecasts need to be produced 365 days a year; timeliness is critical
• Allocation of computing resources:
  - 25% operational activities
  - 25% Member State users (throughout Europe)
  - 50% in-house research staff
• Large meteorological archive for research activities

Computer configuration (January 2002)
[Diagram of the ECMWF computer configuration as of January 2002]

HPC procurement in 2001
• ECMWF asked for a phased introduction of resources covering the period 2003-2006, with a parallel run during the second half of 2002
• Benchmark tests (for performance commitments):
  - 4D-Var (T511/255 L60; 2, 4 or 6 copies depending on phase)
  - Deterministic forecast (T799 L90; 2 or 4 copies)
  - EPS (T399 L60; 50 or 100 copies)
• Percentages of peak:
  - 4D-Var did not scale as well as the deterministic forecast and the EPS
  - Scalar machines: 3% to 12%, depending on test/product
  - Vector machines: 14% to 38%, depending on test/product
• Rule of thumb for ECMWF's codes was confirmed: ~10% for scalar architectures and ~30% for vector architectures
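For reference, "percentage of peak" is sustained application performance divided by the machine's theoretical peak. A worked example with purely illustrative numbers, not taken from the benchmark:

\[
\text{fraction of peak} \;=\; \frac{R_{\text{sustained}}}{R_{\text{peak}}}
\;=\; \frac{120\ \text{Gflop/s}}{1200\ \text{Gflop/s}} \;=\; 10\% .
\]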

Outcome of the procurement
• There was strong competition, with three highly competitive tenders in contention right to the end
• The offer from IBM was judged to be the best
• The contract runs until 31 March 2007 and all of the equipment is on lease
• Key features of the configuration:
  - Two identical clusters throughout the service period
  - Initially, p690 servers logically partitioned into four 8-CPU nodes
  - Later, p690 follow-on servers with the Federation switch

Deliverables (“Blue Storm”)

                               PHASE 1              PHASE 2              PHASE 3
  Processor                    1.3 GHz POWER4       1.3 GHz POWER4       faster POWER4
  Number of processors         1408                 1920                 ~3000
  Interconnect                 Dual-Colony (PCI)    Dual-Colony (PCI-X)  Federation
  I/O nodes                    8 NH-2               8 NH-2               4 p690 follow-on
  Disk space (Fibre Channel)   8.4 TB               12.4 TB              27 TB
  Sustained performance        1.3 x VPPs           1.9 x VPPs           5.2 x VPPs

Performance profile of the IBM solution relative to the existing Fujitsu system
[Chart: sustained performance on ECMWF codes over 2002-2006, on a scale of 1x to 5x the Fujitsu baseline of 400 GF sustained, showing the steps up to Phase 2 and Phase 3]

One of the two Phase 1 clusters
[Diagram: a dual-plane Colony switch connecting four NH-2 I/O nodes (n001-n004, 8 CPUs and 8 GB each) and p690 servers (32 CPUs each, with 32 GB or 128 GB of memory) hosting the logical nodes n005-n092; two control workstations (cws1, cws2); Gigabit Ethernet and 10/100 Ethernet public LANs and a 10/100 Ethernet internal LAN]
Some p690 nodes will also have Gigabit Ethernet connectivity

Timetable
• Milestones
  - January: 1 standalone p690 server and 4 Nighthawk-2 nodes with a Colony switch
  - May: Switch adapter for the p690
  - June: Delivery of one cluster of Phase 1
  - October: Start of acceptance of Phase 1
  - March 2003: End of the Fujitsu service
• Migration tasks
  - Codes are generally fit for the new architecture
  - In-depth optimisation has yet to be done
  - Move from NQS to LoadLeveler will require re-thinking
  - GPFS is very different from the current Fujitsu I/O subsystem

IFS code overview
• Developed jointly at ECMWF and Meteo France
• More than 10 years old
• 3000-4000 routines, ~500,000 lines of source code
• Fortran 90 (~99%) and some C (no extensions)
• Parallelised using MPI and OpenMP
• Message passing characterized by large messages
• Rather flat execution profile; the only important standard maths library routine is matrix multiply
• Contains a high-level blocking scheme (expanded in the sketch below):

    DO JL = 1, NGPTOT, NPROMA
      CALL LOTS_OF_WORK(...)
    ENDDO
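To make the blocking scheme concrete, here is a minimal, self-contained sketch in the spirit of the loop above; NGPTOT, NPROMA and JL follow the slide, while everything else (the array and the routine body) is illustrative and not taken from IFS:

    PROGRAM BLOCKING_DEMO
      ! Minimal sketch of IFS-style NPROMA blocking.
      IMPLICIT NONE
      INTEGER, PARAMETER :: NGPTOT = 1000   ! total grid points (illustrative)
      INTEGER, PARAMETER :: NPROMA = 64     ! block length, tuned per machine
      REAL :: ZFIELD(NGPTOT)
      INTEGER :: JL, ILEN

      ZFIELD = 1.0
      ! Grid points are processed in blocks of at most NPROMA points:
      ! long enough to fill a vector pipe, short enough to stay in cache
      ! on a scalar machine.
      DO JL = 1, NGPTOT, NPROMA
        ILEN = MIN(NPROMA, NGPTOT - JL + 1)   ! last block may be shorter
        CALL LOTS_OF_WORK(ILEN, ZFIELD(JL:JL+ILEN-1))
      ENDDO
      PRINT *, 'sum =', SUM(ZFIELD)

    CONTAINS

      SUBROUTINE LOTS_OF_WORK(KLEN, PFIELD)
        INTEGER, INTENT(IN)    :: KLEN
        REAL,    INTENT(INOUT) :: PFIELD(KLEN)
        ! Stand-in for the real grid-point computations
        PFIELD(1:KLEN) = 2.0 * PFIELD(1:KLEN)
      END SUBROUTINE LOTS_OF_WORK

    END PROGRAM BLOCKING_DEMO

Because only the block length changes, the same code can be tuned for vector machines (large NPROMA) and cache-based machines (small NPROMA) without restructuring.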

IFS blocking
[Diagram illustrating the IFS blocking scheme]

Comparison of sustained performance (percentage of peak)

                           VPP5000       IBM Phase 2    IBM Phase 3
                           (measured)    (prediction)   (prediction)
  4D-Var                   22.9%         3.0%           4.5%
  Deterministic forecast   35.5%         7.0%           8.9%
  Ensemble forecast        30.8%         11.2%          12.0%

Reasons for “low” percentage of peak
• Inter-node communications, e.g. 4D-Var on ~500 processors
• Memory access
• IFS code contains many divides, SQRTs, EXPs, LOGs and real powers

Ongoing migration work
• Possibly unusual requirements at ECMWF:
  - bit-wise reproducibility on different numbers of processors => -O3 -qstrict (see the sketch below)
  - error trapping always on
  - code must continue to run efficiently on vector machines (collaboration with Meteo France)
  - different compiler levels installed simultaneously
• Incorporate improvements made by the IBM benchmark team into the current operational version
• Optimisation of message passing
• Codes not part of the benchmark must be migrated
• Evaluation of the MASS library
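Bit-wise reproducibility across processor counts constrains both the compiler (no value-changing re-association of floating-point expressions, hence -O3 -qstrict) and the code itself: any global reduction has to add its contributions in a fixed order. A minimal sketch of one way to do this, assuming each task holds a contiguous, rank-ordered slice of the global field; all routine and variable names here are hypothetical, not taken from IFS:

    SUBROUTINE REPRO_GLOBAL_SUM(PLOCAL, KLOCAL, KTOTAL, PSUM, KCOMM)
      ! Bit-reproducible global sum: gather every contribution and add
      ! them in a fixed global order, independent of how many MPI tasks
      ! the field is distributed over.
      USE MPI
      IMPLICIT NONE
      INTEGER,      INTENT(IN)  :: KLOCAL, KTOTAL, KCOMM
      REAL(KIND=8), INTENT(IN)  :: PLOCAL(KLOCAL)   ! this task's points
      REAL(KIND=8), INTENT(OUT) :: PSUM
      REAL(KIND=8), ALLOCATABLE :: ZGLOBAL(:)
      INTEGER, ALLOCATABLE :: ICOUNTS(:), IDISPLS(:)
      INTEGER :: ISEND(1), INPROC, IERR, J

      CALL MPI_COMM_SIZE(KCOMM, INPROC, IERR)
      ALLOCATE(ZGLOBAL(KTOTAL), ICOUNTS(INPROC), IDISPLS(INPROC))

      ! Every task learns how many points the others hold
      ISEND(1) = KLOCAL
      CALL MPI_ALLGATHER(ISEND, 1, MPI_INTEGER, ICOUNTS, 1, MPI_INTEGER, &
                         KCOMM, IERR)
      IDISPLS(1) = 0
      DO J = 2, INPROC
        IDISPLS(J) = IDISPLS(J-1) + ICOUNTS(J-1)
      ENDDO

      ! Reassemble the full field on every task ...
      CALL MPI_ALLGATHERV(PLOCAL, KLOCAL, MPI_REAL8, ZGLOBAL, ICOUNTS, &
                          IDISPLS, MPI_REAL8, KCOMM, IERR)

      ! ... and sum it left to right, so the rounding is identical
      ! whatever the decomposition
      PSUM = 0.0_8
      DO J = 1, KTOTAL
        PSUM = PSUM + ZGLOBAL(J)
      ENDDO
      DEALLOCATE(ZGLOBAL, ICOUNTS, IDISPLS)
    END SUBROUTINE REPRO_GLOBAL_SUM

This trades extra communication and memory for reproducibility; a fixed-order tree reduction would do equally well, the essential point being that the summation order must not depend on the number of tasks.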

Error trapping
• Need to trap errors and provide a traceback in the operational environment
• Compiler option: -qflttrap=overflow:zerodivide:invalid:enable:[imprecise]
• Overheads of this option are too high on POWER4
• Instead, enable trapping by calls in the main program (a portable analogue is sketched below):
  - sigaction: install a signal handler for traceback [dump]
  - fp_trap: allow floating-point exception trapping
  - fp_enable: enable floating-point exception trapping
  - trap SIGFPE, SIGILL, SIGBUS, SIGSEGV, SIGXCPU
• Overhead is only a few percent on the forecast model
• Possible problem with use of the MASS library
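The calls listed above (sigaction, fp_trap, fp_enable) are AIX/C interfaces. Purely as an illustration of the same idea in portable terms, here is a minimal sketch using the Fortran 2003 IEEE intrinsic modules; it is a stand-in, not the mechanism the slide describes:

    PROGRAM TRAP_DEMO
      ! Enable halting on the serious floating-point exceptions from
      ! within the program, instead of paying for a compile-time
      ! trapping option on every instruction.
      USE, INTRINSIC :: IEEE_EXCEPTIONS
      IMPLICIT NONE
      REAL :: ZA, ZB

      CALL IEEE_SET_HALTING_MODE(IEEE_OVERFLOW,       .TRUE.)
      CALL IEEE_SET_HALTING_MODE(IEEE_DIVIDE_BY_ZERO, .TRUE.)
      CALL IEEE_SET_HALTING_MODE(IEEE_INVALID,        .TRUE.)

      ZA = 1.0
      ZB = 0.0
      PRINT *, ZA / ZB   ! now aborts with SIGFPE instead of printing Inf
    END PROGRAM TRAP_DEMO

A signal handler (e.g. one installed with sigaction) can then turn the resulting SIGFPE into a traceback or dump, which is what the operational environment needs.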

Relative costs of computations

  Function         VPP5000   p690    Ratio   Relative performance
  Multiply         16        2       8       1.85
  Add              16        2       8       1.85
  Multiply & add   32        4       8       1.85
  Divide           16/4      2/30    60      13.85
  SQRT             48/20     2/36    43      10
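Reading the table, the VPP5000 and p690 columns appear to give results per clock cycle (an entry such as 16/4 meaning 16 results every 4 cycles), with the ratio column being the per-clock throughput ratio of the two machines. The last column then seems to fold in the clock-speed difference; assuming the commonly quoted clock of roughly 300 MHz for a VPP5000 PE and the 1.3 GHz POWER4 quoted earlier in the deck, the multiply row works out as

\[
\text{relative performance} \approx \frac{\text{per-clock ratio}}{f_{\mathrm{p690}}/f_{\mathrm{VPP5000}}}
= \frac{8}{1300/300} \approx 1.85 ,
\]

which matches the table; the divide and SQRT rows check out the same way.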

Possible low-level optimisations
• Reduce the number of divides, SQRTs, etc. (see the example below)
• Try to achieve better cache usage
• Use of MASS vector intrinsics
• Use of compiler directives
• Will be time-consuming because of:
  - the flat performance profile
  - the rapid rate of scientific developments
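As an example of the first point, a common rewrite is to replace repeated divisions by the same quantity with a single division and several multiplications. A minimal, self-contained sketch; the variable names are illustrative and not taken from IFS:

    PROGRAM DIVIDE_DEMO
      ! Divides are far more expensive per clock than multiplies on the
      ! POWER4 (see the cost table above), so hoist a common denominator
      ! out of the loop and multiply by its reciprocal instead.
      IMPLICIT NONE
      INTEGER, PARAMETER :: N = 1000
      REAL :: ZT(N), ZQ(N), ZOUT(N), ZDELTA, ZRDELTA
      INTEGER :: J

      ZT = 1.0; ZQ = 2.0; ZDELTA = 4.0

      ! Original form: two divides per iteration
      ! DO J = 1, N
      !   ZOUT(J) = ZT(J)/ZDELTA + ZQ(J)/ZDELTA
      ! ENDDO

      ! Rewritten: one divide outside the loop, multiplies inside
      ZRDELTA = 1.0 / ZDELTA
      DO J = 1, N
        ZOUT(J) = (ZT(J) + ZQ(J)) * ZRDELTA
      ENDDO
      PRINT *, ZOUT(1)
    END PROGRAM DIVIDE_DEMO

Note that such rewrites can change results in the last bit, so they have to be reconciled with the bit-wise reproducibility requirement mentioned earlier.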

Outstanding migration issues
• All I/O-related issues (GPFS)
  - Work can start once we have the switch installed and have logically partitioned the familiarization p690 (May)
• Scheduling: use of LoadLeveler for the operational suite
  - Awaiting the first cluster of Phase 1 (July)
• Operational use of two clusters
  - Awaiting complete Phase 1 (October)