Essential Overview
Louisiana Tech University, Ruston, Louisiana
Charles Grassl, IBM
January, 2006
© 2005 IBM Corporation

Agenda
• Hardware
• Software
• Documentation

Hardware Overview
• Processors:
• Nodes:
• Clusters:

Product Naming

New Name   Old Name            Market       Processor
iSeries    AS/400              Commercial   RS64
pSeries    RS/6000, SP, SP2    Technical    POWER3, POWER4, POWER5
xSeries    IA-32, IA-64        Server       Xeon, AMD
zSeries    ES/9000             Mainframe    RS64

Processor Progression

Processor   Years         Clock Rate       Feature
POWER2      1990 – 1994   20 – 60 MHz      RISC
P2SC        1994 – 1998   60 – 150 MHz     Bandwidth
POWER3      1998 – 2002   200 – 450 MHz    Single Chip
POWER4      2001 – 2005   1 – 1.9 GHz      Dual Core
POWER5      2004 –        1.5 – 1.9 GHz    Multi-Thread

POWER5 Systems
• POWER5 processors
  • Single and Dual processor chips
• Modules
  • Dual Chip Modules (DCM)
  • Multi Chip Modules (MCM)
• Nodes
  • Multiple modules
  • p5-575
  • p5-595
• Cluster
  • Multiple nodes
  • Connected with High Performance Switch (HPS)

Systems (“Nodes”)

Model     Processors   Clock Rate (GHz)   Memory (Gbyte, 2^30 byte)
p5-595    16-64        1.65, 1.9          2000
p5-590    8-32         1.65, 1.9          1000
p5-575    8            1.5, 1.9           256
p5-570    2-16         1.65, 1.9          512
p5-550    2-4          1.65               64
p5-520    2            1.65               32
p5-510    1, 2         1.65               1 - 32

POWER5 Processor Systems
[Figure: processor chip, DCM and MCM modules, p5-575 and p5-595 systems, cluster]

Cluster 1600
[Figure: logical view and physical view of the cluster, showing multi-processor nodes, network, and disk system]

Local System Name
• IBM p5-575 nodes
  • 1.9 GHz POWER5 processors
  • Single processor chips
  • 8 processors per node
• HPS interconnect
• “575” distinction:
  • Dual Chip Module (DCM)
  • 8 DCMs
  • One or two processors per chip
    • Single Core (SC)
    • Dual Core (DC)
• “595” distinction:
  • Multi Chip Module (MCM) construction
  • 8 MCMs

POWER5 Processors
• Multi-processor chip
• High clock rate: Multiple GHz
• Three cache levels
  • Bandwidth
  • Latency hiding
• Shared Memory
  • Large memory size

POWER5 Features
• Private L1 cache
• Shared L2 cache
• Shared L3 cache
• Interleaved memory
• Hardware Prefetch
• Multiple Page Size support

Processor Characteristics
• High frequency clocks
• Deep pipelines
• High asymptotic rates
• Superscalar
• Speculative out-of-order instructions
• Up to 8 outstanding cache line misses
• Large number of instructions in flight
• Branch prediction
• Hardware Prefetching

Processor Features

Feature              POWER4                 POWER5
Clock                1.0 – 1.9 GHz          1.5 – 1.9 – … GHz
Caches               Three levels           Three levels
L3 Speed             1/3 clock frequency    ½ clock frequency
Virtualization       Up to 32 partitions    Up to 254 partitions
Partitions           Unit processor         Fractional
Power Management     Static                 Dynamic
Thread Execution     Single Thread          Multi Threading
Memory Store         Single Buffer          Double Buffer
Renaming Registers   GP: 72, FP: 80         GP: 120, FP: 120

Caches and Memory

                   POWER4                  POWER5
L1 Cache           Data: 32 kbyte, Instruction: 64 kbyte (both processors)
                   2-way Assoc., FIFO      4-way Assoc., LRU
L2 Cache           1.5 Mbyte               1.9 Mbyte
                   8-way Assoc., FIFO      10-way Assoc., LRU
L3 Cache           32 Mbyte                36 Mbyte
                   8-way Assoc., LRU       12-way Assoc., LRU
                   120 Cycles              ~80 Cycles
Memory Bandwidth   4 Gbyte/s / Chip        16 Gbyte/s / Chip

POWER4 – POWER5 Comparison

                                   POWER4+   POWER5
Frequency (GHz)                    1.7       1.9
L2 Latency (Cycles)                12        12
L3 Latency (Cycles)                120       80
Memory Latency (Cycles)            351       220
Copy Bandwidth 4 proc. (Gbyte/s)   8         18
Linpack Rate N=1000 (Gflop/s)      3.9       5.6
SPECint_base2000                   1077      1398
SPECfp_base2000                    1598      2576
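
The latencies in the comparison are quoted in clock cycles, and the two chips run at different clock rates, so the cycle counts are not directly comparable as wall-clock times. The sketch below converts them to nanoseconds with time = cycles / frequency. The figures are copied from the comparison; the names `CHIPS`, `cycles_to_ns`, and `latencies_ns` are illustrative and not part of the original slides.

```python
# Figures copied from the POWER4+ / POWER5 comparison; nothing here is measured.
CHIPS = {
    "POWER4+": {"ghz": 1.7, "L2": 12, "L3": 120, "memory": 351},
    "POWER5": {"ghz": 1.9, "L2": 12, "L3": 80, "memory": 220},
}


def cycles_to_ns(cycles, ghz):
    """Convert a latency in clock cycles to nanoseconds.

    At 1 GHz one cycle lasts 1 ns, so time_ns = cycles / clock_GHz.
    """
    return cycles / ghz


def latencies_ns(chip):
    """Return the L2, L3 and memory latencies of one chip in nanoseconds."""
    row = CHIPS[chip]
    return {level: cycles_to_ns(row[level], row["ghz"])
            for level in ("L2", "L3", "memory")}


if __name__ == "__main__":
    for chip in CHIPS:
        rounded = {k: round(v, 1) for k, v in latencies_ns(chip).items()}
        print(chip, rounded)
```

Under these numbers the memory latency falls from roughly 206 ns to roughly 116 ns, a larger relative improvement than the cycle counts alone suggest, because the POWER5 clock is also faster.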

POWER5 Design: Summary
• More gates
  • 170 million -> 260 million
• Enhancements
  • Increased cache associativity
  • Increased number of rename registers
  • Reduced L3 and cache latency
• New features
  • Simultaneous Multi Threading
  • Dynamic power management

Processor Systems (Nodes)
• Multiple processors
• Multiple modules
• Various construction formats
  • Multi Chip Modules
  • Dual Chip Modules
• Shared memory

Multi Chip and Dual Chip Modules
• Dual Chip Module (DCM)
  • p5-570
  • p5-575
• Multi Chip Module (MCM)
  • p5-590
  • p5-595
[Figure: POWER5 processor chip]

Dual Chip Module
• Each Module:
  • 1 processor chip
  • 1 L3 cache
  • 1 Memory card
• Each Processor Chip
  • 2 processors
    • L1 caches
    • Registers
    • Functional units
  • 1 L2 cache
  • 1 path to memory
[Figure: processor chip with 36 Mbyte L3 and memory]

Multi Chip Module
• Each Module:
  • 4 processor chips
  • 4 L3 cache chips
  • 2 Memory cards
• Each Processor Chip
  • 2 processors
    • L1 caches
    • Registers
    • Functional units
  • 1 L2 cache
  • 1 path to memory
[Figure: module with memory]

POWER5 Multi Chip Module
• Four POWER5 chips
• Four L3 cache chips
• 95 mm
• 4,491 signal I/Os
• 89 layers of metal

POWER5 Dual Chip Module
• One POWER5 chip
  • Single or Dual Core
• One L3 cache chip

Modifications to POWER4 System Structure
[Figure: block diagrams showing processors (P), L2 cache, fabric controller (Fab Ctl), L3 cache, memory controller (Mem Ctl), and memory]

Switch Technology
• Internal network
  • In lieu of GigEthernet, Myrinet, Quadrics, etc.
• Fourth generation
  • HPS Switch (POWER2 generation)
  • SP Switch (POWER2 -> POWER3)
  • SP Switch2 (POWER3 -> POWER4)
  • HPS (POWER4 -> POWER5)
• Multiple links per node
  • Match number of links to number of processors

High Performance Switch (HPS)
• Also known as “Federation”
• Follow on to SP Switch2
  • Also known as “Colony”
• Specifications:
  • 2 Gbyte/s (bidirectional)
  • 5 microsecond latency
• Configuration:
  • Up to four adaptors per node
  • 2 links per adaptor
  • 16 Gbyte/s per node
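
The 16 Gbyte/s per-node figure follows from the configuration numbers on this slide: four adaptors, two links each, 2 Gbyte/s per link. A minimal sketch of that arithmetic is below. It assumes the 2 Gbyte/s bidirectional rate applies to each link and that links aggregate without loss; the function name is illustrative.

```python
def node_bandwidth_gbyte_s(adaptors=4, links_per_adaptor=2, link_gbyte_s=2):
    """Aggregate HPS bandwidth of one node in Gbyte/s.

    Assumes every link delivers the full per-link rate (an idealization).
    """
    return adaptors * links_per_adaptor * link_gbyte_s


if __name__ == "__main__":
    # Maximum configuration from the slide: 4 adaptors x 2 links x 2 Gbyte/s.
    print(node_bandwidth_gbyte_s())
```

A node with a single adaptor would, under the same assumptions, reach 1 x 2 x 2 = 4 Gbyte/s.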

HPS Specifications

             Latency       Bandwidth, single   Bandwidth, multiple
             [microsec.]   [Mbyte/s]           [Mbyte/s]
SP Switch2   15            350                 550
HPS          5             1800                1930

Software Overview
• Operating System
  • AIX
• Compilers
  • C++
  • Fortran
• Batch Queue
  • LoadLeveler (IBM)
  • LSF (Platform)
  • PBS
  • Gridware

AIX
• Current Version: AIX 5.3
• Processors:
  • POWER3
  • POWER4
  • POWER5
• Linux Affinity
• Logical PARtitions (LPAR) Nodes
  • Operating system
  • Memory
  • Network connections
• Kernel Address Size:
  • 64-bit
  • 32-bit

Linux on POWER
• Native Linux, SuSE 7, SuSE 8
• RPMs and package managers
• Cluster Systems Manager
• 64-bit kernel
• 32/64-bit applications support (SuSE 8)

Compiler   User Name
C          xlc
C++        xlC
Fortran    xlf

Compilers

C and C++
• VisualAge C and C++ Professional for AIX
  • Versions 6, 7, 8
  • ANSI C
  • C++
• Compiler names:
  • xlc
  • xlC

Fortran
• XL Fortran for AIX
  • Versions 8, 9, 10
  • Fortran 77
  • Fortran 90
• Compiler names:
  • xlf77
  • xlf90

Compiler Names

Compiler      User Name
Fortran 77    xlf77
Fortran 90    xlf90
C             xlc
C++           xlC
MPI compile   mpxlf, mpcc
Reentrant     xlf_r, xlc_r

AIX uses different compiler names to perform some tasks which are handled by compiler flags on most other systems.

Compiler Usage

Language      Command    Feature             Extension
ANSI C        xlc_r      ANSI, thread safe   .c
Extended C    cc         Pre-ANSI            .c
MPI, C        mpxlc      MPI                 .c
C++           xlC_r      Thread safe         .C .cc .cpp
Fortran 77    xlf_r      Thread safe         .f
Fortran 90    xlf90_r    Thread safe         .f
MPI Fortran   mpxlf      MPI                 .f
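
Because the variant (thread safe, MPI, pre-ANSI) is selected by the command name rather than by a flag, build scripts often keep a small lookup from language to command. The sketch below is one hypothetical way to do that. The command names are taken from the Compiler Usage slide; the dictionary, the function name, and the default output name are assumptions made for illustration, and `-o` is the usual output-file option.

```python
# Command names and extensions as listed on the Compiler Usage slide.
COMPILER_USAGE = {
    "ansi c": ("xlc_r", ".c"),
    "extended c": ("cc", ".c"),
    "mpi c": ("mpxlc", ".c"),
    "c++": ("xlC_r", ".C"),
    "fortran 77": ("xlf_r", ".f"),
    "fortran 90": ("xlf90_r", ".f"),
    "mpi fortran": ("mpxlf", ".f"),
}


def compile_command(language, source, output="a.out"):
    """Build an argument list for compiling one source file.

    The list is only constructed here, not executed, so the sketch
    runs on machines without the XL compilers installed.
    """
    command, _extension = COMPILER_USAGE[language.lower()]
    return [command, source, "-o", output]


if __name__ == "__main__":
    print(" ".join(compile_command("Fortran 90", "solver.f")))
```

Passing the resulting list to `subprocess.run` on an AIX node would invoke the compiler; keeping construction and execution separate makes the mapping easy to check.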

User Limits
• Set by the system administrator
• ulimit:
  • C or K shell built-in
  • Sets or reports resource limits
  • Limits are defined in /etc/security/limits
  • Sizes are in 512 byte blocks
  • Times are in seconds
• $ ulimit -a

Ulimit Defaults

Limit       Definition              Default (value)   Typical (value)
fsize       File Size               2097151           Unlimited (-1)
core        Core File Size          2097151           Unlimited (-1)
cpu         Per Process limit       -1 (unlimited)    Unlimited (-1)
data        Data Segment Size       262144            Unlimited (-1)
stack       Segment Size            65536             *Unlimited (-1)
No. files   File Descriptor Limit   2000

* 64-bit address mode
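
The default sizes are easier to judge once converted from 512-byte blocks to bytes. The sketch below does that conversion for the defaults listed on this slide, treating -1 as unlimited. The helper names are illustrative and not part of AIX.

```python
BLOCK_BYTES = 512   # size unit stated on the User Limits slide
UNLIMITED = -1      # marker used for "no limit"

# Default size limits from the slide, in 512-byte blocks.
DEFAULT_LIMITS_BLOCKS = {
    "fsize": 2097151,
    "core": 2097151,
    "data": 262144,
    "stack": 65536,
}


def blocks_to_bytes(blocks):
    """Convert a limit in 512-byte blocks to bytes; None means unlimited."""
    if blocks == UNLIMITED:
        return None
    return blocks * BLOCK_BYTES


def describe(blocks):
    """Render a block count as MiB (2^20 bytes), or 'unlimited'."""
    size = blocks_to_bytes(blocks)
    if size is None:
        return "unlimited"
    return f"{size / 2**20:.1f} MiB"


if __name__ == "__main__":
    for name, blocks in DEFAULT_LIMITS_BLOCKS.items():
        print(name, describe(blocks))
```

With these defaults the data segment is capped at 128 MiB and the stack at 32 MiB, which is small for technical codes and explains why the typical setting is unlimited.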

Other Defaults
• Thread control
  • /etc/environment
    • AIXTHREAD_SCOPE=S
    • AIXTHREAD_MNRATIO=1:1
    • AIXTHREAD_COND_DEBUG=OFF
    • AIXTHREAD_GUARDPAGES=4
    • AIXTHREAD_MUTEX_DEBUG=OFF
    • AIXTHREAD_RWLOCK_DEBUG=OFF

Batch Queuing
• Compile on any AIX node
  • Use -qarch=pwr5
• Submit job with available batch utility
  • Use appropriate queue name
• Available queuing systems:
  • LoadLeveler
  • PBS
  • Gridware
  • LSF

Cluster Layout
[Figure: Node 0 (compile and submit node), Node 1, and Node 2 connected by the network]

Documentation
• Software:
  • www.software.ibm.com
    • Products A-Z
    • X -> XL C, XL C/C++, XL Fortran
  • www.servers.ibm.com/aix
• Compilers
  • /usr/vac/doc
  • /usr/vacpp/doc
  • /usr/lpp/xlf/doc
• Redbooks:
  • www.redbooks.ibm.com/
  • IBM eServer p5 590 and 595 System Handbook

Documentation
• AIX Commands Reference
• AIX command:
  • /usr/sbin/infocenter
  • /opt/ibm_help/help_start.sh
• http://www.unet.univie.ac.at/aixgen/wbinfnav/aixcmdsrefbooks.htm
• Google search: “AIX Commands Reference”

Documentation Library
Google Search: AIX 5L documentation Library
http://publibn.boulder.ibm.com/cgi-bin/ds_rslt

Summary: Architecture
• System architecture
  • Processors
  • Nodes
  • Cluster
• Processors
  • POWER5
  • Three levels of cache
• Nodes:
  • Eight processor p5-575
• Cluster:
  • 14 p5-575 nodes
  • HPS interconnect
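
The cluster size follows directly from the summary: 14 nodes of eight processors each. The sketch below computes the processor count and a rough peak floating-point rate. The 1.9 GHz clock comes from the local system slide; the figure of 4 floating-point operations per cycle per processor (two fused multiply-add units) is an assumption not stated in the slides, and the function names are illustrative.

```python
def total_processors(nodes=14, processors_per_node=8):
    """Number of processors in the cluster described in the summary."""
    return nodes * processors_per_node


def peak_mflops(processors, clock_mhz=1900, flops_per_cycle=4):
    """Theoretical peak in Mflop/s.

    flops_per_cycle=4 is an assumption (two fused multiply-add units
    per processor); the slides do not state it.
    """
    return processors * clock_mhz * flops_per_cycle


if __name__ == "__main__":
    n = total_processors()
    print(n, "processors,", peak_mflops(n) / 1000, "Gflop/s peak (assumed)")
```

Sustained rates will be lower than this peak; the Linpack N=1000 figure quoted earlier for a single POWER5 processor is 5.6 Gflop/s.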