High Performance Clusters
High Performance Computing, Clusters, and Productivity

LNXI SOS 8 Presentation - April 04

HPC Market Reality
  • …computing can represent a sustainable business provided effective leveraging of comm…
  • …growth is not only due to their use in the scientific community, but their growing use by…
  • …tures targeted solely for the HPC community are generally not widely adopted by the…
  • …during and after installation - designed into the architecture and support models
  • …s, is critical throughout life of system for maintaining productivity

The technology was compelling…
BBC (03/02/69) - The supersonic airliner, Concorde, has made a "faultless" maiden flight. The Anglo-French plane took off from Toulouse and was in the air for just 27 minutes before the pilot made the decision to land.

…but eventually the bottom line won
Associated Press (10/24/03) - It's both a technological marvel and financial failure and today the bottom line wins as the Concorde makes its final flight.

On the other hand…

Southwest is not blinking lights Bob

Two Different Approaches…
  • Concorde
    • Average Roundtrip Fares: $10,700
    • Avg. Customers Served/Year: 93,000
    • Average Miles/Year: 11.1 million
  • Southwest
    • Average Roundtrip Fares: $90.03
    • Avg. Customers Served/Year: 45,200,000
    • Average Miles/Year: 72 million
…Two Very Different Measurements of Productivity?

Common Metrics
  • Top500.org (Linpack): Performance does not equate to productivity
  • TCO: Something may be inexpensive to own but may not be productive
  • What matters: Price/Productivity, the effective productivity over time for the total price involved

Productivity
  • Productivity is the ratio between the amount of goods or services produced and the resource or expense that goes into producing them
  • Productivity implies the ratio of Price/Product
    • Product: A successful run of the code the system was purchased for
    • Price: The total cost of the system over its lifetime, including software and hardware acquisition, system cost, support, development, infrastructure, and labor
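
The Price/Product ratio defined on this slide can be sketched in a few lines. This is only an illustration: the class and function names are hypothetical, and every dollar figure and run count below is made up, not taken from the presentation.

```python
from dataclasses import dataclass


@dataclass
class LifetimeCost:
    """Lifetime cost categories named on the slide (all values hypothetical)."""
    acquisition: float      # software and hardware acquisition
    system: float           # system cost
    support: float
    development: float
    infrastructure: float
    labor: float

    def total(self) -> float:
        """Price: the total cost of the system over its lifetime."""
        return (self.acquisition + self.system + self.support
                + self.development + self.infrastructure + self.labor)


def price_per_product(cost: LifetimeCost, successful_runs: int) -> float:
    """Price/Product: lifetime cost divided by successful runs of the target code."""
    if successful_runs <= 0:
        raise ValueError("successful_runs must be positive")
    return cost.total() / successful_runs


# Hypothetical numbers for illustration only.
cost = LifetimeCost(acquisition=800_000, system=200_000, support=150_000,
                    development=200_000, infrastructure=100_000, labor=550_000)
print(price_per_product(cost, successful_runs=20_000))  # 100.0
```

A lower value means each successful run costs less, so two systems can be compared on this figure even when their raw benchmark performance differs.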

Focus on Productivity
  • Recent Panel at SC 2003: HPC Productivity
    • The productivity of an HPC system is measured by factors that may not be associated with hardware speed. These factors include program execution time, as well as software development time and other direct and indirect costs.
    • David Kuck, Manager of the Software and Solutions Group at Intel: "PR (getting on to the top 500 list) tends to make people ignore real productivity issues."
  • DARPA/IPTO's High Productivity Computing Systems
    • Goal: Provide a new generation of economically viable high productivity computing systems for the national security and industrial user community (2007 – 2010)

System Architecture and Optimization
  • It's all about the application!
    • The system maps to the application
    • No system is "one size fits all"
  • Analysis of application(s) requirements
    • Memory
    • Inter-process communications
    • Bandwidth
    • Latency
    • Floating Point/Integer needs
    • Existing/New codes
    • Parallelism
    • ……
  • Identification of bottlenecks to generate optimal system design/selection/price
  • System design trades need to be optimized for best resulting price/productivity
  • Cluster or SMP? Why not both!

System Architecture: Linux Clusters being used in large scale Production Systems
  • Los Alamos - Lightning: 11.26 TFLOPS, Linux Networx Evolocity, 2,816 Opteron Processors
  • Lawrence Livermore - MCR: 11.2 TFLOPS, Linux Networx E2, 2,304 Intel Processors
  • Los Alamos - Pink: 10 TFLOPS, Linux Networx E2, 2,048 Intel Processors

System Architecture: Linux Clusters Being Used in Multi-Application, Scientific Computing Production Environments
As part of the Technology Insertion 2004 (TI-04) program, the Department of Defense High Performance Computing Modernization Program (HPCMP) selects Linux Networx for the Army Research Laboratory Major Shared Resource Center's (MSRC) 2,132-processor Linux cluster. When the system is fully deployed in mid-2004, this Evolocity II cluster will be the HPCMP's largest deployment of an Intel processor-based Linux cluster, and the solution will adopt Intel 64-bit extension technology.

System Architecture: Leveraging the Commodity
  • A company focused on scientific computing will need to leverage commodity technology to be viable
    • Market isn't large enough to afford non-recurring engineering dedicated to advance niche technology
    • Government funding is not what it used to be
    • Need to take advantage of economies of scale
  • Focus development and resources to fill critical gaps and/or provide the "glue"
  • When using commodity technologies, Systems Engineering is absolutely critical
    • Understanding of all subsystems
    • Disciplined engineering approach
    • System integration and test
    • Keeping up with technology
  • Leverage both commodity hardware and software

Supporting Software and Tools
  • In-house and vendor expertise
  • Stability and supportability
  • Integration and compatibility of tools
    • Versions and version compatibility
  • Profilers and tools that provide insight enabling re-structuring of code for optimal performance
  • System management and administration tools
  • Programmability
    • Programming a cluster is probably now as easy as, if not easier than, programming an SMP or vector machine
    • Programming tools and expertise yield large variations in Price/Productivity
  • Compilers
    • Variations in compiling, both in compiler selection and in configuration and optimization, yield 5%-30% differences in performance
  • Debugging
    • Having correct tools and proper insight minimizes time to production

Installation and Acceptance
  • …ions
  • …ting infrastructure
  • …producing
  • …months to get working and producing output (4 months out of a 3 year life cycle = 11% reduction…
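
The "4 months out of a 3 year life cycle = 11%" figure on this slide is simple arithmetic, checked in the sketch below. The function name is hypothetical, and the sketch assumes the reduction is just the share of the life cycle spent before the system produces output.

```python
def fraction_of_lifecycle_lost(months_to_production: float, lifecycle_years: float) -> float:
    """Share of the system's life cycle spent before it produces output."""
    lifecycle_months = lifecycle_years * 12
    if lifecycle_months <= 0:
        raise ValueError("lifecycle_years must be positive")
    if not 0 <= months_to_production <= lifecycle_months:
        raise ValueError("months_to_production must fall within the life cycle")
    return months_to_production / lifecycle_months


# Figures from the slide: 4 months to production, 3 year life cycle.
lost = fraction_of_lifecycle_lost(4, 3)
print(f"{lost:.1%}")  # 11.1%
```

Shortening installation and acceptance from 4 months to 1 month under the same assumption would cut the loss to about 2.8% of the life cycle.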

Hardware and Software System Maintenance
  • Hardware
    • Architecture should minimize downtime
    • Rapid turnaround from vendor
    • Upgrades
    • Scalability
  • Software
    • Operating System, Middleware, Drivers, Application level
    • Open Source
      • Linux offers no single "throat to choke"
    • Upgrades
      • Double Edged Sword
        • Performance
        • Risks
        • Downtime
        • Revision synchronization
  • Most people can get the system to work the first time…

Porting/Optimizing Applications
  • In-house codes
  • Third Party Commercial codes
  • Performance Modeling
    • Examining algorithm structure to enable re-architecture of code
    • Can yield significant performance deltas (observed up to 1000%)
  • Optimization
    • Re-structuring
    • Compiling
  • Hardware/Operations/Applications Ratio: $1 / $0.75 / $1.75
  • Most people spend $$$ on system that could have been better spent on optimizing their software.
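
To make the Hardware/Operations/Applications ratio concrete, the sketch below splits a budget in the proportions $1 / $0.75 / $1.75. It assumes the ratio is meant as relative spend per dollar of hardware, which the slide does not state explicitly; the function name and the $3.5M total are hypothetical.

```python
# Ratio from the slide: Hardware / Operations / Applications = $1 / $0.75 / $1.75
RATIO = {"hardware": 1.00, "operations": 0.75, "applications": 1.75}


def split_budget(total: float, ratio: dict = RATIO) -> dict:
    """Divide a total budget across categories in proportion to the ratio."""
    weight = sum(ratio.values())
    return {name: total * part / weight for name, part in ratio.items()}


# Hypothetical total budget, for illustration only.
shares = split_budget(3_500_000)
for name, amount in shares.items():
    print(f"{name}: ${amount:,.0f} ({amount / 3_500_000:.1%})")
# hardware: $1,000,000 (28.6%)
# operations: $750,000 (21.4%)
# applications: $1,750,000 (50.0%)
```

Read this way, half of the total spend goes to applications, which is consistent with the slide's point that money is often better spent on optimizing software than on the system itself.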

Education
  • Who
    • Administrators
      • It is easier to teach Linux to a system administrator than to teach system administration to a "Linux person"
    • Application Engineers
    • End-Users
  • What
    • Operating System
    • Compilers
    • Tools
  • When
    • As soon as the decision is made to consider a Linux cluster
  • Why
    • Directly impacts productivity.

Criticality of Vendor Involvement On Productivity
  • …r HPC than many other market areas
  • …ding edge
  • …ms and applications
  • …al for clusters
  • …vendor, it's your vendor's vendors because clusters are assembl…
  • …nce and relationship with all components critical
  • …age, Processors
  • Compilers, Tools, Management, Filesystems

Criticality of Vendor Involvement On Productivity
  • …vendor needs to be engaged over the lifetime of the system
  • Pre-Installation
    • Working with customer and component suppliers to provide optimal architecture for target applica…
    • Facility impact and design
    • Project planning and Integration
  • Installation
    • Minimize impact on ongoing operations
    • Time to production
  • Post-Installation
    • Training and Education
    • Availability of experience and skills
    • Ability to pull together component vendors to resolve issues and provide service
    • Vendor participation to ensure productivity (avoid dump and run)

Linux Networx Value
Linux Networx provides cluster computing systems that deliver maximum sustained performance and high return on investment. We achieve high customer satisfaction by delivering Five Points of Proven Value:
  • Rigorous System Engineering, Q/A, and validation process with every cluster system
  • Full pre-ship system build up and testing, followed by rapid on-site installation
  • Delivering complete systems with optimized applications, the latest cluster technologies and open source tools
  • Total Cluster Management from one interface
  • Cluster Services, support and Linux cluster training

Conclusion
  • …commercial communities is due to high productivity at a low price…
  • …installation - designed into the architecture and support models
  • …or with strong systems engineering, firm component vendor rel…