Metrics: FLOPS (FLoating point Operations Per Sec)

  • Slides: 9

Metrics
  • FLOPS (FLoating point Operations Per Sec): a measure of the numerical processing power of a CPU, which can be an indicator of its scientific computing capability.
  • The floating-point format is a variation of scientific notation: a real number is represented using a mantissa, a base, and an exponent.
  • Storing real numbers in computers:
      - A fixed-length word (e.g. 64 bits) is used as the storage space for a real number.
      - The mantissa is normalised (1.61 is normalised, 16.1 is not).
      - The mantissa and exponent are converted to base 2.
      - 1 bit of the word stores the sign, some bits store the mantissa, and the rest store the exponent.
  • Advantages and disadvantages:
      - Advantage: a fixed-length space can store a wide overall range of values.
          - If 64 bits are used to store a real number, with 11 bits for the exponent, 52 bits for the mantissa, and the remaining 1 bit for the sign, we can derive the range of numbers this storage layout can represent.
          - More bits for the mantissa give higher precision but a smaller range.
          - More bits for the exponent give a wider range but lower precision.
      - Disadvantage: the difference between two successive representable numbers is not uniform.
      - Disadvantage: numbers that cannot be converted exactly to base 2 must be rounded to fit the format, leading to cases where algebraic rules do not appear to apply.
  • The LINPACK benchmark produces a FLOPS result. It solves a dense system of linear equations by Gaussian elimination.
Computer Science, University of Warwick
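The range, non-uniform spacing, and rounding effects listed above can be observed directly. The following is a minimal sketch, assuming Python 3.9 or later (for `math.ulp`) on a platform whose `float` is the usual 64-bit format with an 11-bit exponent and 52-bit mantissa; the variable names are illustrative only.

```python
import math
import sys

# Range of a 64-bit float (1 sign bit, 11 exponent bits, 52 mantissa bits).
largest = sys.float_info.max           # about 1.8e308
smallest_normal = sys.float_info.min   # about 2.2e-308

# The gap between successive representable numbers grows with magnitude.
gap_near_1 = math.ulp(1.0)       # 2**-52
gap_near_1e16 = math.ulp(1e16)   # 2.0

# 0.1 and 0.2 have no exact base-2 representation, so they are rounded
# and an identity that holds for real numbers fails.
sum_is_exact = (0.1 + 0.2 == 0.3)   # False

# Addition is no longer associative once rounding is involved.
left = (1e16 + 1.0) - 1e16    # 0.0: the 1.0 is lost to rounding
right = (1e16 - 1e16) + 1.0   # 1.0
```

Near 1e16 the gap between neighbouring values is already 2, which is why adding 1.0 there has no effect.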


Example of Floating Point Numbers

  172.625                 base 10
  10101100.101 × 2^0      base 2
  1.0101100101 × 2^7      base 2, normalised

Using 32 bits (4 bytes) to store the number in a computer, with 1 bit for the sign, 8 bits for the exponent, and the rest for the mantissa:

  S  Exp       Mantissa
  0  00000111  00000010101100101
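For comparison, here is a sketch of how the same value can be split into sign, exponent, and mantissa fields using Python's standard `struct` module. It shows the IEEE 754 single-precision encoding, which differs from the simplified layout on the slide in two ways: the exponent is stored with a bias of 127, and the leading 1 of the normalised mantissa is implicit rather than stored. The helper name `float32_fields` is hypothetical.

```python
import struct

def float32_fields(x):
    """Return (sign, exponent, fraction) bit strings of x as an IEEE 754 single."""
    # Pack as a big-endian 32-bit float, then reinterpret as an unsigned int.
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    bits = format(raw, "032b")
    return bits[0], bits[1:9], bits[9:]

sign, exponent, fraction = float32_fields(172.625)
# sign     -> "0"
# exponent -> "10000110"                 (7 + bias 127 = 134)
# fraction -> "01011001010000000000000"  (digits after the implicit leading 1)
```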


Metrics
  • MIPS (Millions of Instructions Per Second): a measure of the speed of a processor.
      - Peak MIPS rates (usually vendor supplied) can be misleading.
      - Jokingly expanded as "Meaningless Information on Performance for Salespeople".
      - People seldom refer to it.


Metrics
  • SPECint: measures a processor’s integer processing capabilities.
      - The latest version is SPECint 2006.
      - It can test the CPU, memory, and compiler, but not networking or I/O.
      - It consists of a series of 12 benchmarks, including compression and compilation.
      - Each benchmark has a reference time.
      - Dividing the reference time by the measured runtime of the benchmark and multiplying by 100 gives a base ratio. For example, if the benchmark 401.bzip2 has a reference time of 1400 sec and its actual runtime on the system under test is 140 sec, the base ratio is 1400/140*100 = 1000.
      - The base ratios are averaged to produce a final performance figure for the processor.
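The base-ratio arithmetic can be sketched as follows, using the worked example's form (reference time divided by measured runtime, times 100). The function names are hypothetical. The overall figure here uses the geometric mean, which is how SPEC combines per-benchmark ratios; the second ratio in the usage line is an invented value for illustration only.

```python
import math
import statistics

def base_ratio(reference_time, measured_time):
    """Base ratio as in the worked example: reference / measured * 100."""
    return reference_time / measured_time * 100

def overall_score(ratios):
    """Combine per-benchmark ratios with the geometric mean."""
    return statistics.geometric_mean(ratios)

bzip2_ratio = base_ratio(1400, 140)          # 1000.0, matching the example
score = overall_score([bzip2_ratio, 250.0])  # 250.0 is a made-up second ratio
```

A faster machine has a shorter runtime and therefore a larger ratio, which is why the reference time is the numerator.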


SPECint 2006 benchmark suite

  Benchmark        Language  Category
  400.perlbench    C         Programming Language
  401.bzip2        C         Compression
  403.gcc          C         C Compiler
  429.mcf          C         Combinatorial Optimization
  445.gobmk        C         Artificial Intelligence
  456.hmmer        C         Search Gene Sequence
  458.sjeng        C         Artificial Intelligence
  462.libquantum   C         Physics / Quantum Computing
  464.h264ref      C         Video Compression
  471.omnetpp      C++       Discrete Event Simulation
  473.astar        C++       Path-finding Algorithms
  483.xalancbmk    C++       XML Processing


Metrics

Communication:
  • Bandwidth (bytes/sec)
      - How much data can be sent per second over the network
  • Latency (seconds)
      - The time between one processor sending a message and the other processor receiving it
  • Interconnection type: on-board interconnection or over networks
  • Topologies: bus, crossbar, hub, switch
  • Protocols: stacks
  • Unicast, multicast, broadcast

Storage capabilities:
  • Storage facilities: register, cache, memory, hard disk
  • Bandwidth and latency
      - Bandwidth: how much data can be accessed per second in a certain storage facility
      - Latency: the time between sending a data access request and receiving the requested data
  • Memory hierarchies (CPU register -> cache -> main memory -> remote memory)
  • Local and remote file systems
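To see how the two metrics interact, a common first-order model treats the time to move a block of data as the latency plus the size divided by the bandwidth. This model is an assumption added here, not something stated on the slide, and the link parameters below (50 microseconds, 125 MB/s) are hypothetical.

```python
def transfer_time(size_bytes, latency_s, bandwidth_bytes_per_s):
    """First-order estimate: fixed latency plus size / bandwidth."""
    return latency_s + size_bytes / bandwidth_bytes_per_s

LATENCY = 50e-6     # hypothetical: 50 microseconds
BANDWIDTH = 125e6   # hypothetical: 125 MB/s

small = transfer_time(100, LATENCY, BANDWIDTH)   # dominated by latency
large = transfer_time(1e9, LATENCY, BANDWIDTH)   # dominated by bandwidth
```

Under this model, small messages are governed almost entirely by latency, while large transfers are governed by bandwidth.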


Top 500 Supercomputer list
  • Website: www.top500.org
  • The Top 500 project started in 1993 and is updated twice a year
  • Aims to track trends in HPC
  • Uses LINPACK to measure performance (FLOPS)
      - Essentially, LINPACK solves a dense system of linear equations Ax=b (commonly encountered in engineering)
      - Users are allowed to change the problem size to get the maximum performance, which is used to rank the supercomputers
      - Theoretical peak performance is also given for reference
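The idea behind a LINPACK-style measurement can be sketched in a few lines: solve a dense system Ax=b by Gaussian elimination, time it, and divide an operation count by the elapsed time. This is a toy illustration in pure Python, not the actual benchmark code. The function names are hypothetical, and the operation count uses only the leading-order term, roughly (2/3)n^3 for a system of size n.

```python
import random
import time

def solve(a, b):
    """Solve Ax = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for k in range(n):
        # Partial pivoting: move the largest remaining entry of column k up.
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        if m[p][k] == 0.0:
            raise ValueError("matrix is singular")
        m[k], m[p] = m[p], m[k]
        # Eliminate column k from the rows below the pivot.
        for i in range(k + 1, n):
            f = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= f * m[k][j]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def measure_flops(n, seed=0):
    """Time one random n-by-n solve and return an approximate FLOPS rate."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    start = time.perf_counter()
    solve(a, b)
    elapsed = time.perf_counter() - start
    return (2.0 / 3.0) * n ** 3 / elapsed
```

Calling `measure_flops` with increasing `n` mirrors the rule that users may vary the problem size and report the best rate achieved.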


Top 500 Supercomputer list
  • Tends to represent parallel computers, so distributed systems such as SETI@Home are neglected.
  • Does not consider storage or I/O issues.
  • Both custom-designed machines and commodity machines win positions in the list.
  • General trend towards commodity machines (COTS - Commodity Off-The-Shelf). BlueGene/L, however, is not a COTS machine.
  • Connecting a large number of machines with relatively lower performance is more rewarding than connecting a small number of machines each with high performance.
      - Read the paper: “A note on the Zipf distribution of Top 500 supercomputers” (download from my homepage)
  • Performance
      - Doubles each year, better than Moore’s Law (performance doubles approximately every 18 months)
  • Dominated by the United States (location map of the Top 100 machines: http://www.top500.org/lists/2006/11/top100map)
  • UK supercomputers in the list
      - Cambridge: No. 20 (http://www.top500.org/system/8267)
      - AWE: No. 15
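The gap between the two doubling periods compounds quickly. A small sketch of the arithmetic, with a hypothetical helper name and a ten-year horizon chosen only for illustration:

```python
def growth_factor(years, doubling_period_years):
    """Factor by which performance grows if it doubles every given period."""
    return 2 ** (years / doubling_period_years)

top500_growth = growth_factor(10, 1.0)   # doubling every year: 1024x
moore_growth = growth_factor(10, 1.5)    # doubling every 18 months: about 102x
```

Over a decade, yearly doubling gives roughly ten times the growth of doubling every 18 months.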


Top Machine BlueGene/L
  • First supercomputer in the Blue Gene project
  • Specialised system based on the Power architecture
      - Individual PowerPC 440 processors at 700 MHz
      - Two processors reside in a single chip.
      - Two chips reside on a “compute card” with 512 MB memory.
      - 16 of these compute cards are placed on a node board.
      - 32 node boards fit into one cabinet, and there are 64 cabinets.
      - 131,072 CPUs with a theoretical peak of 183.5 TFLOPS
      - Multiple network topologies are available, which can be selected depending on the application.
  • High density of processors in a small area:
      - Low-power and (comparatively) slow processors, just lots of them!
      - Fast, low-latency interconnects
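Multiplying out the packaging hierarchy on the slide gives the total processor count, and dividing the quoted peak by the number of chips and the clock rate shows what that peak implies per clock cycle. The constant names are illustrative, and the per-cycle figure is an inference from the slide's numbers rather than something the slide states.

```python
PROCESSORS_PER_CHIP = 2
CHIPS_PER_CARD = 2
CARDS_PER_NODE_BOARD = 16
NODE_BOARDS_PER_CABINET = 32
CABINETS = 64

chips = CHIPS_PER_CARD * CARDS_PER_NODE_BOARD * NODE_BOARDS_PER_CABINET * CABINETS
processors = chips * PROCESSORS_PER_CHIP   # 131072

PEAK_FLOPS = 183.5e12   # theoretical peak quoted on the slide
CLOCK_HZ = 700e6        # 700 MHz

peak_per_chip = PEAK_FLOPS / chips                    # about 2.8e9 FLOPS
flops_per_cycle_per_chip = peak_per_chip / CLOCK_HZ   # about 4
```

The quoted peak therefore works out to about 4 floating-point operations per clock cycle per chip. How that is split across the two processors on a chip is not stated on the slide.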