Chapter 10: Advanced Computer Architecture

10-1 Computer Architecture and Organization
Miles Murdocca and Vincent Heuring
Chapter 10 - Advanced Computer Architecture
Computer Architecture and Organization by M. Murdocca and V. Heuring © 2007 M. Murdocca and V. Heuring

10-2 Chapter Contents
10.1 Parallel Architecture
10.2 Superscalar Machines and the PowerPC
10.3 VLIW Machines, and the Itanium
10.4 Case Study: Extensions to the Instruction Set – The Intel MMX/SSEX and Motorola AltiVec SIMD Instructions
10.5 Programmable Logic Devices and Custom ICs
10.6 Unconventional Architectures

10-3 Parallel Speedup and Amdahl's Law
• In the context of parallel processing, speedup can be computed as S = T_sequential / T_parallel.
• Amdahl's law, for p processors and a fraction f of unparallelizable code: S = 1 / (f + (1 - f) / p).
• For example, if f = 10% of the operations must be performed sequentially, then speedup can be no greater than 10 regardless of how many processors are used: as p grows without bound, S approaches 1 / 0.1 = 10.
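The bound on the slide above can be checked numerically. The following is a minimal sketch, assuming the standard form of Amdahl's law, S = 1 / (f + (1 - f) / p); the function name `amdahl_speedup` is chosen here for illustration and does not come from the slides.

```python
def amdahl_speedup(f: float, p: int) -> float:
    """Speedup predicted by Amdahl's law for p processors when a
    fraction f of the work cannot be parallelized."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    if p < 1:
        raise ValueError("p must be at least 1")
    return 1.0 / (f + (1.0 - f) / p)


if __name__ == "__main__":
    # With f = 10%, adding processors gives diminishing returns:
    # the speedup climbs toward 10 but never reaches it.
    for p in (1, 10, 100, 1000, 1_000_000):
        print(f"p = {p:>9}: speedup = {amdahl_speedup(0.10, p):.3f}")
```

With f = 0.10 and p = 10 the sketch yields roughly 5.3, which is the speedup figure used in the efficiency example on the next slide.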

10-4 Efficiency and Throughput
• Efficiency is the ratio of speedup to the number of processors used. For a speedup of 5.3 with 10 processors, the efficiency is 5.3 / 10 = 0.53, or 53%.
• Throughput is a measure of how much computation is achieved over time, and is of special concern for I/O bound and pipelined applications. For the case of a four-stage pipeline that remains filled, in which each pipeline stage completes its task in 10 ns, the average time to complete an operation is 10 ns even though it takes 40 ns to execute any one operation. The overall throughput for this situation is then 1 operation / 10 ns = 10^8 operations per second.
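The two figures on this slide follow from simple ratios. A short sketch, with helper names (`efficiency`, `pipeline_latency`, `pipeline_throughput`) invented here for illustration, and assuming a pipeline that stays full with equal stage times:

```python
def efficiency(speedup: float, processors: int) -> float:
    """Efficiency is speedup divided by the number of processors."""
    return speedup / processors


def pipeline_latency(stages: int, stage_time_s: float) -> float:
    """Time for one operation to pass through every stage."""
    return stages * stage_time_s


def pipeline_throughput(stage_time_s: float) -> float:
    """Operations completed per second once the pipeline is full:
    one result emerges every stage time, regardless of depth."""
    return 1.0 / stage_time_s


if __name__ == "__main__":
    print(f"efficiency: {efficiency(5.3, 10):.0%}")
    print(f"latency:    {pipeline_latency(4, 10e-9) * 1e9:.0f} ns")
    print(f"throughput: {pipeline_throughput(10e-9):.3g} operations/s")
```

Note that the number of stages affects only the latency of a single operation, not the steady-state throughput.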

10-5 Flynn Taxonomy
• Classification of architectures according to the Flynn taxonomy: (a) SISD; (b) SIMD; (c) MIMD; (d) MISD.

10-6 Network Topologies
• Network topologies: (a) crossbar; (b) bus; (c) ring; (d) mesh; (e) star; (f) tree; (g) perfect shuffle; (h) hypercube.

10-7 Crossbar
• Internal organization of a crossbar.

10-8 Crosspoint Settings
• (a) Crosspoint settings for connections 0 → 3 and 3 → 0; (b) adjusted settings to accommodate connection 1 → 1.

10-9 Three-Stage Clos Network

10-10 12-Channel Three-Stage Clos Network with n = p = 6

10-11 12-Channel Three-Stage Clos Network with n = p = 2

10-12 12-Channel Three-Stage Clos Network with n = p = 4

10-13 12-Channel Three-Stage Clos Network with n = p = 3
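The four 12-channel configurations can be compared by counting crosspoints. The sketch below assumes the usual three-stage Clos construction, which the transcript does not spell out: N channels split across N/n first-stage switches of size n × p, p middle switches of size (N/n) × (N/n), and N/n third-stage switches of size p × n. The function name `clos_crosspoints` is hypothetical.

```python
def clos_crosspoints(N: int, n: int, p: int) -> int:
    """Crosspoint count of a three-stage Clos network (assumed layout:
    n inputs per first-stage switch, p middle-stage switches)."""
    if N % n != 0:
        raise ValueError("n must divide N evenly")
    r = N // n                 # switches in the first and third stages
    first = r * (n * p)        # r switches, each n x p
    middle = p * (r * r)       # p switches, each r x r
    third = r * (p * n)        # r switches, each p x n
    return first + middle + third


if __name__ == "__main__":
    N = 12
    print(f"single {N} x {N} crossbar: {N * N} crosspoints")
    for n in (6, 2, 4, 3):
        print(f"n = p = {n}: {clos_crosspoints(N, n, n)} crosspoints")
```

Under these assumptions the n = p = 6 configuration needs more crosspoints than a single 12 × 12 crossbar (144), while the others need fewer. Crosspoint count is only one criterion: whether a given configuration can block a connection request depends on how p compares with n.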

10-14 C function computes (x² + y²) × y²

10-15 Dependency Graph
• (a) Control sequence for C program; (b) dependency graph for C program.
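The C function and its graph are not reproduced in this transcript, so the following Python sketch is an illustration rather than the authors' code. It computes (x² + y²) × y² in four operations and records which operation depends on which; the names `x2`, `y2`, `total`, and `result` are hypothetical.

```python
def compute(x: float, y: float) -> float:
    x2 = x * x          # independent of y2
    y2 = y * y          # independent of x2, so it can run in parallel
    total = x2 + y2     # needs both squares
    return total * y2   # needs the sum and y2


# Each operation maps to the operations whose results it consumes.
DEPENDENCIES = {
    "x2": [],
    "y2": [],
    "total": ["x2", "y2"],
    "result": ["total", "y2"],
}


def schedule_levels(deps: dict) -> dict:
    """Earliest parallel step for each operation: one more than the
    latest step among the operations it depends on."""
    level = {}

    def depth(node: str) -> int:
        if node not in level:
            level[node] = 1 + max((depth(d) for d in deps[node]), default=0)
        return level[node]

    for node in deps:
        depth(node)
    return level


if __name__ == "__main__":
    print(compute(2, 3))
    print(schedule_levels(DEPENDENCIES))
```

In this sketch the four operations fit into three parallel steps, because the two squarings have no dependency on each other.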

10-16 Matrix Multiplication
• (a) Problem setup for Ax = b; (b) equations for computing the b_i.

10-17 Matrix Multiplication Dependency Graph
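The equations for the b_i are not reproduced in this transcript. Assuming the standard definition b_i = a_i1·x_1 + a_i2·x_2 + ... + a_in·x_n, a minimal sketch looks like this (the function name `matvec` is chosen here for illustration):

```python
def matvec(A: list, x: list) -> list:
    """Compute b = Ax, where b[i] is the dot product of row i of A with x."""
    if any(len(row) != len(x) for row in A):
        raise ValueError("every row of A must have as many entries as x")
    # Each b[i] reads only row i and x, so no b[i] depends on another:
    # all rows could be evaluated in parallel. Within one row, the
    # products are independent and only the summation orders them.
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


if __name__ == "__main__":
    A = [[1, 2],
         [3, 4]]
    x = [5, 6]
    print(matvec(A, x))
```

The comment in the function points at the structure a dependency graph for this computation exposes: a layer of independent multiplications feeding per-row additions.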

10-18 The PowerPC 601 Architecture

10-19 128-Bit IA-64 Instruction Word
• Each 41-bit instruction consists of three register addresses (7 bits each, addressing 128 possible registers), a predicate register (6 bits), and the opcode and flags or general purpose register (14 bits, varies by instruction).
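The field widths above add up as 3 × 7 + 6 + 14 = 41 bits, and three such instructions plus a 5-bit template fill the 128-bit word. The sketch below packs and unpacks these fields. The field names and the ordering (predicate in the low bits, opcode in the high bits, template in the low 5 bits of the bundle) are assumptions made for illustration; consult the Itanium architecture manual for the authoritative layout of each instruction type.

```python
# (name, width in bits), listed from the least significant field upward.
# Names and ordering are illustrative assumptions.
FIELDS = (("qp", 6), ("r1", 7), ("r2", 7), ("r3", 7), ("op", 14))
INSTRUCTION_BITS = sum(width for _, width in FIELDS)   # 41
TEMPLATE_BITS = 5


def pack_instruction(**values: int) -> int:
    """Pack the named fields into one 41-bit integer."""
    word, shift = 0, 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word |= value << shift
        shift += width
    return word


def unpack_instruction(word: int) -> dict:
    """Split a 41-bit integer back into its fields."""
    fields = {}
    for name, width in FIELDS:
        fields[name] = word & ((1 << width) - 1)
        word >>= width
    return fields


def pack_bundle(template: int, slots: list) -> int:
    """Combine a 5-bit template and three 41-bit slots into 128 bits."""
    if len(slots) != 3:
        raise ValueError("a bundle holds exactly three instructions")
    if not 0 <= template < (1 << TEMPLATE_BITS):
        raise ValueError("template does not fit in 5 bits")
    bundle = template
    for i, slot in enumerate(slots):
        bundle |= slot << (TEMPLATE_BITS + INSTRUCTION_BITS * i)
    return bundle


if __name__ == "__main__":
    insn = pack_instruction(qp=0, r1=1, r2=2, r3=3, op=0x2A)
    print(unpack_instruction(insn))
    print(pack_bundle(0, [insn, insn, insn]).bit_length() <= 128)
```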

10-20 Itanium Instruction Types

10-21 Allowable Combinations of IA-64 Instruction Types Assigned to Instruction Slots

10-22 IA-64 Instruction Issues
• Maximum number of IA-64 instructions that can be executed for each pairing of bundles.

10-23 Intel MMX (MultiMedia eXtensions)
• Vector addition of eight bytes by the Intel PADDB mm0, mm1 instruction:
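The figure for this slide is not reproduced in the transcript. As a rough model of what PADDB does, the sketch below adds two vectors of eight bytes lane by lane, with each lane wrapping around modulo 256 and no carry passing between lanes. Representing a register as a Python list of eight integers is an assumption made for readability.

```python
LANES = 8  # a 64-bit MMX register holds eight packed bytes


def paddb(dst: list, src: list) -> list:
    """Model of PADDB: lane-wise byte addition with wraparound."""
    for reg in (dst, src):
        if len(reg) != LANES or any(not 0 <= b <= 0xFF for b in reg):
            raise ValueError("each operand must be eight bytes")
    # Masking with 0xFF discards the carry out of each lane, so an
    # overflow in one byte never spills into its neighbor.
    return [(a + b) & 0xFF for a, b in zip(dst, src)]


if __name__ == "__main__":
    mm0 = [0xFF, 0x01, 0x10, 0x7F, 0x00, 0x80, 0xC8, 0x05]
    mm1 = [0x01, 0x01, 0x20, 0x01, 0x00, 0x80, 0x64, 0x0A]
    print([hex(b) for b in paddb(mm0, mm1)])
```

One instruction therefore performs eight additions. Saturating variants of the instruction exist as well, but they are not modeled here.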

10-24 Intel and Motorola Vector Registers
• Intel "aliases" the floating-point registers as MMX registers. This means that the low 64 bits of the Pentium's eight 80-bit floating-point registers do double duty as the eight 64-bit MMX registers.
• Motorola implements thirty-two 128-bit vector registers as a new set, separate and distinct from the floating-point registers.

10-25 MMX and AltiVec Arithmetic Instructions

10-26 Comparing Two MMX Byte Vectors for Equality

10-27 Conditional Assignment of an MMX Byte Vector
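The figures for the last two slides are not included in this transcript. The sketch below models one common way these operations fit together, which may differ in detail from the slides: a byte-wise equality compare in the style of PCMPEQB produces a mask of 0xFF or 0x00 per lane, and the mask then selects between two source vectors using AND, AND-NOT, and OR, with no branches. The function names are hypothetical.

```python
LANES = 8  # eight packed bytes per 64-bit MMX register


def pcmpeqb(a: list, b: list) -> list:
    """Lane-wise equality: 0xFF where the bytes match, 0x00 elsewhere."""
    if len(a) != LANES or len(b) != LANES:
        raise ValueError("each operand must be eight bytes")
    return [0xFF if x == y else 0x00 for x, y in zip(a, b)]


def select_bytes(mask: list, if_set: list, if_clear: list) -> list:
    """Branch-free conditional assignment:
    (mask AND if_set) OR ((NOT mask) AND if_clear), lane by lane."""
    return [(m & s) | (~m & 0xFF & c)
            for m, s, c in zip(mask, if_set, if_clear)]


if __name__ == "__main__":
    a = [1, 2, 3, 4, 5, 6, 7, 8]
    b = [1, 0, 3, 0, 5, 0, 7, 0]
    mask = pcmpeqb(a, b)
    # Keep a's byte where the lanes matched, otherwise substitute 0xEE.
    print(select_bytes(mask, a, [0xEE] * LANES))
```

Because the mask is all ones or all zeros in each lane, the bitwise operations act as a per-byte multiplexer, which is what lets a SIMD unit express "if equal then assign" without a conditional branch for each element.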

10-28 A PAL Device
PLAs and PALs are similar except that the OR gates in a PAL have a fixed number of inputs and the inputs are not programmable. PALs are more prevalent than PLAs because they are easier to manufacture and are less complex.
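The distinction can be illustrated with a small behavioral model. In this sketch, which is an assumption-laden simplification and not a description of any particular device, the AND array is programmable (each product term chooses which inputs or complements it uses) while each output's OR gate is wired to a fixed number of product terms. The class name `Pal` and its parameters are hypothetical.

```python
from itertools import product


def and_term(term: dict, inputs: dict) -> bool:
    """A product term maps an input name to the value it requires:
    True for the input itself, False for its complement. Inputs that
    are not listed are left unconnected."""
    return all(inputs[name] == wanted for name, wanted in term.items())


class Pal:
    def __init__(self, terms_per_output: int, programmed: dict):
        # terms_per_output models the fixed OR-gate fan-in; only the
        # product terms themselves can be programmed.
        for output, terms in programmed.items():
            if len(terms) > terms_per_output:
                raise ValueError(f"{output} needs more product terms "
                                 "than its OR gate provides")
        self.programmed = programmed

    def evaluate(self, inputs: dict) -> dict:
        return {output: any(and_term(t, inputs) for t in terms)
                for output, terms in self.programmed.items()}


if __name__ == "__main__":
    # XOR as a sum of two products: a AND (NOT b), (NOT a) AND b.
    pal = Pal(terms_per_output=2, programmed={
        "xor": [{"a": True, "b": False}, {"a": False, "b": True}],
    })
    for a, b in product((False, True), repeat=2):
        print(a, b, pal.evaluate({"a": a, "b": b})["xor"])
```

A function that needs more product terms than the fixed OR gate accepts is rejected by the constructor, which mirrors the constraint a designer faces when fitting logic into a PAL.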

10-29 Complex Programmable Logic Device
CPLDs are PAL-like or PLA-like blocks that can be combined with programmable interconnections. Commercial CPLDs may contain as many as 200,000 equivalent gates and have over 3,000 macrocells.

10-30 Field Programmable Gate Array
Unlike CPLDs, which employ large logic blocks and fewer interconnection options, FPGAs employ small logic blocks that can be programmably interconnected.

10-31 Quantum Computing
Single-particle interference experiment.

10-32 Multi-Valued Logic
Truth tables for binary and ternary comparison functions:

10-33 Neural Networks
Model of a living neuron, and model of an artificial neuron (below).

10-34 Artificial Neural Network Example
Two simple, feed-forward neural networks with inputs, weights, and thresholds as shown.
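The networks in the figure are not reproduced in this transcript, so the weights and thresholds below are hypothetical values chosen to make the idea concrete. The sketch assumes a threshold neuron that outputs 1 when the weighted sum of its inputs reaches the threshold and 0 otherwise; the slides may use a different firing rule.

```python
def neuron(inputs, weights, threshold) -> int:
    """Artificial neuron: fire (1) when the weighted input sum
    reaches the threshold, otherwise stay silent (0)."""
    if len(inputs) != len(weights):
        raise ValueError("one weight is needed per input")
    activation = sum(w * x for w, x in zip(weights, inputs))
    return 1 if activation >= threshold else 0


# Two single-neuron feed-forward networks. The weights and
# thresholds are illustrative, not taken from the slide.
def and_network(a: int, b: int) -> int:
    return neuron((a, b), weights=(1, 1), threshold=2)


def or_network(a: int, b: int) -> int:
    return neuron((a, b), weights=(1, 1), threshold=1)


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, and_network(a, b), or_network(a, b))
```

The two networks share the same structure and weights; only the threshold differs, which is enough to change the function computed from AND to OR.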