Course Description: Parallel Computer Architecture
12/23/2021 course eleg652-04F, Topic0a.ppt

Reading List
• Slides: Topic 1x
• Henn&Patt: Chapter 1
• Culler & Singh 98: Chapter 1
• Other assigned readings from homework and classes

Why Study Parallel Architecture?
Role of a computer architect: to design and engineer the various levels of a computer system to maximize performance and programmability within the limits of technology and cost.
Parallelism:
• Provides an alternative to a faster clock for performance
• Applies at all levels of system design
• Is a fascinating perspective from which to view architecture
• Is increasingly central in information processing

Inevitability of Parallel Computing
• Application demands
• Technology trends
• Architecture trends
• Economics

Application Trends
• Demand for cycles fuels advances in hardware, and vice versa
• Range of performance demands
• Goal of applications in using parallel machines: speedup
• Productivity requirement

Summary of Application Trends
• Transition to parallel computing has occurred for scientific and engineering computing
• Transition is in rapid progress in commercial computing
• Desktop also uses multithreaded programs, which are a lot like parallel programs
• Demand for improving throughput on sequential workloads
• Demand for productivity

Technology: A Closer Look
• Basic advance is decreasing feature size (λ)
  - Clock rate improves roughly in proportion to the improvement in λ
  - Number of transistors improves like λ² (or faster)
• Performance > 100x per decade; clock rate 10x, the rest from transistor count
• How to use more transistors?
  - Parallelism in processing
  - Locality in data access
  - Both need resources, so there is a tradeoff
[Figure: processor (Proc), cache ($), and interconnect]
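The arithmetic behind the "100x per decade" figure can be sketched as follows. This is a minimal illustration, assuming that the overall gain is the product of the clock-rate gain and the gain attributed to transistor count, and assuming quadratic scaling of transistor count with the linear shrink. The function names are illustrative and do not come from the slides.

```python
def remaining_gain(total_gain: float, clock_gain: float) -> float:
    """Gain left for transistor count, assuming total = clock * rest."""
    return total_gain / clock_gain


def scaling(shrink: float) -> tuple[float, float]:
    """Return (clock gain, transistor gain) for a linear shrink factor.

    Assumption: clock rate scales linearly with the shrink and
    transistor count scales with its square.
    """
    return shrink, shrink ** 2


# Figures quoted on the slide: > 100x total per decade, 10x from clock rate.
rest_from_transistors = remaining_gain(100.0, 10.0)
print(rest_from_transistors)   # factor attributed to transistor count

clock_gain, transistor_gain = scaling(2.0)
print(clock_gain, transistor_gain)
```

Under these assumptions, halving the feature size doubles the clock rate but quadruples the transistor budget, which is why the question of how to spend the extra transistors matters.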

Clock Frequency Growth Rate
• 30% per year

Transistor Count Growth Rate
• 1 billion transistors on chip in early 2000s A.D.
• Transistor count grows much faster than clock rate: 40% per year, an order of magnitude more contribution in 2 decades
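To see how the two annual rates quoted above (30% for clock frequency, 40% for transistor count) accumulate, here is a minimal sketch that assumes constant year-over-year compounding. The helper name `compound` is illustrative.

```python
def compound(rate_per_year: float, years: int) -> float:
    """Cumulative growth factor for a constant annual growth rate."""
    return (1.0 + rate_per_year) ** years


# Rates quoted on the slides; the horizons are chosen for illustration.
for years in (10, 20):
    clock = compound(0.30, years)
    transistors = compound(0.40, years)
    print(f"{years} years: clock x{clock:.1f}, transistors x{transistors:.1f}")

clock_20 = compound(0.30, 20)
transistors_20 = compound(0.40, 20)
```

The gap between the two curves widens every year, since the ratio of the factors itself grows exponentially.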

Similar Story for Storage
• Divergence between memory capacity and speed is more pronounced
• Larger memories are slower
  - Need deeper cache hierarchies
• Parallelism and locality within memory systems
• Disks too: parallel disks plus caching

Moore's Law and Headcount
Along with the number of transistors, the effort and headcount required to design a microprocessor have grown exponentially.

Architectural Trends
• Architecture: performance and capability
• Tradeoff between parallelism and locality
  - Current microprocessor: 1/3 compute, 1/3 cache, 1/3 off-chip connect
• Understanding microprocessor architectural trends
• Four generations of architectural history: tube, transistor, IC, VLSI

Technology Progress Overview
• Processor speed improvement: 2x per year (since 85). 100x in last decade.
• DRAM memory capacity: 2x in 2 years (since 96). 64x in last decade.
• Disk capacity: 2x per year (since 97). 250x in last decade.
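Each entry above pairs a doubling period with a per-decade factor. A minimal sketch for converting between the two, assuming smooth exponential growth, is given below. The function names are illustrative, not taken from the slides.

```python
import math


def factor_per_decade(doubling_years: float) -> float:
    """Growth factor over 10 years for a given doubling period."""
    return 2.0 ** (10.0 / doubling_years)


def implied_doubling_years(decade_factor: float) -> float:
    """Doubling period (in years) implied by a 10-year growth factor."""
    return 10.0 * math.log(2.0) / math.log(decade_factor)


# Per-decade factors quoted on the slide.
quoted = {"processor speed": 100.0, "DRAM capacity": 64.0, "disk capacity": 250.0}
for name, factor in quoted.items():
    years = implied_doubling_years(factor)
    print(f"{name}: x{factor:.0f} per decade, doubling every {years:.2f} years")
```

The printed doubling periods can be compared directly against the per-year rates stated on the slide.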

[Figure: Motorola's PowerPC 604 and Pentium]


Summary: Parallel Architecture?
• Increasingly attractive
  - Economics, technology, architecture, application
• Parallelism exploited at many levels
• Same story from memory system perspective
• Wide range of parallel architectures makes sense


The Earth Simulator Machine in Japan
• Earth Simulator (2002)
  - Max 40 TFLOPS
  - No. 1 in TOP500 list
  - General purpose
  - Parallel vector processors
  - 400 M$ (development)


HPC Architecture
• Vector Processor ⇒ 1976~ (CRAY-1)
• Parallel Processors ⇒ 1985~ (CM-1)
• MPU Cluster, Grid ⇒ 1997~ (ASCI-RED)
• Massively PP ⇒ 2008~2010 (DARPA-HPCS machines, GRAPE-DR, BlueGene/L, BG/C64)

Cluster computer of commodity MPU ⇒ 1997~
• ASCI Project
  - ASCI-Q: 20 TFLOPS (2003), 8,192 CPUs
  - ASCI-Purple: 100 TFLOPS (2005), 12,544 CPUs
  - OLNL project (2004)
• Limitation of current cluster
  - Low utilization of CPU due to high latency in interconnection
  - No automatic parallelization
• Limitation by size and power
  - ASCI-Purple (12,544 CPUs): 3 MW
[Figure: ASCI-Q, 20 TFLOPS]
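The size and power limitation can be made concrete with the ASCI-Purple figures quoted above. This is a rough sketch, assuming the 100 TFLOPS peak, the 12,544 CPUs, and the 3 MW all describe the same installation and that power divides evenly across CPUs. The variable names are illustrative.

```python
# Figures quoted on the slide for ASCI-Purple.
peak_flops = 100e12      # 100 TFLOPS
cpus = 12_544
power_watts = 3e6        # 3 MW

# Derived per-unit figures (simple division, no overheads modeled).
gflops_per_cpu = peak_flops / cpus / 1e9
watts_per_cpu = power_watts / cpus
mflops_per_watt = peak_flops / power_watts / 1e6

print(f"peak per CPU:  {gflops_per_cpu:.2f} GFLOPS")
print(f"power per CPU: {watts_per_cpu:.0f} W")
print(f"efficiency:    {mflops_per_watt:.1f} MFLOPS/W")
```

These are peak numbers only. The low CPU utilization noted above would reduce the delivered performance per watt further.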

New generation parallel systems ⇒ 2008~
• IBM BlueGene/L Project (360 TFLOPS, 2005)
  - High-density parallel processor (65,536 CPU chips in 64 racks, 131,072 processors)
• IBM BlueGene/C64 Project (1.1 PFlops, 2007?)
• HPCS Project
  - IBM PERCS
  - Cray Cascade
  - SUN Hero project
[Figure: IBM Blue Gene/L]
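The density figures quoted for BlueGene/L can be cross-checked with simple division. This is a minimal sketch using only the numbers on the slide. The variable names are illustrative, and the per-processor figure assumes the 360 TFLOPS peak is spread evenly over all processors.

```python
# Figures quoted on the slide for IBM BlueGene/L.
chips = 65_536
racks = 64
processors = 131_072
peak_tflops = 360.0

chips_per_rack = chips // racks
processors_per_chip = processors // chips
gflops_per_processor = peak_tflops * 1000.0 / processors

print(f"chips per rack:      {chips_per_rack}")
print(f"processors per chip: {processors_per_chip}")
print(f"peak per processor:  {gflops_per_processor:.2f} GFLOPS")
```

The result shows the design point: many modest processors packed densely, rather than a smaller number of fast ones.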