Informationsteknologi (Information Technology)
Computer Architecture I - Class 14
Saturday, November 17, 2007

Today's Class
- Parallel Computer Architectures
Reasons for Parallelism
- Physical constraints, such as the speed of light and quantum-mechanical effects, are being reached
- To increase the speed of computation, provide parallel computation rather than faster CPUs
Levels of Parallelism
- Within the CPU: pipelining and superscalar architecture
- Very long instruction words (VLIW) with implicit parallelism
- CPUs with special features to handle multiple threads of control at once
- Multiple CPUs on the same chip
- Extra CPU boards with additional processing capacity
- Replicate entire CPUs: multiprocessors and multicomputers
Parallel Computer Architectures
On-Chip Parallelism
Instruction-Level Parallelism
- At the lowest level, parallelism is achieved by issuing multiple instructions per clock cycle
- Two ways to do this:
  - Superscalar processors
  - VLIW (Very Long Instruction Word) processors
VLIW Processors
The TriMedia VLIW CPU
- Designed by Philips (co-inventor of the CD)
- Used as an embedded processor in CD, DVD and MP3 players, digital cameras, etc.
- Each instruction holds up to 5 operations
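The following sketch shows how a compiler could pack operations into VLIW instructions with up to 5 operations each. It illustrates the idea under stated assumptions and is not TriMedia's actual scheduler:
- The `Op` type and the `pack_bundles` function are hypothetical.
- Every operation is assumed to finish within one instruction.
- Only true data dependences are tracked.
- The real restrictions on which functional unit each slot may use are ignored.

```python
from dataclasses import dataclass, field

SLOTS_PER_INSTRUCTION = 5  # from the slide: up to 5 operations per instruction


@dataclass
class Op:
    """Hypothetical operation: a name plus the names of the ops whose results it reads."""
    name: str
    deps: list = field(default_factory=list)


def pack_bundles(ops, slots=SLOTS_PER_INSTRUCTION):
    """Greedily pack ops (given in program order) into VLIW instructions.

    Each op goes into the earliest instruction that comes after every
    instruction holding one of its dependencies and still has a free slot.
    """
    placed = {}    # op name -> index of the instruction it was packed into
    bundles = []   # each bundle lists the op names in one long instruction
    for op in ops:
        i = max((placed[d] + 1 for d in op.deps), default=0)
        while i < len(bundles) and len(bundles[i]) >= slots:
            i += 1                      # that instruction is full; try the next one
        while len(bundles) <= i:
            bundles.append([])          # open a new instruction if needed
        bundles[i].append(op.name)
        placed[op.name] = i
    return bundles


ops = [
    Op("load_a"), Op("load_b"), Op("load_c"),
    Op("add_ab", ["load_a", "load_b"]),
    Op("mul_c", ["load_c"]),
    Op("store", ["add_ab", "mul_c"]),
]
for n, bundle in enumerate(pack_bundles(ops)):
    print(n, bundle)
```

Here six operations fit into three long instructions. The dependence chain, not the slot count, limits how tightly they pack.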
The TriMedia VLIW CPU: Characteristics
- Byte-oriented memory
- 32-bit words
- 128 general-purpose 32-bit registers
  - R0 is always 0, R1 is always 1
- Four special-purpose registers
  - PC, PSW, and two registers for interrupt handling
- A 64-bit register that counts the number of CPU cycles since the CPU was last reset
  - Takes about 2000 years to wrap around at 300 MHz
- 64 KB instruction cache, 16 KB data cache
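The "about 2000 years" figure is simple arithmetic: a 64-bit counter has 2^64 values, and at 300 MHz it advances 3 × 10^8 times per second. This sketch checks the figure. The helper name `wrap_years` is illustrative, and a 365.25-day year is assumed.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed year length


def wrap_years(counter_bits, clock_hz):
    """Years until a free-running cycle counter of the given width wraps to zero."""
    cycles = 2 ** counter_bits          # number of values before it returns to 0
    return cycles / clock_hz / SECONDS_PER_YEAR


print(f"{wrap_years(64, 300e6):.0f} years")        # 64-bit counter at 300 MHz
print(f"{2 ** 32 / 300e6:.1f} seconds")            # a 32-bit counter, for comparison
```

The result is roughly 1948 years, in line with the slide's rounded 2000. A 32-bit counter at the same clock would wrap in about 14 seconds, which is why the counter is 64 bits wide.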
The TriMedia VLIW CPU: Functional Units
On-Chip Multithreading
- When a memory reference misses the level 1 and level 2 caches, there is a long wait until the requested word is loaded into the cache
- This stalls the pipeline
- On-chip multithreading deals with this by allowing the CPU to manage multiple threads of control at the same time
Fine-Grained and Coarse-Grained Multithreading
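Fine-grained multithreading switches to another thread on every cycle. Coarse-grained multithreading keeps running one thread until it stalls and only then switches. The sketch below simulates both policies on a single-issue CPU under these assumptions:
- A cache miss (`"M"`) makes its thread wait an assumed 3 cycles.
- All other instructions take one cycle.
- Switching threads costs nothing.
- The function `simulate` and the trace labels are illustrative.

```python
MISS_LATENCY = 3  # assumed: cycles a thread must wait after a load that misses


def simulate(threads, policy):
    """Return a per-cycle issue trace for a single-issue CPU; '.' marks a bubble.

    threads: one instruction list per thread; "M" is a load that misses the cache,
             anything else completes in one cycle.
    policy:  "fine"   switches to the next ready thread every cycle (round robin);
             "coarse" keeps issuing from one thread until it stalls, then switches.
    No thread-switch penalty is modeled.
    """
    n = len(threads)
    pc = [0] * n           # index of each thread's next instruction
    ready_at = [0] * n     # first cycle at which each thread may issue again
    current, cycle, trace = 0, 0, []
    while any(pc[t] < len(threads[t]) for t in range(n)):
        order = [(current + k) % n for k in range(n)]
        chosen = next((t for t in order
                       if pc[t] < len(threads[t]) and ready_at[t] <= cycle), None)
        if chosen is None:
            trace.append(".")                   # every thread stalled or finished
        else:
            instr = threads[chosen][pc[chosen]]
            trace.append(f"{chr(ord('A') + chosen)}{pc[chosen] + 1}")
            pc[chosen] += 1
            stall = MISS_LATENCY if instr == "M" else 0
            ready_at[chosen] = cycle + 1 + stall
            current = (chosen + 1) % n if policy == "fine" else chosen
        cycle += 1
    return trace


threads = [["I", "M", "I"], ["I", "I", "I"]]   # thread A has one cache miss
print("fine:  ", simulate(threads, "fine"))
print("coarse:", simulate(threads, "coarse"))
```

In both traces, thread B's instructions fill the cycles during which thread A waits on its miss. Coarse-grained finishes one cycle earlier here only because the model charges nothing for a thread switch. Real coarse-grained designs lose cycles when they switch.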
Multithreading with a Dual-Issue Superscalar CPU
Single-Chip Multiprocessors
- Provide a larger performance gain than multithreading
- Contain two or more CPUs
Homogeneous Multiprocessors on a Chip
Heterogeneous Multiprocessors on a Chip
- The logical structure of a simple DVD player: a heterogeneous multiprocessor with multiple cores for different functions
Coprocessors
Coprocessors
- Increase the speed of a computer by adding a second, specialized processor
Introduction to Networking
Network Processors
- Programmable devices that can handle incoming and outgoing network packets at wire speed
Media Processors
- Handle high-resolution photographic images and audio and video streams
- Ordinary CPUs are not good at the massive computations needed to process the large amounts of data in these applications
The Nexperia Media Processor
Shared-Memory Multiprocessors
Multiprocessor
- A parallel computer in which all the CPUs share a common memory
- Hard to build (because of the shared memory)
- Easy to program
Multicomputer
- Each CPU has its own private memory, accessible only to itself and not to any other CPU
- Also known as a distributed-memory system
- Easy to build
- Hard to program
Taxonomy of Parallel Computers
Sequential Memory Consistency
- In the presence of multiple read and write requests, the hardware chooses (nondeterministically) some interleaving of all the requests, but all CPUs see the same order
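A common way to see what sequential consistency rules out is the "store buffering" litmus test. This is an assumed example, not one from the slides. Two CPUs each store to one variable and then load the other. The sketch enumerates every interleaving that preserves each CPU's program order, which is exactly the set of executions sequential consistency permits, and collects the loaded values. The names `interleavings` and `run` are illustrative.

```python
from itertools import combinations

# Initially x = y = 0. Each CPU stores to one variable, then loads the other.
P0 = [("store", "x", 1), ("load", "y", "r0")]
P1 = [("store", "y", 1), ("load", "x", "r1")]


def interleavings(a, b):
    """All merges of a and b that keep each CPU's own program order."""
    n = len(a) + len(b)
    for positions in combinations(range(n), len(a)):
        merged, ia, ib = [], 0, 0
        for i in range(n):
            if i in positions:
                merged.append(a[ia])
                ia += 1
            else:
                merged.append(b[ib])
                ib += 1
        yield merged


def run(schedule):
    """Execute one interleaving against a single shared memory."""
    mem, regs = {"x": 0, "y": 0}, {}
    for op, var, arg in schedule:
        if op == "store":
            mem[var] = arg
        else:
            regs[arg] = mem[var]
    return regs["r0"], regs["r1"]


outcomes = {run(s) for s in interleavings(P0, P1)}
print(sorted(outcomes))   # [(0, 1), (1, 0), (1, 1)]
```

The outcome (0, 0) never appears. It would require each load to run before the other CPU's store, and no single interleaving allows that. Machines with weaker consistency models can produce it.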
Processor Memory Consistency
- Writes by any CPU are seen by all CPUs in the order they were issued
- For every memory word, all CPUs see all writes to it in the same order
- Does not guarantee that every CPU sees the same interleaving of writes from different CPUs
Weak Memory Consistency
- Does not guarantee that writes from a single CPU are seen in order by other CPUs
UMA Symmetric Multiprocessor Architectures
- The simplest multiprocessors are based on a single bus
Cache Coherence
- Suppose CPU 1 and CPU 2 each have a copy of the same data in their respective caches
- Now suppose CPU 1 modifies the data in its cache and, immediately afterward, CPU 2 reads its copy
- CPU 2 will get stale data
- This is known as the cache coherence problem
- Solutions to this problem have each cache controller eavesdrop (snoop) on the bus
Snooping Caches
- The write-through cache coherence protocol: note that all writes go to memory
- The empty boxes in the protocol table indicate that no action is taken
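Here is a minimal sketch of a write-through snooping protocol with invalidation. The actions it uses are the ones commonly listed for this protocol:
- Read miss: fetch the data from memory.
- Read hit: use the local copy.
- Write hit: update both the cache and memory.
- Write miss: update memory only.
- Remote write: a snooping cache invalidates its own copy.
- Remote read: no action.

The `Bus` and `Cache` classes are hypothetical. Timing, arbitration and cache replacement are ignored.

```python
class Bus:
    """A shared bus: main memory plus every cache snooping on it."""
    def __init__(self):
        self.memory = {}
        self.caches = []

    def write(self, writer, addr, value):
        self.memory[addr] = value            # write-through: every write reaches memory
        for cache in self.caches:
            if cache is not writer:
                cache.snoop_write(addr)      # other caches see the write on the bus


class Cache:
    def __init__(self, bus):
        self.lines = {}                      # addr -> cached value
        self.bus = bus
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:           # read miss: fetch from memory
            self.lines[addr] = self.bus.memory.get(addr, 0)
        return self.lines[addr]              # read hit: use the local copy

    def write(self, addr, value):
        if addr in self.lines:               # write hit: update the cache as well
            self.lines[addr] = value
        self.bus.write(self, addr, value)    # hit or miss: memory is always updated

    def snoop_write(self, addr):
        self.lines.pop(addr, None)           # remote write: invalidate our copy, if any


bus = Bus()
cpu1, cpu2 = Cache(bus), Cache(bus)
cpu1.read(0x40)
cpu2.read(0x40)            # both caches now hold a copy
cpu1.write(0x40, 7)        # CPU 2's copy is invalidated by snooping
print(cpu2.read(0x40))     # CPU 2 misses and fetches the fresh value: 7
```

This is the stale-data scenario from the Cache Coherence slide. Because the write is broadcast on the bus, CPU 2 can no longer read the old value.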
The MESI Cache Coherence Protocol
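MESI is named after the four states a cache line can be in: Modified, Exclusive, Shared and Invalid. The sketch below encodes the usual transitions for one line in one cache as plain functions. The function names are illustrative. Bus transactions and the write-back of a Modified line are described only in comments, not simulated.

```python
M, E, S, I = "Modified", "Exclusive", "Shared", "Invalid"


def local_read(state, others_have_copy):
    """This CPU reads the line."""
    if state == I:                              # miss: fetch the line over the bus
        return S if others_have_copy else E
    return state                                # hit in M, E or S: no bus traffic


def local_write(state):
    """This CPU writes the line; any other copies are invalidated first."""
    # From I or S an invalidate goes on the bus; from E the upgrade is silent.
    return M


def remote_read(state):
    """Another CPU reads the line (a Modified copy is written back first)."""
    return I if state == I else S


def remote_write(state):
    """Another CPU writes the line: this copy is no longer valid."""
    return I


cpu1 = local_read(I, others_have_copy=False)   # CPU 1 reads: Exclusive
cpu2 = local_read(I, others_have_copy=True)    # CPU 2 reads: Shared
cpu1 = remote_read(cpu1)                       # CPU 1 snoops that read: Shared
cpu2 = local_write(cpu2)                       # CPU 2 writes: Modified
cpu1 = remote_write(cpu1)                      # CPU 1's copy is invalidated
print(cpu1, cpu2)                              # Invalid Modified
```

The Exclusive state is what saves bus traffic. A CPU that holds the only copy can write it without announcing anything on the bus.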
Crossbar Switches
- Using a single bus limits the size of a UMA multiprocessor to 16 or 32 CPUs
- The simplest circuit for connecting n CPUs to k memories is the crossbar switch
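A crossbar has one crosspoint for every CPU/memory pair, so it needs n × k switches. In exchange, a CPU is blocked only when the memory module it wants is already connected to another CPU. The sketch below models one arbitration round. The function `crossbar_cycle` and its lowest-CPU-wins tie rule are assumptions for illustration.

```python
def crossbar_cycle(requests, n_memories):
    """One arbitration round of a crossbar switch.

    requests: dict mapping CPU number -> memory module it wants this cycle.
    Returns (closed crosspoints as (cpu, module) pairs, list of blocked CPUs).
    Ties go to the lowest-numbered CPU (an assumed arbitration policy).
    """
    busy, closed, blocked = set(), [], []
    for cpu in sorted(requests):
        module = requests[cpu]
        if not 0 <= module < n_memories:
            raise ValueError(f"no memory module {module}")
        if module in busy:
            blocked.append(cpu)           # the only conflict: module already in use
        else:
            busy.add(module)
            closed.append((cpu, module))  # close crosspoint (cpu, module)
    return closed, blocked


n_cpus, n_memories = 8, 8
print("crosspoints:", n_cpus * n_memories)
print(crossbar_cycle({0: 3, 1: 5, 2: 3}, n_memories))
```

CPUs 0 and 1 reach different modules in parallel, while CPU 2 waits because module 3 is busy. The quadratic crosspoint count is what makes large crossbars expensive.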
Multistage Switching Networks
- Larger UMA multiprocessors are based on the humble 2 × 2 switch
- Messages arriving on either input line (A or B) can be switched to either output line (X or Y)
- Messages contain four parts:
  - Module: which memory to use
  - Address: specifies the address within the module
  - Opcode: specifies the operation, such as READ or WRITE
  - Value: optional field that may contain an operand
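The message format above maps directly onto a record type. A network of 2 × 2 switches joining n CPUs to n memories (an omega network, for example) needs log2 n stages. Each switch can choose its output from one bit of the Module field. The sketch assumes the following, none of which comes from the slides:
- Routing is destination-tag: stage i examines bit i of the module number, most significant bit first.
- A 0 bit selects output X and a 1 bit selects output Y.
- The names `Message` and `route` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    module: int                  # which memory module to use
    address: int                 # address within the module
    opcode: str                  # e.g. "READ" or "WRITE"
    value: Optional[int] = None  # optional operand, e.g. the data for a WRITE


def route(msg, stages):
    """Output line chosen at each 2 x 2 switch stage ('X' or 'Y').

    Assumed destination-tag routing: stage i uses bit i of the module
    number, most significant bit first; 0 selects X and 1 selects Y.
    """
    ports = []
    for stage in range(stages):
        bit = (msg.module >> (stages - 1 - stage)) & 1
        ports.append("Y" if bit else "X")
    return ports


msg = Message(module=0b110, address=0x1F, opcode="WRITE", value=42)
print(route(msg, stages=3))   # ['Y', 'Y', 'X'] for module 6 in an 8-module network
```

Because each switch looks at only one bit, no switch needs a routing table. The Module field alone steers the message to its memory.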
NUMA Multiprocessors
- NUMA: Non-Uniform Memory Access
- Not all memory modules have the same access time
- Three key characteristics:
  - A single address space visible to all CPUs
  - Access to remote memory is done using LOAD and STORE instructions
  - Access to remote memory is slower than access to local memory
- NC-NUMA (no cache present)
- CC-NUMA (coherent caches present)
NUMA Multiprocessors
- A NUMA machine based on two levels of buses
Directory-Based Multiprocessor
- The most popular approach for building Cache-Coherent NUMA (CC-NUMA) multiprocessors
- Maintains a database telling where each cache line is and what its status is
- The database is queried on every instruction that references memory, so it must be kept in extremely fast special-purpose hardware
Directory-Based Multiprocessor
- Consider an example 256-node multiprocessor
- Each node has one CPU and 16 MB of RAM
- Total memory is 2^32 bytes, divided into 2^26 cache lines of 64 bytes each
Directory-Based Multiprocessor
- Suppose CPU 20 issues a LOAD instruction for memory at physical address 0x24000108
- This translates to node 36, line 4, offset 8
- A request is made over the interconnection network. The directory for node 36 shows that line 4 is not cached, so the line is fetched from node 36's local RAM and sent back to node 20. The directory entry for line 4 is then updated to show that the line is cached at node 20
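The translation "node 36, line 4, offset 8" comes from splitting the 32-bit address into three fields:
- 256 = 2^8 nodes give an 8-bit node number.
- 16 MB / 64 B = 2^18 lines per node give an 18-bit line number.
- 64-byte lines leave a 6-bit offset.

This sketch performs the split. The constant names and the `decode` function are illustrative.

```python
NODE_BITS, LINE_BITS, OFFSET_BITS = 8, 18, 6   # 256 nodes, 2**18 lines per node, 64-byte lines
assert NODE_BITS + LINE_BITS + OFFSET_BITS == 32
assert (2 ** LINE_BITS) * (2 ** OFFSET_BITS) == 16 * 2 ** 20   # 16 MB per node


def decode(addr):
    """Split a 32-bit physical address into (node, line, offset)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)                  # low 6 bits
    line = (addr >> OFFSET_BITS) & ((1 << LINE_BITS) - 1)     # next 18 bits
    node = addr >> (OFFSET_BITS + LINE_BITS)                  # top 8 bits
    return node, line, offset


print(decode(0x24000108))   # (36, 4, 8)
```

The top byte 0x24 is 36 in decimal. The remaining 0x000108 (264) splits into line 264 // 64 = 4 and offset 264 % 64 = 8.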
Directory-Based Multiprocessor
- Now suppose node 20 wants to load memory in node 36's cache line 2
- From the directory entry, we see that the line is cached at node 82
- At this point, the hardware updates directory entry 2 to show that the line is now at node 20, and then sends a message to node 82 telling it to pass the line to node 20 and invalidate its cache
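The two requests above can be replayed against a toy directory. This minimal sketch assumes each line is cached by at most one node, as in the slides' example. Real directories also track multiple sharers and line states. The `Directory` class and its message log are hypothetical.

```python
class Directory:
    """Directory kept at one home node: for each local line, which node caches it."""

    def __init__(self, home, num_lines):
        self.home = home
        self.owner = [None] * num_lines   # None = line is not cached anywhere
        self.messages = []                # log of network messages sent

    def request(self, line, requester):
        holder = self.owner[line]
        if holder is None:
            self.messages.append(
                f"node {self.home}: send line {line} from local RAM to node {requester}")
        else:
            self.messages.append(
                f"node {self.home}: tell node {holder} to pass line {line} "
                f"to node {requester} and invalidate its copy")
        self.owner[line] = requester      # record the line's new location


d36 = Directory(home=36, num_lines=2 ** 18)
d36.owner[2] = 82                   # line 2 is already cached at node 82
d36.request(4, requester=20)        # first request: served from node 36's RAM
d36.request(2, requester=20)        # second request: forwarded via node 82
print(d36.owner[4], d36.owner[2])   # 20 20
print(*d36.messages, sep="\n")
```

Only the home node's directory is consulted, so no broadcast is needed. This is what lets directory-based designs scale beyond a single snooping bus.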
Message-Passing Multicomputers
Multicomputers
- Each CPU has its own private memory, not directly accessible to any other CPU
- Programs interact by exchanging messages, since they cannot get at each other's memory via LOAD and STORE
Topology
- (a) A star
- (b) A complete interconnect
- (c) A tree
- (d) A ring
Topology
- (e) A grid
- (f) A double torus
- (g) A cube
- (h) A 4D hypercube
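In a d-dimensional hypercube, each of the 2^d nodes has a d-bit number. Two nodes are linked exactly when their numbers differ in one bit. It follows that each node has d neighbors and that the shortest path between two nodes is the number of differing bits. The sketch below shows this for the 4D case. The function names are illustrative.

```python
def neighbors(node, dim):
    """Neighbors of a node in a dim-dimensional hypercube: flip one address bit."""
    return [node ^ (1 << i) for i in range(dim)]


def hops(a, b):
    """Shortest path length between two hypercube nodes = number of differing bits."""
    return bin(a ^ b).count("1")


DIM = 4   # 4D hypercube: 2**4 = 16 nodes, each with 4 links
print(neighbors(0b0000, DIM))                          # [1, 2, 4, 8]
print(max(hops(0, n) for n in range(2 ** DIM)))        # diameter: 4
```

Doubling the node count adds only one link per node and one hop to the diameter. This logarithmic growth is the main attraction of the hypercube topology.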
Massively Parallel Processors
- Use standard CPUs, such as the Intel Pentium, as their processors
- Very high-performance proprietary interconnection network
- Enormous I/O capacity
- Fault tolerant
  - A program that runs for many hours should not be aborted because one CPU crashed
BlueGene/L Custom Processor Chip
BlueGene/L System
Cluster Computing
- Consists of hundreds or thousands of computers connected by commercially available network boards
- Centralized cluster: a cluster of workstations mounted in a big rack in a single room
- Decentralized cluster: workstations spread around a building or campus, connected by a LAN
Google
- Built the world's largest off-the-shelf cluster
- It bought cheap, modest-performance PCs
- Lots of them!
- A typical Google PC has a 2 GHz Pentium, 512 MB of RAM, and an 80 GB disk
A Typical Google Cluster
Communication Software
- Special software is required for interprocess communication and synchronization
- Most message-passing systems provide two primitives: SEND and RECEIVE
Three Main Semantics
- Synchronous message passing
  - If the sender has executed SEND and the receiver has not yet executed RECEIVE, the sender is blocked until the receiver executes RECEIVE
- Buffered message passing
  - If a message is sent before the receiver is ready, the message is buffered somewhere until the receiver takes it out
- Nonblocking message passing
  - The sender is allowed to continue immediately after executing SEND
  - However, it may not reuse the message buffer, as the message may not have been sent yet
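The sketch below models the three semantics with Python threads, using SEND and RECEIVE as methods:
- The buffered channel is an unbounded queue.
- The synchronous channel makes SEND wait until RECEIVE has taken the message. It assumes a single sender per channel.
- The nonblocking channel keeps only a reference to the sender's buffer until a separate `transmit` step. That step stands in for the system sending the message later, so the channel shows why the buffer must not be reused.

All class and method names are hypothetical and do not correspond to any real message-passing library.

```python
import queue
import threading


class BufferedChannel:
    """Buffered: SEND deposits the message in a buffer and returns at once."""
    def __init__(self):
        self._q = queue.Queue()            # unbounded buffer

    def send(self, msg):
        self._q.put(msg)

    def receive(self):
        return self._q.get()               # blocks only if the buffer is empty


class SynchronousChannel:
    """Synchronous: SEND blocks until the receiver has executed RECEIVE."""
    def __init__(self):
        self._q = queue.Queue(maxsize=1)

    def send(self, msg):
        self._q.put(msg)
        self._q.join()                     # wait until receive() has taken msg

    def receive(self):
        msg = self._q.get()
        self._q.task_done()                # releases the blocked sender
        return msg


class NonblockingChannel:
    """Nonblocking: SEND returns immediately; the buffer is read later."""
    def __init__(self):
        self._pending = []

    def send(self, buffer):
        self._pending.append(buffer)       # keeps a reference; nothing is copied yet

    def transmit(self):
        """Model the system actually sending the pending messages."""
        sent = [bytes(b) for b in self._pending]
        self._pending.clear()
        return sent


# Synchronous: the sender thread stays blocked until RECEIVE runs.
sync = SynchronousChannel()
sender = threading.Thread(target=sync.send, args=("ping",))
sender.start()
print(sync.receive())      # ping
sender.join()

# Nonblocking: reusing the buffer before transmission corrupts the message.
nb = NonblockingChannel()
buf = bytearray(b"hello")
nb.send(buf)
buf[:] = b"oops!"          # buffer reused too early
print(nb.transmit())       # [b'oops!']
```

With the synchronous channel, SEND doubles as a synchronization point. With the nonblocking channel, the sender gains speed but takes on responsibility for the buffer.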