16.482 / 16.561 Computer Architecture and Design

16.482 / 16.561 Computer Architecture and Design
Instructor: Dr. Michael Geiger
Spring 2015
Lecture 11: Final Exam Preview

Lecture outline
• Announcements/reminders
  - HW 9 due today
  - Final exam: Thursday, 4/23
    • Must complete course evaluation prior to starting exam
      - Will post link as soon as it’s available
      - Print and bring to exam
• Today’s lecture: Final Exam Preview

Final exam notes
• Allowed to bring:
  - Two 8.5” x 11” double-sided sheets of notes
  - Calculator
• No other notes or electronic devices
• Test policies are same as last time
  - Can’t remove anything from bag during exam; no sharing (pencils, erasers, etc.); only one person allowed out of room at a time
• Exam will last until 9:30
  - Will be written for ~90 minutes... in theory
• Covers all lectures after Exam 1
  - Material starts with multiple issue/multithreading
• Question formats
  - Problem solving
    • Largely similar to homework problems, but shorter
  - Some short answer
    • Explain concepts in short paragraph

Review: TLP, multithreading
• ILP (which is implicit) is limited on realistic hardware for a single thread of execution
• Could look at thread-level parallelism (TLP)
• Focus on multithreading (see the sketch below)
  - Coarse-grained: switch threads on a long-latency stall
  - Fine-grained: switch threads every cycle
  - Simultaneous: allow multiple threads to execute in same cycle
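As a purely illustrative aid (not from the slides), here is a minimal C sketch of the fine-grained policy above: the core picks a different hardware thread to issue from on every cycle, round-robin. The thread and cycle counts are arbitrary assumptions.

```c
#include <stdio.h>

#define NUM_THREADS 4   /* assumed number of hardware thread contexts */
#define CYCLES      8   /* arbitrary simulation length */

int main(void) {
    for (int cycle = 0; cycle < CYCLES; cycle++) {
        /* Fine-grained multithreading: switch threads every cycle,
         * so a stall in one thread only costs that thread's slots. */
        int thread = cycle % NUM_THREADS;
        printf("cycle %d: issue from thread %d\n", cycle, thread);
    }
    return 0;
}
```

A coarse-grained core would instead keep issuing from one thread until it hits a long-latency stall (e.g., a cache miss), and an SMT core would fill one cycle’s issue slots from several threads at once.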

Review: Memory hierarchies
• Want large, fast, low-cost memory; can’t have it all at once
• Use multiple levels: cache (may have >1), main memory, disk
• Discussed operation of hierarchy, focusing heavily on:
  - Caching: principle of locality, addressing the cache (worked example below), “4 questions” (block placement, identification, replacement, write strategy)
  - Virtual memory: page tables & address translation, page replacement, TLB
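For the cache-addressing piece, a small worked example in C may help. The parameters (32 KiB, 4-way set-associative, 64-byte blocks) are assumptions chosen for round numbers, not values from the lecture.

```c
#include <stdio.h>
#include <stdint.h>

int main(void) {
    /* Assumed cache geometry: 32 KiB, 4-way, 64 B blocks. */
    const uint32_t cache_size = 32 * 1024;
    const uint32_t block_size = 64;                               /* -> 6 offset bits */
    const uint32_t ways       = 4;
    const uint32_t sets       = cache_size / (block_size * ways); /* 128 -> 7 index bits */

    uint32_t addr   = 0x12345678;                 /* example address */
    uint32_t offset = addr % block_size;          /* byte within block */
    uint32_t index  = (addr / block_size) % sets; /* which set */
    uint32_t tag    = addr / (block_size * sets); /* identifies block within set */

    printf("addr 0x%08X -> tag 0x%X, index %u, offset %u\n",
           addr, tag, index, offset);
    return 0;
}
```

Block identification is then a tag comparison against all four ways in the indexed set.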

Review: cache optimizations
• Way prediction
  - Want benefits of:
    • Direct-mapped: fast hits
    • Set-associative: fewer conflicts
  - Predict which way within set holds data
• Trace cache
  - Intel-specific (originally; ARM uses it as well)
  - Track dynamic “traces” of decoded micro-ops/instructions
• Multi-banked caches (interleaving sketch below)
  - Cache physically split into pieces (“banks”)
  - Data sequentially interleaved
    • Spread accesses across banks
  - Allows for cache pipelining, non-blocking caches
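A minimal sketch of sequential interleaving, assuming 64-byte blocks and four banks (both assumed values): consecutive block addresses rotate through the banks, so a streaming access pattern keeps all banks busy.

```c
#include <stdio.h>
#include <stdint.h>

int main(void) {
    const uint32_t block_size = 64;  /* assumed block size */
    const uint32_t num_banks  = 4;   /* assumed bank count */

    /* Sequential interleaving: block b lives in bank b mod num_banks. */
    for (uint32_t addr = 0; addr < 8 * block_size; addr += block_size) {
        uint32_t block = addr / block_size;
        printf("block %u (addr 0x%03X) -> bank %u\n",
               block, addr, block % num_banks);
    }
    return 0;
}
```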

Review: cache optimizations
• Critical word first / early restart
  - Fetch desired word in cache block first on miss
  - Restart processor as soon as desired word received
• Prefetching (sketch below)
  - Anticipate misses and fetch blocks into cache
  - Hardware: next-sequential prefetching (fetch block after one causing miss) most common/effective
  - Software: prefetch instructions
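Next-sequential prefetching can be sketched in a few lines of C; `cache_has` and `fetch_block` are hypothetical stand-ins for the real cache machinery, not functions from any actual simulator.

```c
#include <stdio.h>
#include <stdbool.h>

/* Hypothetical helpers standing in for real cache hardware. */
static bool cache_has(unsigned block)  { (void)block; return false; }
static void fetch_block(unsigned block, bool is_prefetch) {
    printf("%s block %u\n", is_prefetch ? "prefetch" : "demand fetch", block);
}

static void access_block(unsigned block) {
    if (!cache_has(block)) {           /* miss */
        fetch_block(block, false);     /* fetch the block that missed... */
        fetch_block(block + 1, true);  /* ...and prefetch the next one */
    }
}

int main(void) { access_block(42); return 0; }
```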

Review: Storage
• Disks: RAID
  - Build redundancy (parity) into disk array because MTTF_array = MTTF_disk / (# disks)
  - Files “striped” across disks along with parity
  - Different levels of RAID offer improvements (parity sketch below)
    • RAID 1: Mirroring
    • RAID 3: Introduced parity disk
    • RAID 4: Per-sector error correction allows small reads
    • RAID 5: Interleaved parity allows small writes
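The parity math behind RAID levels 3-5 is plain XOR. The byte values below are made up for illustration; the last step is the RAID 5 “small write,” which patches parity from the old data and old parity instead of re-reading the whole stripe.

```c
#include <stdio.h>
#include <stdint.h>

int main(void) {
    /* Three data "disks" (one byte each, values made up) plus parity. */
    uint8_t d0 = 0xA5, d1 = 0x3C, d2 = 0x0F;
    uint8_t parity = d0 ^ d1 ^ d2;

    /* Recovery: rebuild d1 from the surviving disks + parity. */
    uint8_t rebuilt = d0 ^ d2 ^ parity;
    printf("rebuilt d1 = 0x%02X (expected 0x3C)\n", (unsigned)rebuilt);

    /* RAID 5 small write: new parity = old parity ^ old data ^ new data. */
    uint8_t new_d0 = 0x77;
    parity ^= d0 ^ new_d0;
    d0 = new_d0;
    printf("parity still consistent: 0x%02X == 0x%02X\n",
           (unsigned)parity, (unsigned)(uint8_t)(d0 ^ d1 ^ d2));
    return 0;
}
```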

Review: Classifying multiprocessors
Parallel architecture = computer architecture + communication architecture
• Classifying by memory:
  - Centralized-memory (symmetric) multiprocessor
    • Typically for smaller systems; bandwidth demands on both network and memory system
  - Physically distributed-memory multiprocessor
    • Scales better; bandwidth demands (mostly) for network
• Classifying by communication:
  - Message-passing multiprocessor: processors explicitly pass messages
  - Shared-memory multiprocessor: processors communicate through shared address space
    • If using centralized memory, UMA (uniform memory access time)
    • If distributed memory, NUMA (non-uniform memory access time)

Review: Cache coherence protocols
• Need to maintain coherence; two types of protocols for doing so
  - Directory-based: one (logically) centralized structure holds sharing information
    • One entry per memory block
  - Snooping: each cache holding a copy of data has a copy of its sharing information
    • One entry per cache block
    • Caches “snoop” bus and respond to relevant transactions
  - You should have an understanding of both types of protocols
• Two ways to handle writes (invalidate sketch below)
  - Write invalidate: ensure that cache has exclusive access to block before performing write
    • All other copies are invalidated
  - Write update (or write broadcast): update all cached copies of a block whenever block is written
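A heavily simplified write-invalidate sketch for a single block under an MSI-style snooping protocol (an assumed model for illustration, not the full protocol from the lecture): before the writer proceeds, every other cached copy is invalidated.

```c
#include <stdio.h>

typedef enum { INVALID, SHARED, MODIFIED } State;
#define NUM_CACHES 4

/* Write invalidate: writer gains exclusive access; all other
 * caches snoop the invalidate and drop their copies. */
static void write_block(State caches[], int writer) {
    for (int i = 0; i < NUM_CACHES; i++)
        if (i != writer) caches[i] = INVALID;
    caches[writer] = MODIFIED;
}

int main(void) {
    State caches[NUM_CACHES] = { SHARED, SHARED, INVALID, SHARED };
    write_block(caches, 0);
    for (int i = 0; i < NUM_CACHES; i++)
        printf("cache %d: %s\n", i,
               caches[i] == MODIFIED ? "Modified" :
               caches[i] == SHARED   ? "Shared" : "Invalid");
    return 0;
}
```

Write update would instead broadcast the new value so the other copies stay valid, trading bus traffic for read latency.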

Review: More cache coherence
• Should be familiar with coherence protocol state transition diagrams
  - What happens on different CPU transactions? (Read/write, hit/miss)
  - For snooping protocol, what happens for relevant transactions sent over bus? (Read or write miss)
    • In snooping protocol, all state transitions occur for cache block, since cache block tracks sharing info
  - For directory protocol, what happens for each message sent to a given processor or directory? (Read/write miss, invalidate and/or fetch, data reply)
    • State transitions occur in cache blocks and directory entries (directory holds sharing info)
• Additional cache misses: coherence misses (false-sharing sketch below)
  - True sharing misses: one processor writes to same location that’s later accessed by another processor
  - False sharing misses: one processor writes to same block (but different location) later accessed by another processor
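False sharing is easiest to see in a data layout. In the sketch below, x and y land in the same 64-byte block (an assumed block size), so a write to x invalidates the block copy that another processor is using only for y; the padded variant separates them. For the offsets to translate into distinct blocks at runtime, the structs would also need 64-byte alignment.

```c
#include <stdio.h>
#include <stddef.h>

/* x and y share one 64 B block: writes to x cause coherence
 * misses for a processor that only reads y (false sharing). */
struct shared_bad  { int x; int y; };

/* Padding pushes y into the next 64 B block. */
struct shared_good { int x; char pad[60]; int y; };

int main(void) {
    printf("bad:  x at offset %zu, y at offset %zu (same block)\n",
           offsetof(struct shared_bad, x), offsetof(struct shared_bad, y));
    printf("good: x at offset %zu, y at offset %zu (different blocks)\n",
           offsetof(struct shared_good, x), offsetof(struct shared_good, y));
    return 0;
}
```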

Final notes
• Next time: Final Exam (Thursday, 4/23)
  - Must complete course evaluation prior to starting exam
    • Will post link as soon as it’s available
    • Print and bring to exam
• Reminders
  - HW 9 due today