CS 152 Computer Architecture and Engineering Lecture 26

CS 152 Computer Architecture and Engineering
Lecture 26 -- Midterm II Review Session
2014-4-29
John Lazzaro (not a prof - “John” is always OK)
TA: Eric Love
www-inst.eecs.berkeley.edu/~cs152/
CS 152 L16: Midterm I Review UC Regents Spring 2014 © UCB

Today - Midterm II
Review Session
Study Tips
HW 2, problem by problem (if there is time)
HKN

CS 152 Midterm II
May 1st, 2014
Name:
SSID:
#     Points
1     25
2     25
3     25
4     25
Tot   100
“All the work is my own. I have no prior knowledge of the exam contents, aside from guidance from class staff. I will not share the contents with others in CS 152 who have not taken it yet.”
Signature:
Please write clearly, and put your name on each page. Please abide by word limits.
Good luck!
Eric Love
John Lazzaro

What does it cover?
Lectures 9 onward.
Focus will be on problems that require you to do a task (write a small program, trace through execution, etc) that demonstrates that you understand a concept. [...]
No transistor-level questions (DRAM and SRAM cells, etc).
Time for a quick walk-through...

CS 152 Computer Architecture and Engineering Lecture 9 -- Memory 2014-2-18

Latency is not the same as bandwidth! Thus, push to faster DRAM interfaces.
[Figure: DRAM array of 8192 rows x 16384 columns; 13-bit row address input to a 1-of-8192 decoder; 134 217 728 usable bits (tester found good bits in bigger array); 16384 bits delivered by sense amps. Select requested bits, send off the [...]]
What if we want all of the 16384 bits? In row access time (55 ns) we can do 22 transfers at 400 MT/s. 16-bit chip bus -> 22 x 16 = 352 bits << 16384. Now the row access time looks fast!
CS 152 L9: Memory UC Regents Spring 2014 © UCB
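The arithmetic on this slide can be checked with a short Python sketch. The constants come from the slide; the variable and function names are illustrative, and the last figure (time to stream a whole row over the chip bus) is derived here rather than stated on the slide.

```python
# Numbers from the slide; names are illustrative assumptions.
ROW_ACCESS_NS = 55           # row access time
TRANSFER_RATE_MT_S = 400     # mega-transfers per second on the chip bus
BUS_WIDTH_BITS = 16          # width of the chip data bus
ROWS, COLUMNS = 8192, 16384  # array geometry


def transfers_during(ns, mt_per_s):
    """Whole bus transfers that fit in `ns` nanoseconds."""
    # (mt_per_s * 1e6 transfers/s) * (ns * 1e-9 s) = mt_per_s * ns / 1000
    return (mt_per_s * ns) // 1000


transfers = transfers_during(ROW_ACCESS_NS, TRANSFER_RATE_MT_S)
bits_moved = transfers * BUS_WIDTH_BITS

# Derived here: how long it takes to ship one full row over the bus.
transfers_for_row = COLUMNS // BUS_WIDTH_BITS
row_readout_ns = transfers_for_row * 1000 / TRANSFER_RATE_MT_S

print(transfers, bits_moved)              # 22 352
print(ROWS * COLUMNS)                     # 134217728
print(transfers_for_row, row_readout_ns)  # 1024 2560.0
```

Under these numbers, moving all 16384 bits of a row off the chip takes roughly 46 times longer than the row access itself, which is the sense in which the row access time "looks fast".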

CS 152 Computer Architecture and Engineering Lecture 10 -- Cache I 2014-2-20

Latency: A closer look
Read latency: Time to return first byte of a random access
                   Reg    L1 Inst  L1 Data  L2     DRAM   Disk
Size               1K     64K      32K      512K   256M   80G
Latency (cycles)   1      3        3        11     160    1E+07
Latency (sec)      0.6n   1.9n     1.9n     6.9n   100n   12.5m
Hz                 1.6G   533M     533M     145M   10M    80
Architect’s latency toolkit:
(1) Parallelism. Request data from N 1-bit-wide memories at the same time. Overlaps latency cost for all N bits. Provides N times the bandwidth. Requests to N memory banks (interleaving) have potential of N times the bandwidth.
(2) Pipeline memory. If memory has N cycles of latency, issue a request each cycle, receive it N cycles later.
CS 194-6 L8: Cache UC Regents Fall 2008 © UCB
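A rough way to see why both tools in the toolkit raise bandwidth without lowering latency is to count cycles for a burst of independent requests. The sketch below is a simplified model, not taken from the slides: it assumes every request has the same latency, that requests are independent, and that interleaved banks are individually unpipelined and perfectly load-balanced. All function names are hypothetical.

```python
def unpipelined_cycles(requests, latency):
    """One request at a time: each waits for the previous one to finish."""
    return requests * latency


def pipelined_cycles(requests, latency):
    """Issue one request per cycle; each result arrives `latency` cycles later."""
    if requests == 0:
        return 0
    return latency + requests - 1


def interleaved_cycles(requests, latency, banks):
    """Spread requests round-robin over `banks` independent, unpipelined banks."""
    rounds = -(-requests // banks)  # ceiling division
    return rounds * latency


# 160 cycles is the DRAM latency from the table above.
print(unpipelined_cycles(1000, 160))     # 160000
print(pipelined_cycles(1000, 160))       # 1159
print(interleaved_cycles(1000, 160, 8))  # 20000
```

In every case the first result still takes 160 cycles to arrive; only the rate at which later results follow it changes.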

CS 152 Computer Architecture and Engineering Lecture 11 -- Cache II 2014-2-25

Issue #4: When to write to lower level...
Policy:
    Write-Through: Data written to cache block also written to lower-level memory.
    Write-Back: Write data only to the cache. Update lower level when a block falls out of the cache.
Do read misses produce writes?
    Write-Through: No
    Write-Back: Yes
Do repeated writes make it to lower level?
    Write-Through: Yes
    Write-Back: No
Related issue: Do writes to blocks not in the cache get put in the cache (“write-allocate”) or not?
CS 152 L11: Cache II UC Regents Spring 2014 © UCB
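The two "Yes/No" rows can be reproduced with a toy model. This is a minimal sketch, assuming a direct-mapped cache with one word per block and a write-allocate policy; the class and attribute names are hypothetical and only the count of lower-level writes is tracked.

```python
class TinyCache:
    """Direct-mapped, write-allocate cache that counts lower-level writes."""

    def __init__(self, num_lines, write_back):
        self.num_lines = num_lines
        self.write_back = write_back
        self.lines = {}              # index -> (block address, dirty bit)
        self.lower_level_writes = 0

    def _fill(self, block):
        """Bring `block` into the cache, writing back a dirty victim if needed."""
        index = block % self.num_lines
        old = self.lines.get(index)
        if old is None or old[0] != block:
            if old is not None and old[1]:
                self.lower_level_writes += 1  # dirty block falls out of the cache
            self.lines[index] = (block, False)
        return index

    def read(self, block):
        self._fill(block)

    def write(self, block):
        index = self._fill(block)
        if self.write_back:
            self.lines[index] = (block, True)  # only mark the block dirty
        else:
            self.lower_level_writes += 1       # write-through: every write goes down


def run(write_back):
    cache = TinyCache(num_lines=4, write_back=write_back)
    for _ in range(5):
        cache.write(0)               # repeated writes to the same block
    after_writes = cache.lower_level_writes
    cache.read(4)                    # read miss that conflicts with block 0
    return after_writes, cache.lower_level_writes


print(run(write_back=False))  # (5, 5)
print(run(write_back=True))   # (0, 1)
```

With write-through, all five writes reach the lower level and the read miss adds none. With write-back, the repeated writes stay in the cache and the read miss is what triggers the single write.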

CS 152 Computer Architecture and Engineering Lecture 12 -- Virtual Memory 2014-2-27

The TLB caches page table entries
In this example, physical and virtual pages must be the same size!
[Figure: a virtual address (ASID, page, offset) is translated through the TLB, which holds page/frame pairs taken from the page table, into a physical address (frame, offset).]
MIPS handles TLB misses in software (random replacement). Other machines use hardware.
V=0 pages either reside on disk or have not yet been allocated. OS handles V=0 “Page fault”.
CS 152 L15: Virtual Memory UC Regents Fall 2006 © UCB
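The lookup path described on this slide can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the MIPS mechanism itself: 4 KiB pages, a two-entry TLB, no ASID field, and a made-up page table. The names `Tlb`, `PageFault`, and `PAGE_BITS` are hypothetical.

```python
import random

PAGE_BITS = 12  # assumed 4 KiB pages; virtual and physical pages are the same size


class PageFault(Exception):
    """Raised for V=0 pages; on a real machine the OS would handle this."""


class Tlb:
    def __init__(self, page_table, entries=2, seed=0):
        self.page_table = page_table  # virtual page -> (valid, frame)
        self.entries = entries
        self.cache = {}               # virtual page -> frame
        self.rng = random.Random(seed)
        self.misses = 0

    def translate(self, vaddr):
        page = vaddr >> PAGE_BITS
        offset = vaddr & ((1 << PAGE_BITS) - 1)
        if page not in self.cache:
            self.misses += 1
            valid, frame = self.page_table.get(page, (False, None))
            if not valid:
                raise PageFault(page)
            if len(self.cache) >= self.entries:
                # Random replacement, as in the software-managed case on the slide.
                victim = self.rng.choice(sorted(self.cache))
                del self.cache[victim]
            self.cache[page] = frame
        # The offset passes through untranslated.
        return (self.cache[page] << PAGE_BITS) | offset


# Hypothetical page table: page 2 is not valid (V=0).
table = {0: (True, 2), 1: (True, 0), 2: (False, None), 3: (True, 5)}
tlb = Tlb(table)
print(hex(tlb.translate(0x0ABC)))  # 0x2abc
print(hex(tlb.translate(0x3004)))  # 0x5004
```

Translating an address in page 2 raises `PageFault`, which stands in for the trap to the operating system.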

CS 152 Computer Architecture and Engineering Lecture 13 - Synchronization 2014-3-4

Non-blocking consumer synchronization
Another atomic read-modify-write instruction:
Compare&Swap(Rt, Rs, m)
    if (Rt == M[m])
    then M[m] = Rs; Rs = Rt;   /* do swap */
    else                       /* do not swap */
Assuming sequential consistency: MEMBARs not shown
try:  ...
      LW R3, head(R0)                  ; Load queue head into R3
spin: LW R4, tail(R0)                  ; Load queue tail into R4
      BEQ R4, R3, spin                 ; If queue empty, wait
      LW R5, 0(R3)                     ; Read x from queue into R5
      ADDI R6, R3, 4                   ; Shift head by one word
      Compare&Swap R3, R6, head(R0)    ; Try to update head
      BNE R3, R6, try                  ; If not success, try again
If R3 != R6, another thread got here first, so we must try again.
If thread swaps out before Compare&Swap, no latency problem;
CS 152 L24: Multiprocessors UC Regents Fall 2006 © UCB
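The retry loop above can be imitated in Python to watch several consumers race for the queue head. This is a sketch under assumptions: Python has no hardware Compare&Swap, so a lock inside a hypothetical `Memory` class stands in for the instruction's atomicity; `compare_and_swap` reports success with a boolean instead of the slide's `Rs = Rt` convention; and `dequeue` returns `None` on an empty queue instead of spinning, so the demo terminates.

```python
import threading

HEAD, TAIL = "head", "tail"  # hypothetical addresses of the queue pointers


class Memory:
    """Word-addressed memory with an atomic compare-and-swap."""

    def __init__(self):
        self._words = {}
        self._lock = threading.Lock()  # stands in for hardware atomicity

    def load(self, addr):
        with self._lock:
            return self._words.get(addr, 0)

    def store(self, addr, value):
        with self._lock:
            self._words[addr] = value

    def compare_and_swap(self, expected, new, addr):
        """If M[addr] == expected, write `new` and return True; else False."""
        with self._lock:
            if self._words.get(addr, 0) == expected:
                self._words[addr] = new
                return True
            return False


def dequeue(mem):
    while True:
        head = mem.load(HEAD)                 # LW R3, head(R0)
        if head == mem.load(TAIL):            # queue empty (the slide spins here)
            return None
        x = mem.load(head)                    # LW R5, 0(R3)
        if mem.compare_and_swap(head, head + 4, HEAD):
            return x                          # head updated: x is ours
        # Another consumer moved head first, so try again.


def run_consumers(num_items=100, num_threads=4):
    mem = Memory()
    for i in range(num_items):
        mem.store(4 * i, f"x{i}")
    mem.store(HEAD, 0)
    mem.store(TAIL, 4 * num_items)
    taken = [[] for _ in range(num_threads)]

    def worker(out):
        while True:
            x = dequeue(mem)
            if x is None:
                return
            out.append(x)

    threads = [threading.Thread(target=worker, args=(taken[t],))
               for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [x for out in taken for x in out]


items = run_consumers()
print(len(items), len(set(items)))  # 100 100
```

Each item is taken exactly once even though no consumer ever holds a lock across the read of `x` and the update of `head`; a consumer that loses the race simply discards its stale read and retries.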

CS 152 Computer Architecture and Engineering Lecture 14 - Cache Design and Coherence 2014-3-6

Writes from 10,000 feet... for write-thru L1
[Figure: CPU 0 and CPU 1, each with a cache and a snooper, on a memory bus to the shared main memory hierarchy.]
For write-thru caches...
1. Writing CPU takes control of bus.
2. Address to be written is invalidated in all other caches. Reads will no longer hit in cache and get stale data.
3. Write is sent to main memory. Reads will cache miss, retrieve new value from main [...]
To a first-order, reads will “just work” if write-thru caches implement this policy. A “two-state” protocol (cache lines are “valid” or “invalid”).
CS 152 L14: Cache Design and Coherency UC Regents Spring 2014 © UCB
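The three steps can be traced with a small single-threaded model. This is a sketch, not the bus protocol itself: bus arbitration is implicit because only one call runs at a time, a line is "valid" exactly when its address is present in the cache's dictionary, and the writer is assumed to keep its own copy valid. The class names are hypothetical.

```python
class Bus:
    """Shared memory bus: holds main memory and the list of snooping caches."""

    def __init__(self):
        self.memory = {}
        self.caches = []


class WriteThroughCache:
    def __init__(self, bus):
        self.bus = bus
        self.lines = {}  # address -> value; presence means the line is valid
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:  # miss: fetch the current value from memory
            self.lines[addr] = self.bus.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        # Step 1: take control of the bus (implicit in this single-threaded model).
        # Step 2: invalidate the address in all other caches.
        for other in self.bus.caches:
            if other is not self:
                other.lines.pop(addr, None)
        # Step 3: send the write to main memory.
        self.bus.memory[addr] = value
        self.lines[addr] = value    # assumption: the writer keeps a valid copy


bus = Bus()
cpu0, cpu1 = WriteThroughCache(bus), WriteThroughCache(bus)
print(cpu0.read(0x40))   # 0  (cpu0 now caches the line)
cpu1.write(0x40, 7)      # invalidates cpu0's copy, updates memory
print(cpu0.read(0x40))   # 7  (miss, so the new value comes from memory)
```

Without step 2, the second read by `cpu0` would hit in its cache and return the stale 0.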

CS 152 Computer Architecture and Engineering Lecture 15 -- Advanced CPUs 2014-3-11
CS 152 L15: Superscalars and Scoreboards UC Regents Spring 2014 © UCB

Split pipelines: a write-after-write hazard.
WAW Hazard:
    DIV R1, R2, R3
    SUB R1, R2, R3
The pipeline splits after the RF stage, feeding functional units with different latencies. If long latency DIV and short latency SUB are sent to parallel pipes, SUB may finish first.
Solution: SUB detects R1 clash in decode stage and stalls, via a pipe-write scoreboard.
CS 194-6 L9: Advanced Processors I UC Regents Fall 2008 © UCB
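A minimal sketch of the stall decision follows. It assumes in-order issue of at most one instruction per cycle, made-up latencies (10 cycles for DIV, 1 for SUB), and a scoreboard that records, per destination register, the cycle at which the in-flight write lands. Source-operand (RAW) hazards are deliberately ignored, and all names are hypothetical.

```python
LATENCY = {"DIV": 10, "SUB": 1}  # assumed cycles from issue to register write


def issue_without_scoreboard(program, latency=LATENCY):
    """Issue one instruction per cycle with no hazard check."""
    return [(op, dest, i, i + latency[op]) for i, (op, dest) in enumerate(program)]


def issue_with_scoreboard(program, latency=LATENCY):
    """Stall in decode while an earlier write to the same register is pending."""
    write_lands = {}  # destination register -> cycle its pending write lands
    schedule = []
    next_issue = 0
    for op, dest in program:
        issue = max(next_issue, write_lands.get(dest, 0))
        write = issue + latency[op]
        write_lands[dest] = write
        schedule.append((op, dest, issue, write))
        next_issue = issue + 1
    return schedule


program = [("DIV", "R1"), ("SUB", "R1")]
# Each tuple is (op, destination, issue cycle, write cycle).
print(issue_without_scoreboard(program))
# [('DIV', 'R1', 0, 10), ('SUB', 'R1', 1, 2)]   SUB writes R1 first: WAW violation
print(issue_with_scoreboard(program))
# [('DIV', 'R1', 0, 10), ('SUB', 'R1', 10, 11)] SUB stalls until DIV has written
```

In the unchecked schedule R1 ends up holding the DIV result even though SUB is later in program order; the scoreboard version preserves the order of writes at the cost of nine stall cycles.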

Superscalar R machine
[Figure: two-way superscalar datapath with stages IF (Fetch), ID (Decode), EX (ALU), MEM, WB. PC and sequencer, instruction memory (64-bit fetch), instruction issue logic, and a register file with four read ports (rs1-rs4, rd1-rd4) and two write ports (ws1/wd1/WE1, ws2/wd2/WE2) feeding two parallel pipes.]

CS 152 Computer Architecture and Engineering Lecture 17 -- Networks, Routers, Google 2014-3-20

6 key parameters scale across dimension of “by one server”, “by 80-server rack” and “by array”
To get more DRAM and disk capacity, you must work on a scale larger than a single server. But as you do, latency and bandwidth degrade, because network performance << a server bus, and because array network is under-provisioned. Exception: disk latency is roughly scale-independent.

CS 152 Computer Architecture and Engineering Lecture 18 -- Dynamic Scheduling I 2014-4-1
Thanks to Krste Asanovic...

Given an endless supply of registers...
Rename “architected registers” (Ri, Fi) to new “physical registers” (PRi, PFi) on each write.
Architected code (as far as it survives): ADDI R1, R0, 64 [...] F4, 0(R1)
Mappings shown: R1 → PR01, F0 → PF00
Renamed code:
       ADDI PR01, PR00, 64
       LD PF00 0(PR01)
       ADDD PF04, PF00, PF02
       SD PF04, 0(PR01)
       SUBI PR11, PR01, 8
       BEQZ PR11 ENDLOOP
ITER2: LD PF10 0(PR11)
       ADDD PF14, PF10, PF02
       SD PF14, 0(PR11)
       SUBI PR21, PR11, 8
       BEQZ PR21 ENDLOOP
ITER3: LD PF20 0(PR21)
       [...]
What was gained? An instruction may execute once all of its source registers have been written.
CS 152 L18: Dynamic Scheduling I UC Regents Spring 2014 © UCB
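The "new physical register on each write" rule is easy to state as code. The sketch below is an illustration, not the slide's exact numbering scheme: physical registers are simply named P0, P1, ... in allocation order, and the architected loop body is reconstructed from the renamed code above, so treat both as assumptions. The function name `rename` is hypothetical.

```python
def rename(instructions, arch_regs):
    """Rename (op, dest, sources) tuples, allocating a fresh register per write."""
    mapping = {reg: f"P{i}" for i, reg in enumerate(arch_regs)}
    next_free = len(arch_regs)
    renamed = []
    for op, dest, sources in instructions:
        # Sources are looked up first, so "SUBI R1, R1, 8" reads the old R1.
        new_sources = [mapping[s] for s in sources]
        new_dest = None
        if dest is not None:
            new_dest = f"P{next_free}"   # endless supply: never reuse a register
            next_free += 1
            mapping[dest] = new_dest
        renamed.append((op, new_dest, new_sources))
    return renamed


# Assumed architected loop body, unrolled for two iterations.
body = [
    ("LD",   "F0", ["R1"]),
    ("ADDD", "F4", ["F0", "F2"]),
    ("SD",   None, ["F4", "R1"]),
    ("SUBI", "R1", ["R1"]),
    ("BEQZ", None, ["R1"]),
]
for op, dest, sources in rename(body * 2, ["R1", "F0", "F2", "F4"]):
    print(op, dest, sources)
```

In the output, the second iteration's LD reads the register written by the first iteration's SUBI and writes a register nobody else touches, so it no longer has to wait for the first iteration's ADDD or SD to finish with F0.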

CS 152 Computer Architecture and Engineering Lecture 19 -- Dynamic Scheduling II 2014-4-3

Rename stage close-up:
(1) Allocates new physical registers for destinations,
(2) Looks up physical register numbers for sources,
(3) Handles rename dependences within the 4 issuing instructions in one clock cycle!
Input: 4 instructions specifying architected registers.
Output: 12 physical register numbers: 1 destination and 2 sources for each of the 4 instructions to be issued.
[Figure labels: “For mis-speculation recovery”, “Timestamped”.]

CS 152 Computer Architecture and Engineering Lecture 20 -- Dynamic Scheduling III 2014-4-8

Micro-op translation example...
ADC m32, r32:       // for a simple m32 address mode
Becomes:
    LD T1 0(EBX);   // EBX register points to m32
    ADD T1, CF;     // CF is carry flag from EFLAGS
    ADD T1, r32;    // Add the specified register
    ST 0(EBX) T1;   // Store result back to m32
Instruction traces of IA-32 programs show most executed instructions require 4 or fewer micro-ops. Translations for these ops are cast into logic gates, often over several pipeline cycles.
CS 152 L20: Dynamic Scheduling III UC Regents Fall 2006 © UCB
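To make the four micro-ops concrete, the sketch below executes them one by one on a toy machine state. It is an illustration only: memory and registers are plain dictionaries, `T1` is an ordinary Python integer, and the update of CF from the carry-out is added here as an assumption, since the slide's micro-op list does not show how flags are written back. The function name is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def adc_m32_r32(memory, regs, flags, src_reg):
    """Execute 'ADC m32, r32' as the four micro-ops from the slide."""
    t1 = memory[regs["EBX"]]               # LD  T1, 0(EBX)
    t1 = t1 + flags["CF"]                  # ADD T1, CF
    t1 = t1 + regs[src_reg]                # ADD T1, r32
    # Assumption: the carry out of bit 31 becomes the new CF.
    flags["CF"] = 1 if t1 > MASK32 else 0
    memory[regs["EBX"]] = t1 & MASK32      # ST  0(EBX), T1


memory = {0x100: 5}
regs = {"EBX": 0x100, "ECX": 7}
flags = {"CF": 1}
adc_m32_r32(memory, regs, flags, "ECX")
print(memory[0x100], flags["CF"])  # 13 0
```

The temporary `T1` never appears in the architected register state, which is why the translation needs a register the IA-32 programmer cannot name.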

CS 152 Computer Architecture and Engineering Lecture 21 -- Dataflow 2014-4-10

Dataflow stages of 21264 Idea: Write dataflow programs that reference physical registers, to execute on this machine. Input: Instructions that reference physical registers. Scoreboard: Tracks writes to physical registers.

CS 152 Computer Architecture and Engineering Lecture 22 -- GPU + SIMD + Vectors I 2014-4-15

Pure data move opcode. Or, part of a math opcode.

CS 152 Computer Architecture and Engineering Lecture 23 -- GPU + SIMD + Vectors II 2014-4-17

Assume MacBook Air... 1386 x 768 screen... We are all zoomed in on Google Maps.
Lets us cache a 1024 x 1024 window of the 11 PB Earth map in 34.7 MB!
Top pyramid image is 4K x 4K...
Idea: Keep only a 1386 x 768 window of top images in RAM...

Zoom all the way in...
Bottom stack image shows the smallest part of the 1 mile sq. patch of the Earth of any stack image.
Graphics hardware displays bottom stack image, which fills MacBook Air display.
Hardware interpolation of stack levels.
[Figure axis labels: units of pixels, units of sq. miles, units of miles.]

CS 152 Computer Architecture and Engineering Lecture 24 -- Voxel Processing 2014-4-22

After processing...
A 3-D matrix of cubes, in object space (X, Y, Z).
8-bit density value stored for each cube (0 = “air”).
256^3 = 16 MB = 10 inch cube (for 1 mm voxels)
0.125 mm voxels? 8 GB
Interesting to computer architects because n^3 grows so quickly!
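The storage figures follow directly from the cube law, as this short check shows. It assumes one byte (the 8-bit density) per voxel and reads MB and GB as binary units (2^20 and 2^30 bytes); the function names are illustrative.

```python
def voxels_per_side(side_mm, voxel_mm):
    """Number of voxels along one edge of the cube."""
    return round(side_mm / voxel_mm)


def volume_bytes(side_mm, voxel_mm, bytes_per_voxel=1):
    """Storage for a cube of the given edge length, one density value per voxel."""
    return voxels_per_side(side_mm, voxel_mm) ** 3 * bytes_per_voxel


SIDE_MM = 256  # 256 voxels of 1 mm each

print(SIDE_MM / 25.4)                         # about 10.08 inches per side
print(volume_bytes(SIDE_MM, 1.0) / 2**20)     # 16.0  (MB)
print(volume_bytes(SIDE_MM, 0.125) / 2**30)   # 8.0   (GB)
```

Shrinking the voxel edge by a factor of 8 multiplies the voxel count by 8^3 = 512, which is how 16 MB becomes 8 GB for the same physical cube.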

CS 152 Computer Architecture and Engineering Lecture 25 -- Digital Imaging 2014-4-24

Camera interface to the outside world
Simple Power Hookup
Serial port to control the camera.
8-bit Dout Port
54 MHz Clk
1280 x 1024 @ 15 fps
640 x 512 @ 30 fps
YCrCb 4:2:2
CS 250 L12: CMOS Imagers UC Regents Fall 2012 © UCB

AWARE-2: Array of 98 phone camera modules (14 M-pixel)
1.3 G-pixel camera @ 3 frames/sec

On Thursday
Mid-term II...
Ground rules...

Mid-term: How to do well...
Problem intro often features a lecture slide. If you have to teach yourself that slide during the test, you’re starting out behind.
Getting the problem correct requires thinking on your feet to do a new design or analyze one given to you.
There will not be “you can only get it if you do the reading” problems... but the reading helps you understand how to think through the problem.

Mid-term: There may be math...
No memorization: If we ask about Amdahl’s Law, we will show its definition lecture slide.
Understanding is needed: A problem may require you to apply an equation to a design, etc.
You may need to do: simple algebra and calculus, add a few numbers by hand, etc.
Cannot use electronic devices... more administrative info after we do some content.

When is it? Where is it? Ground rules.
9:30 AM sharp, Thursday May 1st, 306 Soda.
Every-other-seating, except for the front rows, where every-seat is permitted.
No blue-books needed. We will be handing out a paper test. Pencil is preferred.
Pencils down @ 10:55 AM, so we can collect papers before next class comes in.

When is it? Where is it? Ground rules.
No use of calculators, smartphones, laptops, etc... during the exam.
Closed-book, closed-notes. Just pencils, erasers.
No consulting with students.
Restroom breaks are OK, but you’ll still need to hand in your exam @ 10:55.
Questions are reserved for serious concerns about a bug in the question.

Today - Midterm II
Review Session
Study Tips
HW 2, problem by problem (if there is time)
HKN

On Thursday
Mid-term II...
See you there!