6.175: Constructive Computer Architecture
Tutorial 5
Epochs, Debugging, and Caches
Quan Nguyen (Troubled by the two biggest problems in computer science… and Comic Sans)
October 28, 2016 http://csg.csail.mit.edu/6.175 T05-1

Agenda
• Epochs: a review
• Debugging your processor ft. Piazza
• Caches: a primer

Review: 1-bit Distributed Epochs
[Diagram: PC → Fetch → f2d → Decode → d2e → Execute, annotated with feEp = 0, fdEp = 0, dEp = 0, deEp = 1, eEp = 0, redirect delays of 0 cycles and 100 cycles, and in-flight Inst 1 redirect (ieEp = 0) and Inst 2]
• Decode redirects Inst 1 (ieEp = idEp = 0)
• Execute redirects Inst 1
• Correct-path Inst 2 (ieEp = 1, idEp = 0) issues
• Execute redirects Inst 2
• Inst 1 redirect arrives at Fetch (ieEp == feEp)
  - change PC to a wrong value

Review: Unbounded Global Epochs
[Diagram: PC → Fetch → f2d → Decode → d2e → Execute, with global registers eEpoch and dEpoch; Decode and Execute each check "miss pred?" and can send a redirect PC back to the PC]
• Both Decode and Execute can redirect the PC
  - Execute redirect should never be overruled
• Global epoch for each redirecting stage
  - eEpoch: incremented when redirect from Execute takes effect
  - dEpoch: incremented when redirect from Decode takes effect
  - Initially set all epochs to 0
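
The epoch rules above can be sketched as a small behavioral model. This is only an illustration in Python, not the course's BSV: the class and method names, the "predict pc + 4" fetch policy, and the exact checks (Decode compares both epochs, Execute compares only eEpoch) are assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass
class Inst:
    pc: int
    e_epoch: int  # value of eEpoch when the instruction was fetched
    d_epoch: int  # value of dEpoch when the instruction was fetched


class EpochState:
    """Global epochs, one per redirecting stage; all start at 0."""

    def __init__(self, pc: int = 0):
        self.pc = pc
        self.e_epoch = 0
        self.d_epoch = 0

    def fetch(self) -> Inst:
        # Tag each fetched instruction with the current epochs.
        inst = Inst(self.pc, self.e_epoch, self.d_epoch)
        self.pc += 4  # hypothetical predictor: always predict pc + 4
        return inst

    def decode_ok(self, inst: Inst) -> bool:
        # Assumption: Decode drops anything stale in either epoch.
        return inst.e_epoch == self.e_epoch and inst.d_epoch == self.d_epoch

    def execute_ok(self, inst: Inst) -> bool:
        # Assumption: Execute only needs to compare eEpoch.
        return inst.e_epoch == self.e_epoch

    def redirect_from_decode(self, inst: Inst, target: int) -> bool:
        if not self.decode_ok(inst):
            return False  # stale instruction: its redirect is ignored
        self.pc = target
        self.d_epoch += 1
        return True

    def redirect_from_execute(self, inst: Inst, target: int) -> bool:
        if not self.execute_ok(inst):
            return False
        self.pc = target
        self.e_epoch += 1
        return True
```

In this model an Execute redirect bumps eEpoch, so a younger wrong-path instruction still sitting in Decode fails `decode_ok` and cannot overrule it. Python integers never wrap, which matches the "unbounded" assumption.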

Review: Branch History Table (BHT)
[Diagram: k bits of the Fetch PC, above the two low 00 bits, form the BHT index into a 2^k-entry BHT with 2 bits/entry, which answers Taken/¬Taken?; the instruction from Fetch supplies the opcode (Branch?) and the offset, which is added to the PC to give the Target PC]
• At the Decode stage, if the instruction is a branch then BHT is consulted using the pc; if BHT shows a different prediction than the incoming ppc, Fetch is redirected
• 4K-entry BHT, 2 bits/entry, ~80-90% correct direction predictions
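
A BHT of this shape can be modeled in a few lines. The sketch below is an assumption-laden illustration in Python rather than the lab's BSV: the class name, the initial counter value (weakly not-taken), and the encoding (counter values 2 and 3 mean "taken") are choices made here, not taken from the slides.

```python
class BHT:
    """2^k-entry branch history table of 2-bit saturating counters."""

    def __init__(self, k: int):
        self.k = k
        # Assumption: every counter starts at 1 (weakly not-taken).
        self.counters = [1] * (1 << k)

    def index(self, pc: int) -> int:
        # Drop the two low bits (always 00 for 4-byte instructions),
        # then keep the next k bits of the PC.
        return (pc >> 2) & ((1 << self.k) - 1)

    def predict_taken(self, pc: int) -> bool:
        # Counter values 2 and 3 predict taken; 0 and 1 predict not-taken.
        return self.counters[self.index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self.index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
```

Because the counters saturate, one not-taken outcome after a run of taken outcomes does not flip the prediction. PCs that share the same k index bits also share a counter.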

Review: Two-Level Branch Predictor
Pentium Pro uses the result from the last two branches to select one of the four sets of BHT bits (~95% correct)
[Diagram: k bits of the Fetch PC, above the low 00 bits, index four 2^k-entry BHTs with 2-bit entries; a 2-bit global branch history shift register, which shifts in the Taken/¬Taken result of each branch, selects one of the four; output: Taken/¬Taken?]
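
To make the selection mechanism concrete, here is a hedged Python sketch of a two-level predictor: a 2-bit global history register picks one of four tables of 2-bit counters. The names, the initial counter value, and the helper `count_mispredictions` are hypothetical and exist only for this illustration.

```python
class TwoLevelPredictor:
    """Four 2^k-entry tables of 2-bit counters, selected by a 2-bit
    global branch history shift register."""

    def __init__(self, k: int):
        self.k = k
        self.history = 0  # outcomes of the last two branches
        # Assumption: counters start at 1 (weakly not-taken).
        self.tables = [[1] * (1 << k) for _ in range(4)]

    def index(self, pc: int) -> int:
        return (pc >> 2) & ((1 << self.k) - 1)

    def predict_taken(self, pc: int) -> bool:
        return self.tables[self.history][self.index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        table = self.tables[self.history]
        i = self.index(pc)
        table[i] = min(3, table[i] + 1) if taken else max(0, table[i] - 1)
        # Shift the newest outcome into the 2-bit history.
        self.history = ((self.history << 1) | int(taken)) & 0b11


def count_mispredictions(predictor, pc, outcomes):
    """Hypothetical helper: predict, compare, then train on each outcome."""
    wrong = 0
    for taken in outcomes:
        if predictor.predict_taken(pc) != taken:
            wrong += 1
        predictor.update(pc, taken)
    return wrong
```

A branch that strictly alternates taken and not-taken is a useful case to try: each history value sees only one outcome, so after a short warm-up every table entry in use is trained in a single direction.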

Review: Tournament Predictor
“The Alpha 21264 Microprocessor Architecture”
• 10-bit PC: index 1024 x 10-bit local history table
• 10-bit local history
  - Index 1024 x 3-bit BHT: prediction 1
• 12-bit global history
  - Index 4096 x 2-bit BHT: prediction 2
  - Index 4096 x 2-bit BHT: select between predictions 1, 2

Debugging Your Processor

Unsupported Instruction
• Processor initialized? (csrf.started)
• What could be redirecting your PC?
  - Faulty branch address calculation?
  - BTB? (Lab 6 hint!)
  - Bad instruction? (unlikely for this course)

Processor Hangs
• Rules conflict?
  - Check schedule (option “-show-schedule”)
  - Use $display() statements to diagnose
• Did you size pipeline FIFOs correctly?
• Are FIFOs being drained?

Incorrect Behavior
• Which test fails?
• Where is it in the dump?
• Do you have a log?
• If your rules don’t fire, temporarily make simpler rules

Demo: our code
src/TwoStage.bsv

    rule doFetch (csrf.started);
        // fetch
        Data inst = iMem.req(pcReg[0]);
        // Addr predPc = btb.predPc(pcReg[0]);
        Addr predPc = pcReg[0]; // always predict PC to be next PC
        ...
    endrule

    [qmn@vlsifarm] $ ./run_asm.sh twostage
    ...
    -- assembly test: lw --
    ERROR: Executing unsupported instruction at pc: 00001000. Exiting
    ^C

Demo: the log
scemi/sim/logs/lw.log

    ...
    Cycle 5 --------------------------
    Fetch: PC = 00000208, inst = 0000a183, expanded = lw r3 = [r1 0x0]
    Execute finds misprediction: PC = 00000208
    Fetch: Mispredict, redirected by Execute

    Cycle 6 --------------------------
    Fetch: PC = 00001000, inst = 00ff, expanded = unsupport 0x00ff
    Execute: Kill instruction

Demo: the dump
programs/assembly/build/assembly/dump/lw.dump

    Disassembly of section .text:

    00000200 <_start>:
     200: 000010b7    lui  x1, 0x1
     204: 00008093    mv   x1, x1
     208: 0000a183    lw   x3, 0(x1)    # 1000
     ...

    Disassembly of section .data:

    00001000 <begin_signature>:
     1000: 00ff    0xff
     1002: 00ff    0xff
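
When cross-checking a dump against a log, it can help to decode the raw instruction word by hand. The following is a minimal Python sketch of the standard RISC-V I-type field layout; the function name `decode_i_type` is made up for this example, and it only handles I-type encodings such as loads and `addi`.

```python
def decode_i_type(inst: int) -> dict:
    """Split a 32-bit RISC-V I-type instruction word into its fields."""
    imm = (inst >> 20) & 0xFFF
    if imm & 0x800:
        imm -= 0x1000  # sign-extend the 12-bit immediate
    return {
        "opcode": inst & 0x7F,         # bits 6:0
        "rd": (inst >> 7) & 0x1F,      # bits 11:7
        "funct3": (inst >> 12) & 0x7,  # bits 14:12
        "rs1": (inst >> 15) & 0x1F,    # bits 19:15
        "imm": imm,                    # bits 31:20
    }


# The word fetched at PC 0x208 in the demo.
fields = decode_i_type(0x0000A183)
print(fields)
```

For 0x0000a183 the fields come out as a load (opcode 0x03) with funct3 = 2 (word), rd = 3, rs1 = 1, and imm = 0, which agrees with the log's expansion `lw r3 = [r1 0x0]`.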

Demo: back to the code
src/TwoStage.bsv

    rule doExecute (csrf.started);
        if (eInst.mispredict) begin
            $display("Execute finds misprediction: PC = %x", f2e.pc);
            exeRedirect[0] <= Valid(ExeRedirect{
                pc: f2e.pc,
                nextPc: eInst.addr
            });
        end
    endrule

• What sets eInst.addr?
• What happens to exeRedirect[0]?

Caches

Multistage Pipeline
[Diagram labels: PC, fEpoch, nap, redirect, Register File, eEpoch, Decode, d2e, Execute, e2c, scoreboard, Inst Memory, Data Memory]
The use of magic memories (combinational reads) makes these designs unrealistic

Magic Memory Model
[Diagram: MAGIC RAM with inputs Clock, WriteEnable, Address, WriteData and output ReadData]
• Reads and writes are always completed in one cycle
  - A Read can be done any time (i.e. combinational)
  - If enabled, a Write is performed at the rising clock edge (the write address and data must be stable at the clock edge)
In a real DRAM the data will be available several cycles after the address is supplied

Memory Hierarchy
[Diagram: CPU (RegFile) ↔ Small, Fast Memory (SRAM), holds frequently used data ↔ Big, Slow Memory (DRAM)]
• size, latency: RegFile << SRAM << DRAM
• bandwidth: on-chip >> off-chip
• why?
On a data access:
• hit (data ∈ fast memory): low latency access
• miss (data ∉ fast memory): long latency access (DRAM)
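
The hit and miss cases can be illustrated with a toy cache model. The slides do not specify a cache organization, so everything in this Python sketch is an assumption: a direct-mapped cache, 8 lines of 16 bytes, and made-up latencies of 1 cycle for a hit and 100 cycles for a miss.

```python
class DirectMappedCache:
    """Toy direct-mapped cache that tracks tags only (no data)."""

    def __init__(self, num_lines=8, line_bytes=16,
                 hit_latency=1, miss_latency=100):
        # All four parameters are illustrative defaults, not course values.
        self.num_lines = num_lines
        self.line_bytes = line_bytes
        self.hit_latency = hit_latency
        self.miss_latency = miss_latency
        self.tags = [None] * num_lines
        self.hits = 0
        self.misses = 0

    def access(self, addr: int) -> int:
        """Return the latency of this access and update the cache state."""
        line_addr = addr // self.line_bytes
        index = line_addr % self.num_lines
        tag = line_addr // self.num_lines
        if self.tags[index] == tag:
            self.hits += 1          # hit: data is in the fast memory
            return self.hit_latency
        self.misses += 1            # miss: fetch the line from slow memory
        self.tags[index] = tag
        return self.miss_latency


cache = DirectMappedCache()
# Sequential word accesses: the first touch of each line misses,
# and the remaining words of that line hit.
total_cycles = sum(cache.access(a) for a in range(0x1000, 0x1040, 4))
print(cache.hits, cache.misses, total_cycles)
```

Sequential accesses benefit from spatial locality, since one miss brings in a whole line. Two addresses that map to the same index with different tags would evict each other on every access in this organization.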

Two biggest problems in CS
• Cache invalidation
  - How to inform caches of stale data
• Naming things
• Off-by-one errors