HAsim
Joel Emer†‡, Michael Adler†, Artur Klauser†, Angshuman Parashar†, Michael Pellauer‡, Murali Vijayaraghavan‡
†VSSAD, Intel
‡CSAIL, MIT

Overview
• Goal
  – Produce compelling evidence for architecture ideas
• Requirements
  – Cycle-accurate simulation
  – Representative simulation length
  – Software development (often)
• Current approach
  – Mostly software simulation (10 kHz to 1 kHz)
• New approach
  – Build a performance model in an FPGA

2007.05.14 Hasim
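To make the "representative simulation length" requirement concrete, here is a minimal arithmetic sketch of wall-clock time at the two software-simulation rates quoted above. The workload size of one billion simulated cycles is an assumption chosen for illustration; it does not come from the slides.

```python
def wall_clock_seconds(simulated_cycles, simulation_rate_hz):
    """Seconds of wall-clock time needed to simulate the given number of cycles."""
    return simulated_cycles / simulation_rate_hz

# Assumed workload, not from the slides: one billion simulated cycles.
ASSUMED_CYCLES = 1_000_000_000

for rate_hz in (10_000, 1_000):  # the two software-simulation rates quoted above
    seconds = wall_clock_seconds(ASSUMED_CYCLES, rate_hz)
    print(f"{rate_hz:>6} Hz: {seconds / 3600:8.1f} hours ({seconds / 86400:.1f} days)")
```

Under that assumption the run takes roughly 28 hours at 10 kHz and roughly 11.6 days at 1 kHz.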

FPGA-based approaches
• Prototyping
  – Build a logically isomorphic representation of the design
• Modeling
  – Build a performance simulation in gates
• Hybrids
  – Build something that is partially a prototype and partially a model

Recreate Asim in hardware
• Modularity
• Inter-module communication
• Functional/Timing Partitioning
• Modeling Utilities

Why modularity?
• Speed of model development
• Shared components between products
• Reuse across generations
• Encourages isomorphism to design
• Improved fidelity
• Facilitates speed/fidelity trade-offs
• Architectural experimentation
• Factorial development and evaluations
• Sharing

ASIM Module Hierarchy
[Figure: module hierarchy diagram; node labels S, F, B, D, C, M, N, R, X, C, W]

ASIM Module Selection
[Figure: module hierarchy diagram; node labels S, F, D, C, M, N, R, X, C, W, with several nodes marked B]

Module Selection
[Figure: module hierarchy diagram; node labels S, F, D, C, M, N, R, X, C, W, with several nodes marked B]

Module Replacement
[Figure: module hierarchy diagram; node labels S, F, D, C, M, N, R, X, C, W, with several nodes marked B]

(H)ASIM Module Hierarchy

Communication
[Figure: communication diagram; node labels C, F, N, D, R, X, C, N, W]

Named connections
[Figure: labels A-out, A-in, S, D]
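One way to picture named connections is as a registry that pairs a send endpoint with a receive endpoint carrying the same name, independent of where the two modules sit in the hierarchy. The sketch below is a software illustration only, under that assumption; `ConnectionRegistry`, `make_send`, `make_recv`, and `unmatched` are hypothetical names and not HAsim's API.

```python
from collections import deque

class ConnectionRegistry:
    """Pairs send and receive endpoints that share a name (hypothetical helper)."""

    def __init__(self):
        self.queues = {}        # connection name -> FIFO shared by both endpoints
        self.senders = set()    # names that have a send endpoint
        self.receivers = set()  # names that have a receive endpoint

    def make_send(self, name):
        """Register a send endpoint and return a function that enqueues a message."""
        self.senders.add(name)
        return self.queues.setdefault(name, deque()).append

    def make_recv(self, name):
        """Register a receive endpoint and return a function that dequeues a message."""
        self.receivers.add(name)
        return self.queues.setdefault(name, deque()).popleft

    def unmatched(self):
        """Names that have only one endpoint, i.e. dangling connections."""
        return sorted(self.senders ^ self.receivers)

registry = ConnectionRegistry()
send_a = registry.make_send("A")  # plays the role of the "A-out" endpoint
recv_a = registry.make_recv("A")  # plays the role of the "A-in" endpoint
send_b = registry.make_send("B")  # deliberately left without a receiver

send_a(42)
print(recv_a())
print(registry.unmatched())
```

Matching by name means neither module needs a reference to the other, and a dangling endpoint can be reported when the model is assembled.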

Model and FPGA Cycles
[Figure: timing diagram of Module A and Module B connected by a port; model-cycle labels 1.1, 1.2, 1.3, 2.1, 2.2, 2.3 for A and B against ticks 1 to 8]
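The diagram itself is not recoverable, so the following is only one plausible reading of the idea: a port carries a latency measured in model cycles, and a module advances its model cycle only when its input port has something to deliver, however many FPGA cycles that takes. The sketch assumes a FIFO pre-filled with `latency` empty messages; `Port` and `simulate` are hypothetical names.

```python
from collections import deque

class Port:
    """FIFO whose latency is counted in model cycles (hypothetical helper)."""

    def __init__(self, latency):
        # Pre-fill with `latency` empty messages so the first real message
        # is seen `latency` model cycles after it was sent.
        self.fifo = deque([None] * latency)

    def send(self, msg):
        self.fifo.append(msg)

    def ready(self):
        return bool(self.fifo)

    def recv(self):
        return self.fifo.popleft()

def simulate(latency, fpga_cycles_per_model_cycle, fpga_cycles):
    """Module A sends its model-cycle number; module B logs what it receives."""
    port = Port(latency)
    a_model_cycle = 0
    b_log = []  # entries are (B's model cycle, message received in that cycle)
    for fpga_cycle in range(fpga_cycles):
        # A needs several FPGA cycles to finish one model cycle.
        if (fpga_cycle + 1) % fpga_cycles_per_model_cycle == 0:
            port.send(a_model_cycle)
            a_model_cycle += 1
        # B completes a model cycle only when the port has a message for it.
        if port.ready():
            b_log.append((len(b_log), port.recv()))
    return b_log

print(simulate(latency=1, fpga_cycles_per_model_cycle=3, fpga_cycles=8))
```

In this sketch B stalls on FPGA cycles where the port is empty, yet in model time it always receives in cycle n what A sent in cycle n minus the latency.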

Functional/Timing Decomposition
• ISA semantics
• Platform semantics
• Micro-architecture
[Figure: Timing Partition and Functional Partition exchanging "Fetch(PC) …" and "Instruction"]
• Simplifies timing model
• Amortize functional model design effort over many models
• Can be pipelined for performance
• Can be FPGA-friendly design
• Can be split across hardware and software

Execute@execute phases
• Fetch instruction
• Speculatively execute instruction
• Read memory*
• Speculatively write memory* (locally visible)
• Commit or Abort instruction
• Write memory* (globally visible)
* Optional depending on instruction type
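The phase list above can be sketched as a small function that returns the phases an instruction passes through. The instruction kinds (`"alu"`, `"load"`, `"store"`) and the rule that an aborted store skips the globally visible write are assumptions made for illustration, not details taken from the slides.

```python
from enum import Enum

class Phase(Enum):
    FETCH = "fetch instruction"
    EXECUTE = "speculatively execute instruction"
    READ_MEM = "read memory"
    SPEC_WRITE = "speculatively write memory (locally visible)"
    COMMIT = "commit instruction"
    ABORT = "abort instruction"
    GLOBAL_WRITE = "write memory (globally visible)"

def phases_for(kind, aborted=False):
    """Phases for a hypothetical instruction kind: "alu", "load" or "store"."""
    phases = [Phase.FETCH, Phase.EXECUTE]
    if kind == "load":
        phases.append(Phase.READ_MEM)      # optional phase, loads only
    if kind == "store":
        phases.append(Phase.SPEC_WRITE)    # optional phase, stores only
    if aborted:
        phases.append(Phase.ABORT)         # assumed: no global write after an abort
        return phases
    phases.append(Phase.COMMIT)
    if kind == "store":
        phases.append(Phase.GLOBAL_WRITE)  # the store becomes globally visible
    return phases

for kind in ("alu", "load", "store"):
    print(kind, [p.name for p in phases_for(kind)])
```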

Execution in phases
[Figure: phase diagram; phase labels F, D, X, R, C, W, A]
Assertion: All data dependencies can be represented in these phases

HASim: Partitioning Overview
[Figure: Timing Partition and Functional Partition; labels Token Gen, Fet, Dec, Exe, Mem, LCom, GCom, Memory State, Register State, RegFile]

Common Infrastructure
• Modules
• Inter-module communication
• Statistics gathering
• Event logging
• Debug Tracing
• Simulation control
• …

Bluespec (Asim-style)

    module [HAsim_module] mkCache#() (Empty);
      Port#(Addr) req_port <- mkSendPort("a2cache");
      Port#(Bool) resp_port <- mkRecvPort("cache2a");
      TagArray tagarray <- mkTagArray();

      rule cycle(True);
        Maybe#(Addr) mx = req_port.get();
        if (isValid(mx))
          resp_port.put(tagarray.lookup(validValue(mx)));
      endrule
    endmodule

Bluespec (Asim-style) submodule

    module mkTagArray(TagArray);
      RegFile#(Bit#(12), Bit#(4)) tagArray <- mkRegFileFull(...);

      method Bool lookup(Bit#(16) a);
        return (tagArray.sub(getIndex(a)) == getTag(a));
      endmethod

      function Bit#(4) getTag(Address x);
        return x[15:12];
      endfunction

      function Bit#(12) getIndex(Address x);
        return x[11:0];
      endfunction
    endmodule
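As a worked example of the address split used by the tag array above, the following software sketch mirrors the 16-bit address layout: bits 15:12 form the tag and bits 11:0 form the index. The `fill` method and the zero-initialised table are assumptions added so the lookup can be exercised; they are not on the slide.

```python
TAG_BITS = 4     # x[15:12]
INDEX_BITS = 12  # x[11:0]

def get_tag(addr):
    """Upper 4 bits of a 16-bit address."""
    return (addr >> INDEX_BITS) & ((1 << TAG_BITS) - 1)

def get_index(addr):
    """Lower 12 bits of a 16-bit address."""
    return addr & ((1 << INDEX_BITS) - 1)

class TagArray:
    def __init__(self):
        # One 4-bit tag per index; zero initial contents are an assumption.
        self.tags = [0] * (1 << INDEX_BITS)

    def fill(self, addr):
        """Hypothetical helper (not on the slide): record the tag for this address."""
        self.tags[get_index(addr)] = get_tag(addr)

    def lookup(self, addr):
        """True when the stored tag at this index matches the address tag."""
        return self.tags[get_index(addr)] == get_tag(addr)

tag_array = TagArray()
tag_array.fill(0xA123)
print(hex(get_tag(0xA123)), hex(get_index(0xA123)))
print(tag_array.lookup(0xA123), tag_array.lookup(0xB123))
```

For address 0xA123 the tag is 0xA and the index is 0x123. Address 0xB123 maps to the same index with a different tag, so its lookup misses. Like the slide's version, this sketch has no valid bit, so an untouched entry matches any address whose tag equals the initial contents.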

Support functions - stats

    module mkCache#(...) (Empty);
      ...
      cache_hits <- mkStat(...);
      ...
      hit = tagarray.lookup(...);
      if (hit)
        cache_hits.increment();
      endif
      ...
    endmodule

[Figure: Modules, each with a Stat Counter, connected to a Stat Dumper]
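The counter-and-dumper arrangement can be sketched in software as counters that register themselves with a central dumper when they are created, so a module only has to call `increment()`. `Stat`, `StatDumper`, `make_stat`, and `dump` are hypothetical names for illustration; how HAsim actually links counters to the dumper is not shown on the slide.

```python
class Stat:
    """A named event counter."""

    def __init__(self, name):
        self.name = name
        self.count = 0

    def increment(self):
        self.count += 1

class StatDumper:
    """Creates counters and reports all of them in one place (hypothetical helper)."""

    def __init__(self):
        self.stats = []

    def make_stat(self, name):
        stat = Stat(name)
        self.stats.append(stat)  # every counter is known to the dumper
        return stat

    def dump(self):
        return {stat.name: stat.count for stat in self.stats}

dumper = StatDumper()
cache_hits = dumper.make_stat("cache_hits")
for hit in (True, False, True):  # stand-in for tag array lookup results
    if hit:
        cache_hits.increment()
print(dumper.dump())
```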

2 Dreams

Support functions - events

    module mkCache#(...) (Empty);
      ...
      cache_event <- mkEvent(...);
      ...
      hit = tagarray.lookup(...);
      cache_event.report(hit);
      ...
    endmodule

[Figure: Module with an Event Reg connected to an Event Dumper]

Support functions – global controller

    module mkCache#(...) (Empty);
      ...
      ctrl <- mkCntrlr(...);
      ...
      rule (ctrl.run())
        ...
      endrule
    endmodule

[Figure: Module with a Controller connected to a Global Controller]

FPGA-based prototype
Prototyping Catch-22…

Module Instantiation
[Figure: module hierarchy diagram; node labels U, M, M, N, with F, D, C, R, X, C, W each shown three times]

Factorial Coding/Experiments
[Figure: module combinations; labels S, C, M, N, R, SM, SC, RM, RC]

HAsim: Current status - models
• Simple RISC functional model operating
  – Simple RISC ISA
  – Pipelined multi-phase instruction execution
  – Supports speculative OOO design
    • Physical Reg File and ROB
    • Small physically addressed memory
    • Fast speculative rewinds
• Instruction-per-cycle (APE) model
  – Runs simple benchmarks on FPGA
• Five stage pipeline
  – Supports branch mis-speculation
  – Runs simple benchmarks (in software simulation)
• X86 functional model architecture under development

Connections Implement Ports
[Figure: PM (Module Tree w. Connections) and PM (Hardware Modules w. Wrappers); port labels foo, bar, baz]
Implemented via connections.

Timing Model Resources
(Fast) OOO, branch prediction, three functional units, 32 KB 2-way set associative ICache and DCache, iTLB, dTLB
• 2142 slices (15% of a 2VP30)
• 21 block RAMs (15% of a 2VP30)
Configurable cache model
• 32 KB 4-way set associative cache with 16 B cache-lines
  – 165 slices (1% of a 2VP30)
  – 17 block RAMs (12% of a 2VP30)
• 2 MB 4-way set-associative cache with 64 B cache-lines
  – 140 slices (1% of a 2VP30)
  – 40 block RAMs (29% of a 2VP30)
Current FPGAs (4VFX140)
• 142,128 slices
• 552 block RAMs
• 2 PowerPCs
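The percentages above can be cross-checked with a short calculation. The device totals used here (13,696 slices and 136 block RAMs for the 2VP30) are an assumption taken from memory of the vendor data sheet rather than from the slides; the usage figures are the ones quoted above.

```python
# Assumed device totals for the 2VP30 (not stated on the slide).
XC2VP30 = {"slices": 13_696, "block_rams": 136}

# Resource usage as quoted on the slide.
USAGE = {
    "fast OOO timing model": {"slices": 2142, "block_rams": 21},
    "32 KB 4-way cache, 16 B lines": {"slices": 165, "block_rams": 17},
    "2 MB 4-way cache, 64 B lines": {"slices": 140, "block_rams": 40},
}

def utilization(used, device=XC2VP30):
    """Percentage of each device resource consumed by one model."""
    return {resource: 100.0 * count / device[resource] for resource, count in used.items()}

for name, used in USAGE.items():
    pct = utilization(used)
    print(f"{name}: {pct['slices']:.1f}% of slices, {pct['block_rams']:.1f}% of block RAMs")
```

With these totals, every computed percentage truncates to the whole-number figure on the slide (for example, 17 of 136 block RAMs is 12.5%, quoted as 12%).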