TOPAZ: An Open-Source Interconnection Network Simulator for Chip Multiprocessors and Supercomputers
Pablo Abad, Pablo Prieto, Lucia Menezo, Adrian Colaso, Valentin Puente, Jose-Angel Gregorio
University of Cantabria
TOPAZ@NOCS 12

Interconnects Research: Simulation Tool
What makes a simulation tool better than others?
• Flexibility
  - Heterogeneous field, from supercomputers to CMPs.
  - Highly configurable.
  - TOPAZ: interface to full-system simulation; multithreaded simulation for massive numbers of routers.
• Accuracy vs. computational effort
  - Avoid slow simulations in the first stages of the research process.
  - But provide accurate enough results in the last stages.
  - TOPAZ: simple and complex models; dynamic accuracy simulation.
• Ease of use
  - Fast learning is essential.
  - Max: 1-day delay for user mode.
  - TOPAZ: many out-of-the-box components; very modular code, easy to understand.

Outline
• Simulator Description
• Out-of-the-Box
• Utilization Examples
• Support & Collaboration

Main Features
• Evolution of SICOSYS
• Object-oriented design
  - Implemented in C++
  - 100 classes / 50,000 lines of code
  - High portability (standard C++ compiler)
• Different levels of detail
• Support for parallel execution
[REF] V. Puente, J. A. Gregorio, R. Beivide, SICOSYS: an integrated framework for studying interconnection network performance in multiprocessor systems. IEEE Comput. Soc, 2000.

Main Features
• Evolution of SICOSYS
• Object-oriented design
• Different levels of detail
  - Simple router: 1 C++ class description. (+) Fast simulation, (--) accuracy.
  - Detailed router: C++ class per component. (--) Slower simulation, (++) higher accuracy.
  [Diagram: detailed router with N, W and S ports, built from Injector, Consumer, Buffer, Crossbar, and Rtg. & Arb. components]
• Support for parallel execution
  [Diagram labels: T1, T2, T3]

Using TOPAZ (Building)

> ./TPZSimul -s SIMUL_DETAILED

TPZSimul.ini
  <RouterFile id="../sgm/Router.sgm">
  <NetworkFile id="../sgm/Network.sgm">
  <SimulationFile id="../sgm/Simula.sgm">

Simula.sgm
  <Simulation id="SIMUL_DETAILED">
    <Network id="TORUS">
    <SimulationCycles id=1000000>
    <DiscardTraffic id=10000>
    <TrafficPattern id="MODAL" type="RANDOM">
    <Load id=0.5>
    <PacketLength id=2>
  </Simulation>

Router.sgm
  <Router id="DETAILED" inputs=5 outputs=5 bufferSize=64 bufferControl=CT routingControl="ROUTING_ALG">
    <Injector id="INJ">
    <Consumer id="CONS">
    <Buffer id="BUF1" type="X+" headerDelay=2>
    <Buffer id="BUF2" type="X-" headerDelay=2>
    ...
    <Buffer id="BUF5" type="Node" headerDelay=2>
    <Routing id="RTG1" type="X+" headerDelay=1>
    ...
    <Routing id="RTG5" type="Node" headerDelay=1>
    <Crossbar id="XBAR" inputs="5" outputs="5" type="CT">
      <Input id=1 type="X+">
      ...
      <Output id=5 type="Node">
    </Crossbar>
    <Connection id="C01" source="INJ" destination="BUF5">
    ...
    <Connection id="C20" source="RTG.1" destination="XBAR.1">
    ...
  </Router>

Network.sgm
  <TorusNetwork id="TORUS" sizeX=8 sizeY=8 router="DETAILED" delay=1>
  <MeshNetwork id="MESH" sizeX=8 sizeY=8 router="DETAILED" delay=1>
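Because each simulation is just a tagged entry in Simula.sgm, entries for a parameter sweep can be generated by script. The following is a minimal Python sketch, assuming only the tag and attribute names visible on the slide. The helper names `simulation_entry` and `load_sweep` and the `SWEEP_` identifiers are hypothetical, and the output has not been checked against the TOPAZ parser.

```python
def simulation_entry(sim_id, network="TORUS", cycles=1000000,
                     discard=10000, load=0.5, packet_length=2):
    """Return one <Simulation> block in the style shown on the slide.

    Tag names are copied from the slide; this helper is hypothetical
    and is not part of TOPAZ.
    """
    lines = [
        f'<Simulation id="{sim_id}">',
        f'  <Network id="{network}">',
        f'  <SimulationCycles id={cycles}>',
        f'  <DiscardTraffic id={discard}>',
        '  <TrafficPattern id="MODAL" type="RANDOM">',
        f'  <Load id={load}>',
        f'  <PacketLength id={packet_length}>',
        '</Simulation>',
    ]
    return "\n".join(lines)


def load_sweep(loads, prefix="SWEEP_"):
    """Build one entry per applied load, keyed by a made-up identifier."""
    entries = {}
    for load in loads:
        # Identifier encodes the load as a percentage, e.g. 0.5 -> SWEEP_050.
        sim_id = f"{prefix}{int(round(load * 100)):03d}"
        entries[sim_id] = simulation_entry(sim_id, load=load)
    return entries


if __name__ == "__main__":
    for text in load_sweep([0.1, 0.5, 0.9]).values():
        print(text)
        print()
```

Each generated identifier can then be passed to the `-s` option shown on the slide.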

Using TOPAZ (Printing)
• Standalone: throughput/latency curves, latency histogram, injection/consumption/link map
• + Orion: power breakdown (link, crossbar, buffer, arbiter)
• + Gems (or Gem5): throughput/latency evolution
[Figure: accepted load (flits/cycle/router) and total latency (cycles) vs. applied load (flits/cycle/router); series RR, ABR, VCR]
[Figure: latency histogram, traffic fraction (%) vs. network latency (cycles); labels 1 Turn, 6 Turns]
[Figure: link utilization map by X position]
[Figure: Integer Sort, throughput vs. cycles simulated]

Out of the Box
1. Configuration Parameters
• Router: buffer size, buffer delay, packet size, # virtual channels, # physical networks, message types, router pipeline, link delay
• Flow control: virtual cut-through, bubble flow control, wormhole, virtual channel flow control
• Topology: ring, mesh (2D & 3D), torus (2D & 3D), midimew (2D), square midimew (2D)
• Traffic: random, bit-reversal, perfect-shuffle, transpose matrix, tornado, hot-spot, local, trace-based
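For readers unfamiliar with the permutation traffic patterns listed above, here is a short Python sketch of how a destination is usually derived from a source address. These are the common textbook definitions, given as an assumption: the slides do not show how TOPAZ implements them, and all function names are hypothetical.

```python
def bit_reversal(src, bits):
    """Destination is the source address with its bits in reverse order."""
    dst = 0
    for i in range(bits):
        if src & (1 << i):
            dst |= 1 << (bits - 1 - i)
    return dst


def perfect_shuffle(src, bits):
    """Destination is the source address rotated left by one bit."""
    mask = (1 << bits) - 1
    return ((src << 1) | (src >> (bits - 1))) & mask


def transpose(src, bits):
    """Destination swaps the upper and lower halves of the address.

    With node id = y * k + x on a k x k network (k a power of two),
    this sends (x, y) to (y, x). The bit count must be even.
    """
    half = bits // 2
    low_mask = (1 << half) - 1
    return ((src & low_mask) << half) | (src >> half)


def tornado(x, k):
    """Per-dimension tornado on a ring of k nodes: ceil(k/2) - 1 hops ahead."""
    return (x + (k + 1) // 2 - 1) % k


if __name__ == "__main__":
    # 64 nodes (8x8): 6 address bits.
    for node in (1, 7, 33):
        print(node, bit_reversal(node, 6), perfect_shuffle(node, 6),
              transpose(node, 6), tornado(node % 8, 8))
```

Random, hot-spot, local, and trace-based traffic are not permutations of the address, so they are left out of this sketch.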

Out of the Box
2. Available Routers

Router                          REF       Year  Level of Detail
Adaptive Bubble Router          [14]      2001  Complex & simple
Deterministic Bubble Router     [15]      1998  Complex & simple
Deterministic with VC (Dally)   [16][17]  2001  Complex & simple
VCTM (Dally + MC Support)       [18]      2008  Complex & simple
Pipeline Optimized              [24]      2008  Complex & simple
Rotary Router                   [19]      2007  Complex
Bufferless Router               [21]      2010  Simple
Bidirectional Router            [22]      2009  Simple
Buffered Crossbar               [23]      1987  Complex

Out of the Box
3. Integration with Full-System Simulation Tools
• Wisconsin Multifacet Gems: Simics, Opal (processor), Ruby (memory), TOPAZ (network). http://research.cs.wisc.edu/gems/
• Gem5 simulator system: M5 (processor), Ruby (memory), TOPAZ (network). http://gem5.org/Main_Page

Increasing Full-System simulation accuracy
Main System Parameters
• System
  - Cores: 16 cores @ 4 GHz, OOO, 4-wide issue, 64-entry IW, 16 outstanding mem. req.
  - L1: independent I/D caches, 32 KB, 4-way, 1 cycle
  - L2: 16 MB, SNUCA, Token(B) coherence protocol, 6-msg. dependence chain
  - L2 bank: 1 MB, 16-way, 5 cycles, pseudo-LRU
  - Memory: 4 GB, 320 GB/s, 260 cycles
  - OS: Solaris 10
• Network
  - Topology: 4x4 mesh
  - Links: 1 cycle, 128 bits wide
Broadcast Coherence Protocol (Execution Time)
[Figure: RUBY-normalized execution time per benchmark; series RUBY, TOPAZ_SIMPLE; benchmarks astar, hmmer, lbm, omnetpp, BT_A, CG_A, FT_W, IS_A, LU_A, MG_W, SP_A, UA_A, Apache, JBB, OLTP, Zeus, blackscholes, canneal, fluidanimate, streamcluster, MEAN]

Increasing Full-System simulation accuracy
More accuracy => slower simulations. On average, Ruby is ≈ 2X faster.
[Figure: Execution Time, RUBY-normalized, per benchmark]
[Figure: Simulation speed (cycles/second), normalized cycles simulated/second, per benchmark]
[Series across both figures: RUBY, TOPAZ_SIMPLE, TOPAZ_COMPLEX]

Improving Simulation Speed (I)
Adaptive Interface
[Figure: Integer Sort, throughput vs. M cycles simulated; series RUBY, TOPAZ]
[Figure: Execution Time, RUBY-normalized, per benchmark]
[Figure: Simulation speed (cycles/second), normalized cycles simulated/second, per benchmark]
[Series across the per-benchmark figures: RUBY, TOPAZ_SIMPLE, TOPAZ_COMPLEX, (AI)TOPAZ_SIMPLE, (AI)TOPAZ_COMPLEX]

Improving Simulation Speed (II)
2-Thread Simulation
[Diagram labels: T1, T2]
[Figure: Execution Time, RUBY-normalized, per benchmark]
[Figure: Simulation speed (cycles/second), normalized cycles simulated/second, per benchmark]
[Series across both figures: RUBY, TOPAZ_SIMPLE, TOPAZ_COMPLEX, (AI)TOPAZ_SIMPLE, (AI)TOPAZ_COMPLEX, (P)TOPAZ_SIMPLE, (P)TOPAZ_COMPLEX]

Simulating thousand-node Networks
12-core (Xeon E5645) server with 54 GBytes of main memory.
[Figure: simulation time (seconds) vs. number of cores; series 32K, 128K, 256K, 512K and 1M routers; memory labels 1.5 GB, 5.5 GB, 12 GB, 24 GB, 49 GB]
• 3D torus, Bubble Router (simple), similar to IBM Blue Gene.
• Multithreaded implementation takes advantage of the multicore server.
• Good speedup for 1 million routers.
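The memory labels on this figure allow a rough per-router footprint estimate. The sketch below assumes that the five labels pair with the five network sizes in increasing order (1.5 GB for 32K routers up to 49 GB for 1M routers); the slide text does not state this pairing explicitly. It also assumes K = 1024, 1M = 1024K, and 1 GB = 2^30 bytes. The names are hypothetical.

```python
# ASSUMPTION: labels and network sizes are paired in increasing order.
# Keys are network sizes in K routers (1M routers taken as 1024K),
# values are the memory labels in GB.
FOOTPRINT_GB = {32: 1.5, 128: 5.5, 256: 12, 512: 24, 1024: 49}


def kib_per_router(k_routers, gigabytes):
    """Memory per router in KiB, with K = 1024 and 1 GB = 2**30 bytes."""
    total_bytes = gigabytes * 2**30
    routers = k_routers * 1024
    return total_bytes / routers / 1024


if __name__ == "__main__":
    for k_routers, gigabytes in FOOTPRINT_GB.items():
        print(f"{k_routers}K routers: "
              f"{kib_per_router(k_routers, gigabytes):.0f} KiB per router")
```

Under these assumptions the footprint stays between 44 and 49 KiB per router across all five sizes, which is consistent with memory use growing linearly with network size.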

Support & Collaboration
code.google.com/p/tpzsimul

Thanks for your attention
Questions?
http://www.atc.unican.es/galerna/index.html

Using TOPAZ
• Building
• Running
• Printing

Using TOPAZ (Building)
• Router.sgm
  - Router & crossbar ports
  - Buffer size
  - Routing & flow control policies
• Network.sgm
  - Network size
  - Network topology
  - Link delay
• Simula.sgm
  - Traffic pattern
  - Message size
  - Simulation cycles

Using TOPAZ (Running)
• No need to re-compile
  - Only need to add new configurations to the sgml files.
  - Each configuration is identified by a tag in Simula.sgm. The -s option on the command line chooses a specific configuration.
• Different execution modes
  - Run your simulation for XXX cycles.
  - Run your simulation until YYY messages reach their destination.
• Command line options
  - Many sgml parameters can be overwritten through command line options.
  - Example: [not recoverable from the slide]
  - Useful for scripting.
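Since a configuration is selected purely by its tag, a batch of runs reduces to a list of command lines. The following Python sketch assumes only the `./TPZSimul -s <id>` form shown on the Building slide. The function names are hypothetical, and no other command line options are used because the slide's example did not survive.

```python
import shlex


def topaz_command(sim_id, binary="./TPZSimul"):
    """Argument list that selects one Simula.sgm configuration by its tag."""
    return [binary, "-s", sim_id]


def sweep_commands(sim_ids, binary="./TPZSimul"):
    """One shell-quoted command line per configuration identifier."""
    return [shlex.join(topaz_command(sim_id, binary)) for sim_id in sim_ids]


if __name__ == "__main__":
    # SIMUL_DETAILED comes from the slide; the second id is made up.
    for line in sweep_commands(["SIMUL_DETAILED", "SIMUL_SIMPLE"]):
        print(line)
```

To execute the runs rather than print them, each list returned by `topaz_command` could be passed to `subprocess.run` from the directory that holds TPZSimul.ini.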