
WaveScalar: everyday dataflow
Steven Swanson, Ken Michelson, Andrew Schwerin, Mark Oskin
University of Washington
Sponsored by NSF and Intel

Things to keep you up at night (~2016)
  • Opportunities
    • 8 billion transistors; 28 GHz
    • 4 GB per DRAM chip
    • 120 P4s OR 200,000 RISC-1s per die
  • Challenges
    • Communication
    • Defects
    • Complexity
    • Performance
We should all be going to SIGCOMM
University of Washington, Micro 2003

Monolithic von Neumann Processors
A phenomenal success today. But in 2016?
  • Communication: broadcast networks
  • Defect tolerance: 1 flaw -> paperweight
  • Complexity: 40-60% of design is validation
  • Performance: deeper pipes unlikely (ISCA 02)

Decentralized Processors
  • Communication
  • Defect tolerance
  • Complexity ?
  • Performance
But how do you execute?

Von Neumann is Centralized
  • PC-driven fetch is the problem
    • One program counter
  • Dataflow is the solution

Dataflow has been done before...
  • Operations fire when data is available
  • No program counter
  • Convert true control dependences to data dependences
  • Exposes massive parallelism
But...
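The firing rule on this slide can be sketched in a few lines of Python. This is an illustration only, not the WaveScalar implementation: the graph encoding, the node names, and the function `run_dataflow` are all assumptions made for the example.

```python
import operator

# Hypothetical graph encoding: node name -> (operation, names of the inputs it waits for).
GRAPH = {
    "sum":  (operator.add, ("a", "b")),
    "diff": (operator.sub, ("a", "b")),
    "prod": (operator.mul, ("sum", "diff")),
}

def run_dataflow(graph, inputs):
    """Fire every node whose operands are available; there is no program counter."""
    values = dict(inputs)
    fired = []
    pending = set(graph)
    while pending:
        # A node is ready as soon as all of its operands have been produced.
        ready = sorted(n for n in pending if all(src in values for src in graph[n][1]))
        if not ready:
            raise ValueError("some operands never arrive")
        for name in ready:  # everything in `ready` could fire in parallel
            op, srcs = graph[name]
            values[name] = op(*(values[s] for s in srcs))
            fired.append(name)
        pending -= set(ready)
    return values, fired

values, fired = run_dataflow(GRAPH, {"a": 5, "b": 3})
```

Here `sum` and `diff` become ready in the same step because neither depends on the other, which is the parallelism the slide refers to; `prod` fires only after both results exist.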

...it had issues
  • Scalability
  • Dataflow never executed mainstream code
    • No total load-store ordering
    • Special languages
    • Different memory semantics
    • No mutable data structures (mostly)
    • Functional (mostly)

The WaveScalar ISA
  • WaveScalar is memory-centric dataflow
  • Compared to von Neumann
    • There is no fetch
  • Compared to traditional dataflow
    • Memory ordering is a first-class citizen
    • Normal memory semantics
    • No need for special languages
    • We can execute conventional languages, like C

WaveScalar example
  A[j + i*i] = i;
  b = A[i*j];
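The two statements in this example can touch the same array element, which is one way to see why a dataflow machine running such code needs a defined load-store order. A minimal Python sketch follows; the function name `run`, the array size, and the values i = 2, j = 4 are chosen purely for illustration.

```python
def run(i, j, store_first):
    """Execute A[j + i*i] = i and b = A[i*j] in one of the two possible memory orders."""
    A = [0] * 16              # hypothetical array contents, all zero
    store_addr = j + i * i
    load_addr = i * j
    if store_first:           # program order: the store precedes the load
        A[store_addr] = i
        b = A[load_addr]
    else:                     # what an unordered dataflow firing could do
        b = A[load_addr]
        A[store_addr] = i
    return b

aliased_in_order = run(2, 4, store_first=True)       # both addresses are 8
aliased_out_of_order = run(2, 4, store_first=False)
independent = run(1, 3, store_first=False)           # addresses 4 and 3 differ
```

With i = 2 and j = 4 both index expressions evaluate to 8, so the value of `b` depends on which memory operation happens first. When the addresses differ, the order does not matter.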

WaveScalar example
  A[j + i*i] = i;
  b = A[i*j];
[Figure: dataflow graph for the two statements, with inputs A, i, and j, two multiply nodes, three add nodes, a Load, a Store, and output b]


Wave-ordered memory
  • Compiler annotates memory operations
    • Sequence #
    • Successor
    • Predecessor
  • Send memory requests in any order
  • Hardware reconstructs the correct order
[Figure: memory operations annotated with three numbers each: Load <2, 3, 4>; Store <3, 4, ?>; an unlabeled operation <4, 5, 6>; Store <5, 6, 8>; Load <4, 7, 8>; Store <?, 8, 9>]

Wave-ordering Example
[Figure: the annotated operations Load <2, 3, 4>, Store <3, 4, ?>, <4, 5, 6>, Store <5, 6, 8>, Load <4, 7, 8>, and Store <?, 8, 9> feeding a store buffer]
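One way to picture how a store buffer could rebuild program order from these annotations is sketched below in Python. It assumes each triple reads <predecessor, sequence #, successor>, that "?" means the link is unknown at compile time, and that an operation may issue once either its predecessor number matches the last issued sequence number or the last issued operation's successor number matches it. The names `MemOp`, `follows`, `reconstruct`, and `START` are hypothetical, and this is a sketch under those assumptions rather than the hardware's actual algorithm.

```python
from typing import NamedTuple, Optional

class MemOp(NamedTuple):
    kind: str
    pred: Optional[int]   # None stands for the "?" on the slide
    seq: int
    succ: Optional[int]   # None stands for the "?" on the slide

def follows(last: MemOp, op: MemOp) -> bool:
    """True if `op` is the next operation in program order after `last`."""
    return op.pred == last.seq or last.succ == op.seq

def reconstruct(arrivals, start):
    """Buffer requests as they arrive and issue each one only when it is next in order."""
    pending, issued, last = [], [], start
    for op in arrivals:
        pending.append(op)
        while True:
            nxt = next((c for c in pending if follows(last, c)), None)
            if nxt is None:
                break
            pending.remove(nxt)
            issued.append(nxt)
            last = nxt
    return issued, pending

# Hypothetical stand-in for operation 2 having already issued.
START = MemOp("start", None, 2, 3)

# The path through operations 3, 4, 7, 8 from the slide, arriving in reverse order.
arrivals = [
    MemOp("Store", None, 8, 9),
    MemOp("Load", 4, 7, 8),
    MemOp("Store", 3, 4, None),
    MemOp("Load", 2, 3, 4),
]
issued, pending = reconstruct(arrivals, START)
issued_order = [op.seq for op in issued]
```

Nothing issues until Load <2, 3, 4> arrives; after that the buffer drains in sequence order 3, 4, 7, 8. The final store has no known predecessor, so it is released by the successor field of Load <4, 7, 8> instead.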

Wave-ordered Memory
  • Waves are loop-free sections of the dataflow graph
  • Each dynamic wave has a wave number
  • Wave-ordered memory
    • Wave-numbers
    • Sequence number
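Read together, the wave number and the sequence number suggest a two-level ordering: earlier waves first, then sequence order within a wave. The Python sketch below illustrates that reading only; the tuple layout, the request labels, and the function `global_order` are assumptions for the example.

```python
def global_order(requests):
    """Sort (wave_number, sequence_number, label) requests into a single order."""
    return sorted(requests, key=lambda r: (r[0], r[1]))

# Hypothetical requests from two dynamic waves (for example, two loop iterations),
# listed in the order they happened to reach memory.
arrived = [
    (1, 3, "Load"),
    (0, 4, "Store"),
    (1, 4, "Store"),
    (0, 3, "Load"),
]
ordered = global_order(arrived)
```

Sequence numbers repeat from one dynamic wave to the next because the same static instructions execute again, so the wave number is what separates one iteration's requests from another's.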

The Ideal WaveScalar Machine
  • An ALU at every static instruction
  • No processor core
  • Instructions communicate directly
[Figure: the example dataflow graph, with inputs A, i, and j, two multiply nodes, three add nodes, a Load, a Store, and output b]

The WaveCache
The I-Cache is the processor.

Processing Element

Domain

Cluster

The WaveCache
  • Long distance communication
    • Dynamic routing
    • Grid-based network
    • 1 cycle/cluster
  • Traditional cache coherence
  • Normal memory hierarchy
  • 16K instructions
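As a rough illustration of what "1 cycle/cluster" on a grid implies, the Python sketch below estimates a minimum latency between two clusters. It assumes clusters sit at integer (row, column) coordinates and that a message takes a shortest grid path; with dynamic routing the real path may be longer, so treat the result as a lower bound. The function names are hypothetical.

```python
def cluster_hops(src, dst):
    """Shortest number of cluster-to-cluster hops on a grid (Manhattan distance)."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])

def min_network_latency(src, dst, cycles_per_cluster=1):
    """Lower bound on cycles for a message, at the slide's 1 cycle per cluster."""
    return cluster_hops(src, dst) * cycles_per_cluster

corner_to_corner = min_network_latency((0, 0), (3, 3))   # hypothetical 4x4 grid
neighbours = min_network_latency((1, 1), (1, 2))
```

The point of the estimate is that communication cost grows with distance, so instructions that exchange data often are cheaper to run when they sit in nearby clusters.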

Demo!

Performance
  • Binary translator from Alpha -> WaveScalar
  • Baseline
    • ~2000 processing elements
    • No speculation
  • Compare to a very aggressive superscalar
    • 15-stage, 16-wide
    • 1024 registers, 1024-entry issue queue
  • Measure performance in Alpha-equivalent instructions per cycle
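The metric on this slide can be written down as a small calculation. The sketch below assumes that "Alpha-equivalent instructions per cycle" means the instruction count of the original Alpha binary divided by the cycles taken, so that both machines are credited with the same amount of work. The function names and every number are hypothetical and are not results from the talk.

```python
def aipc(alpha_instructions, cycles):
    """Alpha-equivalent instructions per cycle: original Alpha instruction count / cycles."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return alpha_instructions / cycles

def speedup(aipc_a, aipc_b):
    """How many times faster machine A is than machine B on the same program."""
    return aipc_a / aipc_b

# Hypothetical run: the same 1,000,000-instruction Alpha program on two machines.
machine_a = aipc(1_000_000, 250_000)
machine_b = aipc(1_000_000, 500_000)
ratio = speedup(machine_a, machine_b)
```

Counting Alpha instructions rather than translated instructions keeps the comparison fair when the translator emits more or fewer operations than the original binary contained.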

WaveCache Performance

Decentralized Processing
  • Communication
  • Defect tolerance
  • Complexity
  • Performance

Dataflow vs. von Neumann

Future work
  • Beyond von Neumann emulation
  • Compiler
  • Instruction placement
  • Operating system
  • Fault tolerance
  • System integration/code migration?

Conclusions
  • Decentralized computing will let you rest easy in 2016
  • WaveScalar: Dataflow with normal memory!!!
  • WaveCache
    • “The I-Cache is the processor.”
    • Outperforms an OOO superscalar by 2.8x
  • Enormous opportunities for future research
    • Download at: http://wavescalar.cs.washington.edu