OpenSoC Fabric
An open source, parameterized, network generation tool
Farzad Fatollahi-Fard, David Donofrio, George Michelogiannakis, John Shalf
CoDEx SoC for HPC Programmatic Meeting, August 25-26, 2014, Denver, CO
1. Motivation: power, parallelism, and data movement drive the need for SoC
2. State of the Art in SoC: What technology is being used to build SoCs? What can we leverage?
3. OpenSoC Overview: an open source network generator using a new HDL, Chisel
4. OpenSoC Code Deep Dive and Demo: OpenSoC module walkthrough and network output
5. Conclusion and Future Work: new features for OpenSoC
Power: The New Design Constraint
Trends beginning in 2004 are continuing…
‣ Power densities have ceased to increase
‣ No power efficiency increase with smaller transistors
Power: The New Design Constraint
On-chip parallelism is increasing to sustain performance gains…
‣ We have come to the end of clock frequency scaling
‣ Moore's Law is alive and well
  • Core counts are now increasing instead
Parallelism Increasing
NERSC Trends
                 Franklin   Hopper     Edison         Cori (NERSC 8)
Core Count       4          24         48 (logical)   >60
Clock Rate       2.3 GHz    2.1 GHz    2.4 GHz        ~1.5 GHz
Memory           8 GB       32 GB      64-128 GB      + on-package
Peak Perf.       352 TF     1.288 PF   2.57 PF        >3 TF
Hierarchical Power Costs
Data movement is the dominant power cost
‣ 6 pJ: cost to move data 1 mm on-chip
‣ 100 pJ: typical cost of a single floating-point operation
‣ 120 pJ: cost to move data 20 mm on-chip
‣ 250 pJ: cost to move data off-chip, but stay within the package (SMP)
‣ 2000 pJ: cost to move data off-chip into DRAM
‣ ~2500 pJ: cost to move data off-chip to a neighboring node
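The figures above can be normalized against a floating-point operation to show why data movement dominates. This Python sketch uses only the slide's numbers; the linear-in-distance wire model is an assumption, though it reproduces the slide's 20 mm figure (20 × 6 pJ = 120 pJ).

```python
# Energy costs from the hierarchical power cost slide, in picojoules (pJ).
COSTS_PJ = {
    "floating-point op": 100.0,
    "1 mm on-chip": 6.0,
    "20 mm on-chip": 120.0,
    "off-chip, in package": 250.0,
    "off-chip to DRAM": 2000.0,
    "off-chip to neighbor node": 2500.0,
}


def relative_to_flop(costs):
    """Express every cost as a multiple of one floating-point operation."""
    flop = costs["floating-point op"]
    return {name: pj / flop for name, pj in costs.items()}


def on_chip_wire_energy(distance_mm, pj_per_mm=COSTS_PJ["1 mm on-chip"]):
    """Assumed model: on-chip wire energy grows linearly with distance."""
    return distance_mm * pj_per_mm


# Print the costs from cheapest to most expensive, relative to a flop.
for name, ratio in sorted(relative_to_flop(COSTS_PJ).items(), key=lambda kv: kv[1]):
    print(f"{name:>26}: {ratio:6.2f} x a flop")

# The linear wire model reproduces the slide's 20 mm figure.
print(on_chip_wire_energy(20), "pJ for 20 mm")
```

Moving one operand into DRAM costs as much energy as 20 floating-point operations, and reaching a neighboring node costs about 25.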
Current Hardware Challenges
Embracing embedded designs
‣ Power is now the limiting factor for leading-edge chips
  • Moore's Law continues… but now relies on more exotic technologies such as 3-D transistors (FinFET, etc.)
  • Design validation and verification dominate development costs
‣ Solution: smaller is better
  • Simpler cores with 5- to 9-stage pipelines
  • Parallelism is key to energy efficiency: dynamic power P = CV²f
  • Large arrays of small, simple cores are easier to verify and more resilient
‣ Vibrant commodity market in IP components
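The CV²f argument can be checked with a small worked example. The sketch assumes, hypothetically, that halving the clock lets the supply voltage drop to 80% of nominal; real voltage headroom depends on the process and shrinks near threshold. Throughput is modeled simply as cores × frequency.

```python
def dynamic_power(c, v, f):
    """Dynamic (switching) power: P = C * V^2 * f, in normalized units."""
    return c * v ** 2 * f


def design_point(n_cores, c, v, f):
    """Total dynamic power and aggregate throughput of n identical cores,
    modeling throughput as n_cores * f (one operation per core per cycle)."""
    return n_cores * dynamic_power(c, v, f), n_cores * f


# Baseline: one core at nominal voltage and frequency (normalized to 1).
baseline_power, baseline_throughput = design_point(1, c=1.0, v=1.0, f=1.0)

# Hypothetical alternative: two cores at half frequency, assuming the lower
# clock allows the supply voltage to drop to 80% of nominal.
parallel_power, parallel_throughput = design_point(2, c=1.0, v=0.8, f=0.5)

print(f"baseline: power={baseline_power:.2f}, throughput={baseline_throughput:.2f}")
print(f"parallel: power={parallel_power:.2f}, throughput={parallel_throughput:.2f}")
```

Two half-speed cores match the baseline throughput at about 64% of its dynamic power. The saving comes entirely from the V² term, because halving f is cancelled by doubling the core count.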
Building an SoC from IP Logic Blocks
It's Legos, with some extra integration and verification cost
‣ Processor core (ARM, Tensilica, MIPS derivative), with extra "options" such as a DP FPU and ECC
‣ On-chip network: OpenSoC Fabric (currently proprietary ARM or Arteris IP)
‣ DDR memory controller (Denali/Cadence, SiCreations) + PHY and programmable PLL, with DDR4 channels to DRAM
‣ PCIe Gen3 root complex
‣ Integrated FLASH controller
‣ IO: 10GigE or IB
SoC: What's Out There?
Some common features
‣ Cost driven
  • CPUs integrated with IO and graphics to reduce BOM cost and total PCB area
  • Driven by die power density and pin constraints
‣ Homogeneous cores
‣ Simple networks
  • Most SoCs rely on ring or crossbar interconnects
  • Increasing core counts will drive the need for more complex topologies
  • KNL (Knights Landing), in NERSC's 2016 machine, will use a mesh
What Interconnect Provides the Best Power/Performance Ratio?
What tools exist to answer this question?
SoC Interconnect Examples
Some common topologies
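To make the topology trade-off concrete, the following Python sketch compares three topologies of 16 routers: a ring, a 4×4 2-D mesh, and a 4×4 2-D flattened butterfly. It uses hop count as a rough proxy for data-movement latency and energy. It ignores wire length, router delay, and contention, and the function names are illustrative, not taken from OpenSoC.

```python
from collections import deque
from itertools import product


def ring(n):
    """Bidirectional ring of n routers."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def mesh_2d(k):
    """k x k 2-D mesh: router (x, y) links to its N/S/E/W neighbors."""
    adj = {}
    for x, y in product(range(k), repeat=2):
        nbrs = {(x + dx, y + dy) for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))}
        adj[(x, y)] = {(a, b) for a, b in nbrs if 0 <= a < k and 0 <= b < k}
    return adj


def flattened_butterfly_2d(k):
    """k x k 2-D flattened butterfly: each router links directly to every
    other router in its row and in its column."""
    adj = {}
    for x, y in product(range(k), repeat=2):
        row = {(a, y) for a in range(k) if a != x}
        col = {(x, b) for b in range(k) if b != y}
        adj[(x, y)] = row | col
    return adj


def hop_counts(adj, src):
    """Breadth-first search: minimum hop count from src to every router."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def summarize(adj):
    """Return (max router degree, diameter, average hops over distinct pairs)."""
    total, pairs, diameter = 0, 0, 0
    for src in adj:
        for dst, d in hop_counts(adj, src).items():
            if dst != src:
                total += d
                pairs += 1
                diameter = max(diameter, d)
    degree = max(len(nbrs) for nbrs in adj.values())
    return degree, diameter, total / pairs


for name, adj in [("ring 16", ring(16)), ("mesh 4x4", mesh_2d(4)),
                  ("flattened butterfly 4x4", flattened_butterfly_2d(4))]:
    degree, diameter, avg = summarize(adj)
    print(f"{name:>24}: degree={degree} diameter={diameter} avg_hops={avg:.2f}")
```

The ring has a diameter of 8 (about 4.27 average hops), the mesh 6 (about 2.67), and the flattened butterfly 2 (1.60). The flattened butterfly buys its short paths with higher-radix routers (6 network ports versus 4 for the mesh), which is exactly the kind of power/performance trade-off a network generator lets you explore.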
The Importance of Network Topology
Network topology can greatly influence application performance
Source: "An analysis of on-chip interconnection networks for large-scale chip multiprocessors," ACM Transactions on Architecture and Code Optimization (TACO), April 2010
The Importance of Networks
Networks consume a large fraction of total chip power…
‣ Clock distribution: 10%
‣ Dual FPMACs: 34%
‣ IMEM and DMEM: 20%
‣ Routers and links: 26%
‣ 10-port RF: 9%
Source: "A 5-GHz Mesh Interconnect for a Teraflops Processor," IEEE Micro, 2007
What Tools Exist for SoC Research?
What tools do we have to evaluate large, complex networks of cores?
‣ Software models
  • Fast to create, but plagued by long runtimes as system size increases
‣ Hardware emulation
  • Fast, accurate evaluation that scales with system size, but suffers from long development time
Source: "A complexity-effective architecture for accelerating full-system multiprocessor simulations using FPGAs," FPGA 2008
Software Models
C++-based on-chip network simulators
‣ BookSim (ISPASS 2013)
  • Cycle-accurate
  • Verified against RTL
  • A few thousand cycles per second
‣ Garnet (ISPASS 2009)
  • Event-driven
  • Simulation speed limits designs to hundreds of cores
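A back-of-the-envelope estimate shows why simulator speed matters. The sketch assumes a speed of 3,000 simulated cycles per second (one reading of "a few thousand") and a 1 GHz network clock. Both values are illustrative, and real simulators typically slow down further as the network grows.

```python
def wall_clock_seconds(target_cycles, sim_cycles_per_second):
    """Wall-clock time to simulate target_cycles at a fixed simulator speed."""
    return target_cycles / sim_cycles_per_second


SIM_SPEED = 3_000        # assumed: "a few thousand" simulated cycles per second
CHIP_CLOCK_HZ = 1e9      # assumed: 1 GHz target network clock

for label, chip_seconds in [("1 ms", 1e-3), ("1 s", 1.0)]:
    cycles = chip_seconds * CHIP_CLOCK_HZ
    secs = wall_clock_seconds(cycles, SIM_SPEED)
    print(f"{label} of chip time: {secs:,.0f} s of wall-clock time (~{secs / 86_400:.1f} days)")
```

Under these assumptions, 1 ms of chip time takes about 333 s to simulate, and a single second of chip time takes roughly 3.9 days.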
Hardware Models
HDL network generators and implementations
‣ Stanford open-source NoC router
  • Verilog
  • Precise, but long simulation times
‣ CONNECT network generator (CONNECT: fast flexible FPGA-tuned networks-on-chip, CARL 2012)
  • Bluespec
  • FPGA-optimized
Chisel: A New Hardware DSL
Using Scala to construct Verilog and C++ descriptions
‣ Chisel provides both software and hardware models from the same codebase
‣ Object-oriented hardware development
  • Allows definition of structs and other high-level constructs
‣ Powerful libraries and components ready to use
‣ Working processors have been fabricated using Chisel
Recent Chisel Designs
Chisel-created cores successfully boot Linux
[Die photo: Raven core, 28 nm, with labeled processor site, clock test site, DC-DC test site, and SRAM test site]
Chisel Overview
How does Chisel work?
‣ Not "Scala to gates"
‣ Describe hardware functionality
‣ Chisel creates a graph representation of the design
  • The graph is flattened
‣ Each node is translated to Verilog or C++
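The flow described above (build a graph, flatten it, translate each node) can be illustrated with a toy Python generator. This is a hypothetical sketch of the idea, not Chisel's API: Chisel is embedded in Scala and handles registers, bit-width inference, and C++ output, none of which are modeled here. Operator overloading on the illustrative `Node` class stands in for the way a host-language expression builds hardware rather than computing a value.

```python
import itertools


class Node:
    """One vertex of a hypothetical hardware graph (not Chisel's API)."""
    _ids = itertools.count()

    def __init__(self, op, *args, name=None):
        self.op, self.args = op, args
        self.name = name or f"n{next(Node._ids)}"

    # Operator overloading lets ordinary expressions build graph nodes.
    def __add__(self, other):
        return Node("+", self, other)

    def __and__(self, other):
        return Node("&", self, other)


def input_port(name):
    """Create a named primary input of the module."""
    return Node("input", name=name)


def flatten(output):
    """Depth-first walk returning every node once, operands before users."""
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for arg in node.args:
            visit(arg)
        order.append(node)

    visit(output)
    return order


def emit_verilog(module_name, output, width=8):
    """Translate each flattened graph node into one line of Verilog."""
    nodes = flatten(output)
    ports = [f"input [{width - 1}:0] {n.name}" for n in nodes if n.op == "input"]
    ports.append(f"output [{width - 1}:0] out")
    lines = [f"module {module_name}(" + ", ".join(ports) + ");"]
    for n in nodes:
        if n.op != "input":
            lhs, rhs = n.args
            lines.append(f"  wire [{width - 1}:0] {n.name} = {lhs.name} {n.op} {rhs.name};")
    lines.append(f"  assign out = {output.name};")
    lines.append("endmodule")
    return "\n".join(lines)


a, b, c = input_port("a"), input_port("b"), input_port("c")
print(emit_verilog("adder_and", (a + b) & c))
```

The generated module has one `wire` line per operator node followed by `assign out = ...;`, which is the "each node translated to Verilog" step in miniature.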
OpenSoC Fabric
An open source, flexible, parameterized NoC generator
‣ Part of the CoDEx tool suite, written in Chisel
‣ Dimensions, topology, and virtual channels (VCs) all configurable
‣ Fast functional C++ model for functional validation
  • SystemC ready
‣ Verilog-based description for FPGA or ASIC
  • Synthesis path enables accurate power/energy modeling
‣ AXI-based endpoints
  • Ready for ARM integration
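A parameterized generator derives its structure from a handful of configuration values. The Python sketch below is hypothetical: the class and field names are illustrative and are not OpenSoC Fabric's configuration interface, and the per-router `concentration` parameter is an added assumption. It shows how topology and dimensions determine router count and router radix for the two topologies the tool currently supports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NetworkConfig:
    """Hypothetical 2-D NoC parameters (illustrative names only)."""
    topology: str        # "mesh" or "flattened_butterfly"
    k: int               # routers per dimension, giving a k x k grid
    concentration: int   # endpoints attached to each router (assumed parameter)
    num_vcs: int = 1     # virtual channels per input port

    def num_routers(self) -> int:
        return self.k * self.k

    def num_endpoints(self) -> int:
        return self.num_routers() * self.concentration

    def network_ports(self) -> int:
        """Router-to-router ports on the best-connected router."""
        if self.topology == "mesh":
            # Interior mesh routers have N/S/E/W links; a 2x2 mesh has only 2.
            return min(4, 2 * (self.k - 1))
        if self.topology == "flattened_butterfly":
            # Direct links to every other router in the same row and column.
            return 2 * (self.k - 1)
        raise ValueError(f"unsupported topology: {self.topology}")

    def router_radix(self) -> int:
        """Total ports: network ports plus endpoint (injection/ejection) ports."""
        return self.network_ports() + self.concentration

    def vc_buffers_per_router(self) -> int:
        """One buffer per virtual channel on every input port."""
        return self.router_radix() * self.num_vcs


for cfg in (NetworkConfig("mesh", k=4, concentration=1),
            NetworkConfig("flattened_butterfly", k=4, concentration=1, num_vcs=2)):
    print(cfg.topology, cfg.num_routers(), cfg.num_endpoints(),
          cfg.router_radix(), cfg.vc_buffers_per_router())
```

For a 4×4 network with one endpoint per router, the mesh needs 5-port routers, while the flattened butterfly needs 7-port routers (6 network ports plus 1 endpoint port).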
OpenSoC: Current Status
Projected v1.0 release date: October 1st
‣ Available now:
  • 2-D mesh or flattened butterfly network of arbitrary size
  • Wormhole routing
‣ In development:
  • Virtual channels
  • AXI interface
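Wormhole routing can be sketched behaviorally: a packet is split into flits, only the head flit carries the destination, and the output port it reserves is held until the tail flit passes. The Python below is an illustrative model, not OpenSoC's implementation. The dimension-order (XY) routing function and the port names are assumptions, chosen because XY routing is a common pairing with 2-D meshes.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Flit:
    kind: str                          # "head", "body", "tail", or "head_tail"
    packet_id: int
    payload: int
    dest: Optional[Tuple[int, int]] = None   # only head flits carry routing info


def packetize(packet_id, dest, words):
    """Split a packet into flits; only the head flit carries the destination."""
    if not words:
        raise ValueError("packet must contain at least one word")
    if len(words) == 1:
        return [Flit("head_tail", packet_id, words[0], dest)]
    flits = [Flit("head", packet_id, words[0], dest)]
    flits += [Flit("body", packet_id, w) for w in words[1:-1]]
    flits.append(Flit("tail", packet_id, words[-1]))
    return flits


def xy_route(cur, dest):
    """Assumed dimension-order routing: correct X first, then Y."""
    (x, y), (dx, dy) = cur, dest
    if x != dx:
        return "east" if dx > x else "west"
    if y != dy:
        return "north" if dy > y else "south"
    return "local"


def route_packet(cur, flits):
    """Wormhole behavior at one router: the head flit picks the output port,
    and every following flit of the same packet reuses it until the tail."""
    port, decisions = None, []
    for flit in flits:
        if flit.kind in ("head", "head_tail"):
            port = xy_route(cur, flit.dest)
        decisions.append((flit.kind, port))
        if flit.kind in ("tail", "head_tail"):
            port = None  # release the output port for the next packet
    return decisions


print(route_packet((0, 0), packetize(7, (2, 1), [10, 11, 12])))
```

Here all three flits leave through the east port: X is corrected before Y, and the body and tail flits follow the head without making any routing decision of their own.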
OpenSoC Code Examples
Walkthrough of the switch and the full network tester
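A switch's core job is to let each output port arbitrate among the inputs requesting it. This hypothetical Python model uses one round-robin arbiter per output of a crossbar. The class names and the choice of round-robin arbitration are assumptions for illustration, not a description of OpenSoC's Chisel switch.

```python
class RoundRobinArbiter:
    """Grants one of n requesters per cycle; priority rotates past the last winner."""

    def __init__(self, n):
        self.n = n
        self.last = n - 1  # so requester 0 has the highest priority initially

    def arbitrate(self, requests):
        """requests is a list of n booleans; returns the granted index or None."""
        for offset in range(1, self.n + 1):
            candidate = (self.last + offset) % self.n
            if requests[candidate]:
                self.last = candidate
                return candidate
        return None


class CrossbarSwitch:
    """n_in x n_out crossbar with one round-robin arbiter per output port."""

    def __init__(self, n_in, n_out):
        self.n_in = n_in
        self.arbiters = [RoundRobinArbiter(n_in) for _ in range(n_out)]

    def cycle(self, requested_output):
        """requested_output[i] is the output port input i wants, or None.
        Returns {output port: winning input} for this cycle."""
        grants = {}
        for out, arb in enumerate(self.arbiters):
            reqs = [requested_output[i] == out for i in range(self.n_in)]
            winner = arb.arbitrate(reqs)
            if winner is not None:
                grants[out] = winner
        return grants


switch = CrossbarSwitch(n_in=3, n_out=2)
for _ in range(4):
    print(switch.cycle([0, 0, 0]))  # all three inputs contend for output 0
```

With three inputs all requesting output 0, successive cycles grant inputs 0, 1, 2, and then 0 again: rotating priority keeps any input from being starved.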
Future Additions
Towards a full set of features
‣ Photonics and circuit-switched networks
‣ Integrated NIC model
‣ More diverse topologies and routing functions
‣ Validation against RTL and other simulators
‣ Standardized (AXI) interfaces at the endpoints
‣ More powerful synthetic traffic and trace replay support
‣ Power modeling in the C++ model
Acknowledgements
‣ UCB Chisel
‣ US Dept of Energy
‣ Ke Wen
‣ Columbia LRL
‣ John Bachan
‣ Dan Burke
‣ BWRC
More Information: http://opensocfabric.org