ESL: System Level Design
Bluespec ESEPro: ESL Synthesis Extensions for SystemC
Rishiyur S. Nikhil, CTO, Bluespec, Inc. (www.bluespec.com)
6.375 Lecture 16, delivered by Arvind, March 16, 2007
(Only a subset of Nikhil’s slides is included)
The central ESL design problem
[Figure labels: Early software; Early models; Software; HW/SW interface (e.g., register read/write); implements; refine; explore architectures (for speed, area, power)]
• First HW model(s): available early; very fast sim; not HW-accurate (timing, area)
• HW implementation: not available early; slower sim; HW-accurate
Required: uniform computational model (single paradigm), plus higher level than RTL, even for implementation
Rishiyur Nikhil, Bluespec, Inc.
Another ESL design problem
Reuse (models and implementations) across SoC 1, SoC 2, …, SoC n
Required: powerful parameterization and powerful module interface semantics
Bluespec enables ESL
• Rules and Rule-based Interfaces provide a uniform computational model suitable both for high-level system modeling and for HW implementation
  • Atomic Transaction semantics are very powerful for expressing complex concurrency
    – Formal and informal reasoning about correctness
    – Automatic synthesis of complex control logic to manage concurrency
  • Map naturally to HW (“state machines”); synthesizable; no mental shifting of gears during refinement
• Can be mixed with regular SystemC, TLM, and C++, for mixed-model and whole-system modeling
• Enables Design-by-Refinement; Design-by-Contract
BSV: Bluespec SystemVerilog
ESEPro: Bluespec’s ESL Synthesis Extensions to SystemC
Rule Concurrent Semantics
• “Untimed” semantics:
  Forever: Execute any enabled rule
• “Timed”, or “Clock Scheduled” semantics (Bluespec scheduling technology):
  In each clock: Execute a subset of enabled rules (in parallel, but consistent with untimed semantics)
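The untimed semantics can be sketched in plain C++ as a loop that repeatedly picks one enabled rule and runs it to completion. This is only an illustration under stated assumptions: `Rule`, `run_untimed`, and `demo_untimed_sum` are hypothetical names, not part of ESEPro or BSV, and the sketch always picks the first enabled rule although any enabled rule would be a legal choice.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <vector>

// A rule is a guard (is the rule enabled?) plus an atomic action.
// Hypothetical stand-in; not the ESEPro ESL_RULE macro.
struct Rule {
    std::function<bool()> guard;
    std::function<void()> action;
};

// Untimed semantics: repeatedly pick one enabled rule and run it to
// completion. Stops when no rule is enabled or after max_steps firings.
// Returns the number of rule firings.
inline std::size_t run_untimed(const std::vector<Rule>& rules,
                               std::size_t max_steps) {
    std::size_t fired = 0;
    while (fired < max_steps) {
        bool progressed = false;
        for (const Rule& r : rules) {
            if (r.guard()) {     // any enabled rule is legal;
                r.action();      // this sketch takes the first one
                ++fired;
                progressed = true;
                break;
            }
        }
        if (!progressed) break;  // no rule enabled: nothing left to do
    }
    return fired;
}

// Hypothetical example: a producer and a consumer sharing a 2-entry queue.
// The producer emits 1..5; the consumer sums what it removes.
inline int demo_untimed_sum() {
    std::deque<int> q;
    int next = 1;
    int sum = 0;
    std::vector<Rule> rules = {
        {[&] { return next <= 5 && q.size() < 2; },
         [&] { q.push_back(next++); }},
        {[&] { return !q.empty(); },
         [&] { sum += q.front(); q.pop_front(); }},
    };
    run_untimed(rules, 1000);
    return sum;
}
```

Because each action runs to completion before the next guard is evaluated, the final sum does not depend on which enabled rule is chosen at each step.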
Bluespec Tools Architecture
[Figure labels: ESEPro (SystemC*); BSV (SystemVerilog*); ESE and ESEPro; ESEComp and BSC; parsing (systemc.h, esl.h); gcc; .exe; sim; Common Synthesis Engine (static checking, optimization, untimed & timed scheduling, power optimization, RTL generation); Bluesim; Blueview debug; rapid, source-level simulation and interactive debug of BSV; cycle-accurate with Verilog sim; RTL; synthesis]
Outline
• Limitations of SystemC in modeling SoCs
• ESEPro’s Rule-based Interfaces
• Model-to-implementation refinement with SystemC and ESEPro modules
• Seamless interoperation of SystemC TLM and ESEPro modules
• ESEPro-to-RTL synthesis
• An example
Example illustrating why modeling hardware-accurate complex concurrency is difficult in standard SystemC (threads and events)
A 2x2 switch, with stats
Spec:
• Packets arrive on two input FIFOs, and must be switched to two output FIFOs
• Certain “interesting packets” must be counted
[Figure: two input FIFOs, each feeding a “Determine Queue” block; two output FIFOs; a “+1” counter labeled “Count certain packets”]
The first version of the SystemC code is easy
    void thread1 () {
        while (true) {
            Pkt x = in0->first(); in0->deq();
            if (x.dest == 0) out0->enq (x);
            else out1->enq (x);
            if (count(x)) c++;
        }
    }
    void thread2 () {
        while (true) {
            Pkt x = in1->first(); in1->deq();
            if (x.dest == 0) out0->enq (x);
            else out1->enq (x);
            if (count(x)) c++;
        }
    }
first(), deq() block if input fifo is empty; enq() blocks if output fifo is full.
It all works fine because of “cooperative parallelism”
Cooperative parallelism model
• The two increments to the counter do not need to be protected with “locks” because of SystemC’s definition of parallelism as cooperative, i.e.,
  • Threads only switch at “wait()” statements
  • Threads do not interleave
• But real hardware has real parallelism!
  • Gap between model and implementation
• Further, cooperative multithreading also makes it hard to simulate models in parallel (e.g., on a modern multi-core or SMP machine)
This code would have problems with preemptive parallelism
There could be some subtle mistakes
    void thread1 () {
        while (true) {
            int tmp = c;
            Pkt x = in0->first(); in0->deq();
            if (x.dest == 0) out0->enq (x);
            else out1->enq (x);
            if (count(x)) c = tmp + 1;
        }
    }
    void thread2 () {
        while (true) {
            int tmp = c;
            Pkt x = in1->first(); in1->deq();
            if (x.dest == 0) out0->enq (x);
            else out1->enq (x);
            if (count(x)) c = tmp + 1;
        }
    }
If the threads interleave due to blocking of first(), deq(), enq(), c will be incorrectly updated (non-atomically)
[Slide labels: Cooperative parallelism; Atomicity]
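The lost update described above can be reproduced deterministically in plain C++ by writing out the interleaving by hand. This is a minimal sketch, not SystemC: `BuggyIncrement`, `interleaved_count`, and `atomic_count` are hypothetical names, and the blocking FIFO calls are represented only by the point at which the other thread is allowed to run.

```cpp
#include <cassert>

// One "thread" of the buggy switch, reduced to its two accesses to the
// shared counter. In the slide's code the blocking FIFO calls sit between
// the read and the write, so another thread may run in between.
struct BuggyIncrement {
    int tmp = 0;
    void read(const int& c) { tmp = c; }       // int tmp = c;
    void write(int& c) const { c = tmp + 1; }  // c = tmp + 1;
};

// Both threads read c before either writes it: one increment is lost.
inline int interleaved_count() {
    int c = 0;
    BuggyIncrement t1, t2;
    t1.read(c);
    t2.read(c);   // t1 is blocked in first()/deq()/enq(); t2 runs
    t1.write(c);
    t2.write(c);  // overwrites t1's update with the same value
    return c;
}

// Each read-modify-write completes before the other one starts.
inline int atomic_count() {
    int c = 0;
    BuggyIncrement t1, t2;
    t1.read(c);
    t1.write(c);
    t2.read(c);
    t2.write(c);
    return c;
}
```

With the interleaved order the counter ends at 1 although two packets were counted; with the atomic order it ends at 2.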
Hardware has additional “resource contention” constraints
• Each output fifo can be enq’d by only one process at a time (in the same clock)
• Need arbitration if both processes want to enq() on the same fifo simultaneously
• SystemC’s cooperative multitasking makes it easy to ignore this, but much harder to model this accurately
Accurately modeling this makes the code messier
Hardware has additional “resource contention” constraints
• The counter can be incremented by only one process at a time
• Need arbitration if both want to increment
• SystemC’s cooperative multitasking makes it easy to ignore this, but much harder to model this accurately
Accurately modeling this makes the code messier
Hardware has additional “resource contention” constraints
• No intermediate buffering: a process should transfer a packet only when both its input fifo and its output fifo are ready, and it has priority on its output fifo and the counter
• SystemC’s blocking methods make it easy to ignore this, but much harder to model this accurately
Accurately modeling this makes the code messier
Hardware typically has additional “resource contention” constraints
• These constraints must be modeled in order to model HW performance accurately (latencies, bandwidth)
• In SystemC, this exposes full/empty tests on fifos, adds locks/semaphores, polling of locks/semaphores, …
• The code becomes a mess
• If we want synthesizability, the code increasingly resembles RTL written in SystemC notation
Limitations of SystemC/C++
• Accurate SoC modeling involves lots of concurrency and dynamic, fine-grain resource sharing
  • Because these are the characteristics of HW
  • Most blocks in an SoC are HW; a few blocks (e.g., processor, DSP) involve software (typically C, C++)
• “Threads and Events” (SystemC’s concurrency model) are far too low-level for this
  • Require tedious, explicit management of concurrent access to shared state
  • Weak semantics for module composition
  • Does not scale to large systems
  • They are the source of the majority of bugs in RTL and SystemC (race conditions, inconsistent state, protocol errors, …)
• Instead, advanced SW systems (e.g., Operating Systems, Database Systems, Transaction Processing Systems) use Atomic Transactions to manage complex concurrency
Other issues with SystemC/C++
• No early feedback on HW implementability during modeling, because of distance of SystemC semantics from HW
  • Threads, stacks, dynamic allocation, events, locks, global variables, undisciplined instantaneous access to global/remote data
  • Undisciplined access to shared resources
• No credible synthesis from a sequential, thread-based model of computation (except for loop-and-array computational kernels)
  • The design has to be re-implemented in RTL
Literature on problems with threads (and the advantages of atomicity)
• The Problem with Threads, Edward A. Lee, IEEE Computer, 39(5), pp. 33-42, May 2006
• Why threads are a bad idea (for most purposes), John K. Ousterhout, Invited Talk, USENIX Technical Conference, January 1996
• Composable memory transactions, Tim Harris, Simon Marlow, Simon Peyton Jones and Maurice Herlihy, in ACM Conf. on Principles and Practice of Parallel Programming (PPoPP), 2005
• Atomic Transactions, Nancy A. Lynch, Michael Merritt, William E. Weihl and Alan Fekete, Morgan Kaufmann, San Mateo, CA, 1994, 476 pp.
• … and more …
2x2 switch: the meat of the ESEPro code
    ESL_RULE (r0);
        Pkt x = in0->first(); in0->deq();
        if (x.dest == 0) out0->enq(x);
        else out1->enq(x);
        if (count(x)) c++;
    }
    ESL_RULE (r1);
        Pkt x = in1->first(); in1->deq();
        if (x.dest == 0) out0->enq(x);
        else out1->enq(x);
        if (count(x)) c++;
    }
Atomicity of rules captures all the “resource contention” constraints of hardware implementation; further, this code is synthesizable to RTL as written.
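To make the resource-contention constraints concrete, here is a plain C++ sketch of one clock of the 2x2 switch in which each rule fires atomically or not at all. It rests on several assumptions that are not from the slides: the names `Pkt`, `Switch2x2`, `step_clock`, and `demo_switch_clocks` are hypothetical, the FIFO depth of 2 is arbitrary, r0 is given fixed priority over r1, and conflicts are checked dynamically per clock, which is a simplification and not a description of how the Bluespec scheduler works.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical packet type: destination port plus an "interesting" flag.
struct Pkt {
    int dest;
    bool interesting;
};

struct Switch2x2 {
    static constexpr std::size_t kDepth = 2;  // assumed FIFO depth
    std::array<std::deque<Pkt>, 2> in, out;
    int c = 0;  // count of interesting packets

    // One clock: try r0, then r1 (r0 has priority). A rule fires only if
    // its guard holds (input non-empty, target output not full) and it
    // needs no resource already used in this clock (same output FIFO,
    // or the counter). A rule that cannot fire changes no state.
    int step_clock() {
        bool out_used[2] = {false, false};
        bool counter_used = false;
        int fired = 0;
        for (std::size_t i = 0; i < 2; ++i) {
            if (in[i].empty()) continue;
            const Pkt x = in[i].front();
            const std::size_t d = (x.dest == 0) ? 0 : 1;
            if (out[d].size() >= kDepth) continue;
            if (out_used[d] || (x.interesting && counter_used)) continue;
            in[i].pop_front();
            out[d].push_back(x);
            out_used[d] = true;
            if (x.interesting) {
                ++c;
                counter_used = true;
            }
            ++fired;
        }
        return fired;
    }
};

// Runs until no rule fires; returns the number of clocks with activity.
// Both inputs start with an interesting packet for output 0, so r1 must
// wait one clock for r0.
inline int demo_switch_clocks(int* count_out) {
    Switch2x2 s;
    s.in[0].push_back({0, true});
    s.in[1].push_back({0, true});
    s.in[1].push_back({1, false});
    int clocks = 0;
    while (s.step_clock() > 0) ++clocks;
    *count_out = s.c;
    return clocks;
}
```

In this scenario the three packets take three clocks, because the two head packets collide on output 0 and on the counter, and the counter still ends at 2.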
Managing change
• Specs always change. Imagine:
  • Some packets are multicast (go to both FIFOs)
  • Some packets are dropped (go to no FIFO)
  • More complex arbitration
    – FIFO collision: in favor of r1
    – Counter collision: in favor of r2
    – Fair scheduling
  • Several counters for several kinds of interesting packets
  • Non-exclusive counters (e.g., TCP IP)
  • M input FIFOs, N output FIFOs (parameterized)
• What if these changes are required 6 months after original coding?
With Rules these are easy, because the source code remains uncluttered by all the complex control and mux logic; atomicity ensures correctness
Interfaces: raising the level of abstraction (while preserving Rule semantics)
• Interfaces can also contain other interfaces
  • We use this to build a hierarchy of interfaces
    • Get/Put, Client/Server, …
  • These capture common interface design patterns
  • There is no HW overhead to such abstraction
• Connections between standard interfaces can be packaged (and used, and reused)
  • “Connectable” interfaces
• All these are synthesizable
Get and Put Interfaces
• Provide simple methods for getting data from a module or putting data into it
• Easy to connect together
    template <typename T>
    ESL_INTERFACE ( Get ) {
        ESL_METHOD_ACTIONVALUE_INTERFACE ( get, T );
    }
    template <typename T>
    ESL_INTERFACE ( Put ) {
        ESL_METHOD_ACTION_INTERFACE ( put, T x );
    }
Get and Put Interfaces
• Get and Put are just interface specifications
• Many different kinds of modules can provide Get and Put interfaces
• E.g., a FIFO’s enq() can be viewed as a put() operation, and a FIFO’s first()/deq() can be viewed as a get() operation
Interface transformers/transactors
• Because of the abstractions of interfaces and modules, it is easy to write interface transformers/transactors
• This example is from the ESEPro library, transforming a FIFO interface into a Get interface
    ESL_MODULE_TEMPLATE ( fifoToGet, T ) {
        FIFO<T> *f;
        ESL_METHOD_ACTIONVALUE (get, true, T) {
            T temp = f->first();
            f->deq();
            return temp;
        }
        ESL_CTOR ( fifoToGet, FIFO<T> *ff ) : f ( ff ) {
            ESL_END_CTOR;
        }
    };
Interface transformers/transactors
• Another example from the ESEPro library, transforming a FIFO interface into a Put interface:
    ESL_MODULE_TEMPLATE ( fifoToPut, T ) {
        FIFO<T> *f;
        ESL_METHOD_ACTION (put, true, T x) {
            f->enq (x);
        }
        ESL_CTOR ( fifoToPut, FIFO<T> *ff ) : f ( ff ) {
            ESL_END_CTOR;
        }
    };
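The same adapter idea can be sketched in standard C++ without the ESL macros. This is an analogy under stated assumptions, not the ESEPro library: `Get`, `Put`, `FifoToGet`, and `FifoToPut` are hypothetical plain classes, `std::deque` stands in for the FIFO, the depth is a constructor argument invented for the sketch, and the `can_get()`/`can_put()` guards are written out explicitly where ESEPro methods carry implicit conditions.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Interface specifications only; many modules could implement them.
template <typename T>
struct Get {
    virtual ~Get() = default;
    virtual bool can_get() const = 0;  // explicit guard in this sketch
    virtual T get() = 0;
};

template <typename T>
struct Put {
    virtual ~Put() = default;
    virtual bool can_put() const = 0;  // explicit guard in this sketch
    virtual void put(const T& x) = 0;
};

// View a FIFO's first()/deq() as a get() operation.
template <typename T>
class FifoToGet : public Get<T> {
public:
    explicit FifoToGet(std::deque<T>* ff) : f_(ff) {}
    bool can_get() const override { return !f_->empty(); }
    T get() override {
        T temp = f_->front();
        f_->pop_front();
        return temp;
    }
private:
    std::deque<T>* f_;
};

// View a FIFO's enq() as a put() operation.
template <typename T>
class FifoToPut : public Put<T> {
public:
    FifoToPut(std::deque<T>* ff, std::size_t depth) : f_(ff), depth_(depth) {}
    bool can_put() const override { return f_->size() < depth_; }
    void put(const T& x) override { f_->push_back(x); }
private:
    std::deque<T>* f_;
    std::size_t depth_;  // assumed capacity; not from the slides
};

// Put two values through one FIFO and read them back in order.
inline int demo_fifo_adapters() {
    std::deque<int> fifo;
    FifoToPut<int> p(&fifo, 2);
    FifoToGet<int> g(&fifo);
    p.put(7);
    p.put(9);
    const bool full = !p.can_put();  // depth 2 reached
    const int first = g.get();
    const int second = g.get();
    return full ? first * 10 + second : -1;
}
```

The adapters hold only a pointer to the underlying FIFO, which mirrors the slides' point that such interface abstraction adds nothing to the hardware.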
Nested interfaces
• An interface can contain not only methods, but also nested interfaces
    template < typename Req_t, typename Resp_t >
    ESL_INTERFACE ( Server ) {
        ESL_SUBINTERFACE ( request, Put <Req_t> );
        ESL_SUBINTERFACE ( response, Get <Resp_t> );
    }
Sub-interfaces: using transformers
• The ESEPro library provides functions to convert FIFOs to Get/Put
    typedef Server<Req_t, Resp_t> CacheIfc;
    ESL_MODULE ( mkCache, CacheIfc ) {
        FIFO<Req_t> *p2c;
        FIFO<Resp_t> *c2p;
        … rules expressing cache logic …
        ESL_CTOR ( mkCache, …) {
            request = new fifoToPut ("req", p2c);
            response = new fifoToGet ("rsp", c2p);
        }
    }
Absolutely no difference in the HW!
Client/Server interfaces
• Get/Put pairs are very common, and duals of each other, so the library defines Client/Server interface types for this purpose
    ESL_INTERFACE ( Client<req_t, resp_t> ) {
        ESL_SUBINTERFACE ( request, Get<req_t> );
        ESL_SUBINTERFACE ( response, Put<resp_t> );
    };
    ESL_INTERFACE ( Server<req_t, resp_t> ) {
        ESL_SUBINTERFACE ( request, Put<req_t> );
        ESL_SUBINTERFACE ( response, Get<resp_t> );
    };
[Figure: a client’s get (req_t) and put (resp_t) facing a server’s put (req_t) and get (resp_t)]
Client/Server interfaces
    ESL_INTERFACE ( CacheIfc ) {
        ESL_SUBINTERFACE ( ipc, Server<Req_t, Resp_t> );
        ESL_SUBINTERFACE ( icm, Client<Req_t, Resp_t> );
    };
    ESL_MODULE ( mkCache, CacheIfc ) {
        // from / to processor
        FIFO<Req_t> *p2c;
        FIFO<Resp_t> *c2p;
        // to / from memory
        FIFO<Req_t> *c2m;
        FIFO<Resp_t> *m2c;
        … rules expressing cache logic …
        ESL_CTOR ( mkCache ) {
            …
            ipc = fifosToServer (p2c, c2p);
            icm = fifosToClient (c2m, m2c);
            ESL_END_CTOR;
        }
[Figure: mkProcessor (client) connected to mkCache (server side ipc, client side icm), which is connected to mkMem (server)]
Connecting Get and Put
• A module m1 providing a Get interface can be connected to a module m2 providing a Put interface with a simple rule
    ESL_MODULE ( mkTop, …) {
        Get<int> *m1;
        Put<int> *m2;
        ESL_RULE ( connect, true ) {
            int x = m1->get();
            m2->put (x);
        } // note implicit conditions
    }
“Connectable” interface pairs
• There are many pairs of types that are duals of each other
  • Get/Put, Client/Server, YourTypeT1/YourTypeT2, …
• The ESEPro library defines an overloaded, templated module mkConnection which encapsulates the connections between such duals
• The ESEPro library predefines the implementation of mkConnection for Get/Put, Client/Server, etc.
• Because overloading in C++ is extensible, you can overload mkConnection to work on your own interface types T1 and T2
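The overloading idea can be illustrated in ordinary C++ with a function overloaded on pairs of dual types. This sketch is an assumption-laden analogy and not the ESEPro mkConnection module: `connect_step` is a hypothetical function that performs one transfer, the `Get`/`Put` structs are minimal queue-backed stand-ins, and the queues are unbounded so `Put` is always ready.

```cpp
#include <cassert>
#include <deque>

// Minimal queue-backed endpoints (hypothetical stand-ins for Get/Put).
template <typename T>
struct Get {
    std::deque<T>* q;
    bool ready() const { return !q->empty(); }
    T get() {
        T x = q->front();
        q->pop_front();
        return x;
    }
};

template <typename T>
struct Put {
    std::deque<T>* q;
    bool ready() const { return true; }  // unbounded queue in this sketch
    void put(const T& x) { q->push_back(x); }
};

// Client and Server are duals: each pairs a Get with a Put.
template <typename Req, typename Resp>
struct Client {
    Get<Req> request;
    Put<Resp> response;
};

template <typename Req, typename Resp>
struct Server {
    Put<Req> request;
    Get<Resp> response;
};

// Overload 1: Get -> Put. Moves one item only if both sides are ready.
template <typename T>
bool connect_step(Get<T>& g, Put<T>& p) {
    if (!g.ready() || !p.ready()) return false;
    p.put(g.get());
    return true;
}

// Overload 2: Client <-> Server, built from two Get/Put connections.
template <typename Req, typename Resp>
bool connect_step(Client<Req, Resp>& c, Server<Req, Resp>& s) {
    const bool a = connect_step(c.request, s.request);
    const bool b = connect_step(s.response, c.response);
    return a || b;
}

// A request of 41 travels to the server, which answers with 42.
inline int demo_connection() {
    std::deque<int> c_req{41}, c_resp, s_req, s_resp;
    Client<int, int> c{{&c_req}, {&c_resp}};
    Server<int, int> s{{&s_req}, {&s_resp}};
    connect_step(c, s);                   // request: client -> server
    s_resp.push_back(s_req.front() + 1);  // server "computes" a response
    s_req.pop_front();
    connect_step(c, s);                   // response: server -> client
    return c_resp.front();
}
```

A further overload for a user-defined pair of dual types could be added in the same way, without touching the existing ones.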
mkConnection
• Using these interface facilities, assembling systems becomes very easy
    ESL_MODULE ( mkTopLevel, …) {
        // instantiate subsystems
        Client<Req_t, Resp_t> *p;
        Cache_Ifc<Req_t, Resp_t> *c;
        Server<Req_t, Resp_t> *m;
        // instantiate connections
        new mkConnection< Client<Req_t, Resp_t>, Server<Req_t, Resp_t> > ("p2c", p, c->ipc);
        new mkConnection< Client<Req_t, Resp_t>, Server<Req_t, Resp_t> > ("c2m", c->icm, m);
    }
[Figure: mkProcessor (client) connected to mkCache (server ipc, client icm), which is connected to mkMem (server)]
Rules and Levels of abstraction
Levels, from most abstract to most detailed:
• AL/FL (Algorithm/Function level)
• PV (Programmer’s View)
• PVT (PV with Timing)
• AV (Architect’s View)
• CA (Cycle-accurate)
• IM (Implementation)
Modeling styles shown alongside the levels, in slide order:
• Rules, C, C++, Matlab, …
• Untimed Rules (no clocks)
• Clocked Rules (scheduled)
Module structure
[Figure: SoC Model with blocks Processor (App/ISS), DSP (App/ISS), L2 cache, Interconnect, Codec model, DMA, Mem Controller, DRAM model; legend distinguishes Rule-based modules from SystemC modules]
• A system model can contain a mixture of SystemC modules and ESEPro modules
• Typical SystemC modules:
  • CPU ISS models
  • Behavioral models
  • C++ code targeted for behavioral synthesis
  • Existing SystemC IP
• Typical ESEPro modules:
  • Complex control
  • Requiring HW-realistic architectural exploration
Simulation flow
[Figure: System Model with blocks Processor (ISS/App), DSP (ISS/App), L2 cache, Interconnect, Codec model, DMA, Mem Controller, DRAM model; legend distinguishes Rule-based modules from SystemC modules]
• Inputs: core SystemC class defs/libs + TLM class defs/libs + ESL class defs/libs
• Tools: standard SystemC tools (gcc, OSCI sim, gdb, …)
Synthesis flow
[Figure: System Model with blocks Processor (ISS), DSP (App), L2 cache, Interconnect, Codec model, DMA, Mem Controller, DRAM model; legend distinguishes Rule-based modules from SystemC modules]
• Synthesizable subset: ESEPro Rule-based modules
  § much higher level than RTL
  § already validated in BSV
• Flow: Bluespec synthesis tool, then RTL (Verilog sim), then RTL synthesis and physical design, then tapeout
System refinement using ESEPro
• ESEPro modules can be introduced early as they can be written at a very high level, can interface to TLM modules, and can themselves be refined
• System-level testbenches can be reused at all levels
• SystemC modules with standard TLM interfaces interoperate seamlessly with ESEPro modules
  • Behavioral models, Design IP, Verification IP, …
More information at: http://www.bluespec.com/products/ESLSynthesisExtensions.htm
Website also has a free distribution called “ESE”
Mixing models: all combinations
• TLM Master with TLM Slave
• TLM Master with ESEPro Slave (replace slave)
• ESEPro Master with TLM Slave (replace master)
• ESEPro Master with ESEPro Slave (replace both)
TLM Master and Slave are taken unmodified from OSCI TLM distribution examples
Structure of TLM modules in demo (from OSCI_TLM/examples/example_3_2)
• TLM master: 20 write (addr, data), 20 read (addr, data &); uses basic_initiator_port
• TLM slave: basic_slave_base
• Between them: RSP = transport (REQ)
transport () is a basic TLM interface call
TLM master and ESEPro slave
• TLM master: 20 write (addr, data), 20 read (addr, data &); uses basic_initiator_port, transport ()
• ESEPro slave: Server <REQ, RSP>
• Between them: mkConnection (channel)
Example: ESEPro SoC model for synthesis (from ST/GreenSoCs “TAC” model)
[Figure: Initiator 0 and Initiator 1 connect through a Router to Target 0, Target 1, and a Timer. Initiator 1 sets the timer through a master interface and responds to the timer interrupt through a slave interface. M = Master interface, S = Slave interface]
(< 1000 lines of source code)
SoC Model: Behavior
• Initiators repeatedly do read/write transactions to Targets, via Router
• At startup, Initiator 1 writes to Timer registers via Router, starting the timer
• When the Timer’s time period expires, it generates an interrupt to Initiator 1
Synthesis example
[Figure: the SoC Model in ESEPro (from ST/GreenSoCs “TAC” model) feeds two flows]
• Simulation (ESEPro™): core SystemC class defs/libs + ESL class defs/libs; standard SystemC tools (gcc, OSCI sim, gdb, …)
• Synthesis (ESEComp™): Bluespec synthesis tool; cycle-accurate RTL; Verilog sim; Magma synthesis
This capability is unique to ESEComp
Side-by-side simulation comparison (Cycle Accurate)

SystemC simulation:
Cycle 12
  Target[1]: got request from initiator[1], addr is 1001
  Target[1]: sending response, data 1011
  Target[0]: got request from initiator[0], addr is 4
  Target[0]: sending response, data 14
  Initiator_with_intr_in[1]: forwarding req, addr = 1003
  Initiator[0]: got response addr 2, data 12
  Initiator[0]: sending req, addr = 6
Cycle 13
  Timer: generating interrupt
  Initiator[1]: sending req, addr = 4
Cycle 14
  Target[1]: got request from initiator[0], addr is 1005
  Target[1]: sending response, data 1015
  Target[0]: got request from initiator[1], addr is 2
  Target[0]: sending response, data 12
  Initiator_with_intr_in[1]: forwarding req, addr = 4
  Initiator[1]: got response addr 0, data 10
  Initiator[0]: got response addr 1003, data 1013
  Initiator[0]: sending req, addr = 1007
Cycle 15
  Initiator_with_intr_in[1] received interrupt
  Initiator[1]: sending req, addr = 1005

Verilog (generated) simulation:
Cycle 12
  Initiator[0]: sending req, addr = 6
  Initiator[0]: got response addr 2, data 12
  Target[0]: got request from initiator[0], addr is 4
  Target[0]: sending response, data 14
  Target[1]: got request from initiator[1], addr is 1001
  Target[1]: sending response, data 1011
  Initiator_with_intr_in[1]: forwarding req, addr = 1003
Cycle 13
  Initiator[1]: sending req, addr = 4
  Timer: generating interrupt
Cycle 14
  Initiator[0]: sending req, addr = 1007
  Initiator[0]: got response addr 1003, data 1013
  Initiator[1]: got response addr 0, data 10
  Target[0]: got request from initiator[1], addr is 2
  Target[0]: sending response, data 12
  Target[1]: got request from initiator[0], addr is 1005
  Target[1]: sending response, data 1015
  Initiator_with_intr_in[1]: forwarding req, addr = 4
Cycle 15
  Initiator[1]: sending req, addr = 1005
  Initiator_with_intr_in[1] received interrupt

(The order of messages within each cycle varies, but that is OK: it comes from parallel actions.)
SoC Router: Magma Synthesis Results
• ESEComp’s Verilog output run through Magma’s synthesis tools
• TSMC 0.18 µm libraries
• Design easily meets 400 MHz
Thanks